Hello community,
This is my first topic on the forum! Let me first thank you for the answers to other questions here, which made my path to deploying apps on Streamlit much easier.
Thanks also for the amazing work behind this platform, which gives those of us focused on non-web development a place to publish our work.
I'm currently in a data science bootcamp, and we had a challenge involving machine learning models. I had to upload these models to GitHub, compressing them with joblib to reduce their size, but the files are still very big.
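For context, this is roughly the joblib round trip I'm relying on (a minimal sketch: `LinearRegression` is just a stand-in for my real, much larger models, and the file name is only an example):

```python
import joblib
from sklearn.linear_model import LinearRegression  # stand-in for my real models

# Tiny stand-in estimator just to show the dump/load round trip;
# my real models are far bigger, which is where the size problem comes from.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

# compress=3 shrinks the .gz file at the cost of some load time
joblib.dump(model, "model_example.gz", compress=3)

# This is the kind of call my load_model_joblib helper wraps in the app
restored = joblib.load("model_example.gz")
print(restored.predict([[3.0]]))
```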
To speed up predictions, I wrapped all of those files (some CSVs and the models) in st.cache, but the app still goes over the resource limits.
Here’s where I am now:
```python
import os
import streamlit as st

# abs_path, the ac.airbnb class and load_model_joblib are defined elsewhere in the project.

@st.cache(allow_output_mutation=True, ttl=24 * 3600)
def load_ci_madrid_barcelona():
    return city_instance_mb.load_model_joblib(
        os.path.join(abs_path, "..", "resources", "models", "model_madrid_barcelona.gz")
    )

@st.cache(allow_output_mutation=True, ttl=24 * 3600)
def load_ci_london():
    return city_instance_london.load_model_joblib(
        os.path.join(abs_path, "..", "resources", "models", "model_london.gz")
    )

@st.cache(allow_output_mutation=True, ttl=24 * 3600)
def load_madrid_csv():
    return os.path.join(abs_path, "..", "resources", "datasets", "madrid.csv")

@st.cache(allow_output_mutation=True, ttl=24 * 3600)
def load_barcelona_csv():
    return os.path.join(abs_path, "..", "resources", "datasets", "barcelona.csv")

@st.cache(allow_output_mutation=True, ttl=24 * 3600)
def load_london_csv():
    return os.path.join(abs_path, "..", "resources", "datasets", "london.csv")

@st.cache(allow_output_mutation=True, ttl=24 * 3600)
def create_instance_mb():
    d_csvs, d_names = dict(), dict()
    d_csvs["csvs1"] = [madrid, barcelona]
    d_names["names1"] = ["madrid", "barcelona"]
    return ac.airbnb(d_csvs["csvs1"], d_names["names1"], "csv")

@st.cache(allow_output_mutation=True, ttl=60)
def create_instance_london():
    d_csvs, d_names = dict(), dict()
    d_csvs["csvs2"] = [london]
    d_names["names2"] = ["london"]
    return ac.airbnb(d_csvs["csvs2"], d_names["names2"], "csv")

# CSV paths, city instances and models are created/loaded once at startup
madrid = load_madrid_csv()
barcelona = load_barcelona_csv()
london = load_london_csv()
city_instance_mb = create_instance_mb()
city_instance_london = create_instance_london()
model_madrid_barcelona = load_ci_madrid_barcelona()
model_london = load_ci_london()
```
I tried different ttl values, such as 60 or 600 seconds, but neither seemed to help. I'm starting to wonder whether I'm misunderstanding how st.cache works.
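To double-check that it isn't something specific to my project, this is the stripped-down pattern I thought would keep memory under control (simplified: here I read the CSV straight into a DataFrame, while in my real code the airbnb instance does the reading, and the path is just an example from my repo layout):

```python
import pandas as pd
import streamlit as st

# What I expected: after ttl seconds the cached entry expires, so the
# function re-runs on the next call and old entries don't pile up in memory.
@st.cache(allow_output_mutation=True, ttl=600)
def load_dataset(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

df = load_dataset("resources/datasets/madrid.csv")
st.write(df.head())
```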
All the information about the project is public here, in case you need to check the full code or anything else.
I couldn't find another topic where this has already been solved, and I hope I'm respecting all the forum rules.
Thanks in advance for reading this topic.