I wonder if there are any memory limits to the cache that Streamlit uses to store cached results with cache_data. I think I read it’s 1 GB, but I’m not too sure.
In my app, I have a function decorated with st.cache_data. It returns a relatively large output: a dataframe of around 1 MB. I’m afraid that if this function gets called with many different parameters during a session, many results will be cached and whatever resource stores them will fill up. If so, what is the consequence? Will a warning appear? Does the app take a performance hit? Will older results get overwritten?
The answer will help me decide whether to use st.cache_data on said function.
Hi @wangp22, I don’t believe there is an inherent limit with st.cache_data. However, the total amount of memory available to Community Cloud-hosted apps is limited. See Manage your app - Streamlit Docs for more details, but you are correct that you get 1 GB of RAM.
You are also correct that, if you call cache_data on a function 100 different times with 100 different parameters, you can expect the total amount of memory used to climb approximately 100x. If you do end up using up all of the memory on your instance, it may well crash and need to be rebooted.
The easiest workaround is to limit how many different results are remembered with max_entries=10, or whatever number is reasonable. See st.cache_data - Streamlit Docs for more details.
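In your app the change is just writing the decorator as @st.cache_data(max_entries=10). If you want to see the bounded-cache idea in action without Streamlit, functools.lru_cache from the standard library behaves the same way: once the limit is hit, the least-recently-used entry gets evicted and that input is recomputed on its next call. (Streamlit’s exact eviction order may differ; this is just an illustration of the bound.)

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" body actually runs

@lru_cache(maxsize=3)  # analogous to @st.cache_data(max_entries=3)
def expensive(param):
    global call_count
    call_count += 1
    return param * 2

for p in [1, 2, 3, 1]:  # fourth call hits the cache: 1 is still stored
    expensive(p)
assert call_count == 3

expensive(4)  # cache is full, so the least-recently-used entry (2) is evicted
expensive(2)  # recomputed: one more real call
assert call_count == 5
```

With a bound in place, memory use stays proportional to max_entries times the size of one result, instead of growing with every distinct parameter combination seen during the session.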
You also might check whether the cache is really necessary – 1 MB is not a terribly large amount for some purposes, and if the dataframe is quick to generate, caching might not actually make a big difference in performance. It all depends on how you are getting/generating that df.
You also might want to consider separating out generating the “base data” and any transformations on it. It’s hard to speak in terribly general terms, but I often end up doing something like this:
# Get data from a database, or some other source.
# This function often doesn't need many parameters.
# I may or may not need to cache this data, depending on how slow it is.
def fetch_some_data():
    ...

def transform_the_data(filter1, filter2):
    base_data = fetch_some_data()
    # Do the transformations and return them
    ...
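Fleshing that pattern out a bit: cache only the slow fetch, and leave the cheap per-filter transform uncached, so you don’t store one copy of the data per filter combination. Here’s a runnable Streamlit-free sketch (the data, the filter logic, and the lru_cache stand-in for @st.cache_data are all made up for illustration):

```python
from functools import lru_cache

fetch_calls = 0  # counts how often the slow fetch actually runs

@lru_cache(maxsize=1)  # stand-in for @st.cache_data on the slow fetch
def fetch_some_data():
    global fetch_calls
    fetch_calls += 1
    # Pretend this is a slow database query returning rows.
    return tuple({"city": c, "pop": p}
                 for c, p in [("Oslo", 1), ("Lima", 11), ("Kyiv", 3)])

def transform_the_data(min_pop):
    # Cheap filtering: no need to cache a separate result per filter value.
    return [row for row in fetch_some_data() if row["pop"] >= min_pop]

transform_the_data(2)
transform_the_data(5)
assert fetch_calls == 1  # the expensive part ran only once
```

The nice property is that calling transform_the_data with many different filters only ever stores one cached copy of the base data, rather than one cached dataframe per filter combination.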