Would it be accurate to claim that streamlit.cache is designed more for caching data import activity rather than the outputs of computation? I’m asking because, based on the limited time I have spent with it, it has problems with hashing objects used inside function calls. For example, I am using rpy2 to call an R script inside a function, but it cannot deal with one or more of the objects involved:
Streamlit cannot hash an object of type <class 'rpy2.robjects.conversion.Converter'>.
Is there a way of making caching work with arbitrary function calls? I was hoping that the output (in this case, a numpy array) was all that was being cached, but it looks like the serialization goes deeper than that.
Many types of objects can be cached as long as Python can find a way to serialize them. It may be that the objects returned by the function you’re having trouble with are not exposing any serialization methods to Python.
Perhaps you can write a little more of a wrapper function around the call to the R script, such that both the inputs to the function and the data returned from the call are converted to something more Streamlit-native, like a DataFrame or a Python dictionary. You should be able to @st.cache such a function without issue; a rough sketch of what I mean is below.
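Something along these lines, where the R script name, the fit_model function, and the exact rpy2 calls are just my guesses at what your setup might look like:

import numpy as np
import streamlit as st
import rpy2.robjects as robjects

@st.cache
def run_r_model(csv_path, n_iter):
    # Only plain, hashable inputs cross the cache boundary; the rpy2
    # objects live entirely inside the function body.
    robjects.r.source("model.R")                   # hypothetical R script defining fit_model()
    result = robjects.r["fit_model"](csv_path, n_iter)
    return np.array(result)                        # hand back a plain numpy array, not an R object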
If you’re already doing that, let us know – I’m totally just making guesses here without seeing your code.
st.cache is always improving – it’s actually one of our highest priorities right now. You can try out the latest Streamlit from the develop branch if you’re eager to find out whether recent updates solve your particular issue.
Hi there - I just wanted to piggyback on this, as I had a very similar question and hacked together a workaround, but I suspect it could be handled better.
In my use case, I want to visualize gradients and attention scores from a large PyTorch model (a BERT model). I have sliders to select attention scores from particular layers and heads in the model, but the gradient calculation is very expensive, taking around 3s on CPU, and I obviously don’t want to recompute it whenever I change the layer/head I’m visualizing.
Refactoring my code to work with the @st.cache decorator would take ages (this is a proof of concept, and the function that does the calculation takes unhashable arguments), so my workaround was to directly check and add to st.caching._mem_cache within this function. As the model is an NLP model, I can use text as cache keys.
Perhaps if there is no convenient replacement for unhashable types (say, as in @fonnesbeck’s case), you could use the id of the object as a cache key instead? This should be fine if the object lives in the global scope, since its id can’t be reused for another object until it gets garbage-collected. Explicitly, I mean something like
import streamlit.caching  # private module, so this is very much a hack

def expensive_function(unhashable_type):
    obj_id = id(unhashable_type)
    # check Streamlit's internal cache dictionary directly
    if obj_id in streamlit.caching._mem_cache:
        return streamlit.caching._mem_cache[obj_id]
    else:
        return_value = ...  # whatever you want to do with unhashable_type
        streamlit.caching._mem_cache[obj_id] = return_value
        return return_value
Presumably this has side-effects I’m not aware of, but interacting with the cache dictionary explicitly could be useful in general - are there plans to support this type of thing in the future?
I’ve got this big grin on my face as I read your code. Clever…!
That _mem_cache object is basically just a dictionary into which we store a hash of a lot of different things, not just the input values but also the cached function code itself. We do that in order to detect changes in your script that might invalidate the return value.
I’m thinking it’s very unlikely that you’re going to run into ill effects doing what you’re doing as of right now (streamlit 0.51.0), but no one could promise you that your solution won’t break in future versions. For example, we’re not doing any garbage-collection on that cache right now, but… we might!
We’re currently working on improvements to st.cache that will help you not need to do this kind of thing to get what you need.
The use cases described in this thread are definitely primary use cases for Streamlit and thus important to fix, so please stay tuned!
Thanks @nthmost! From a user perspective, I’d argue there are two different types of caches desired: the kind given by @st.cache currently to cache large data objects (or model weights or whatever), and something closer to an LRU cache for caching the outputs of computations. A rough sketch of the split I have in mind is below.
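To make that concrete (the function names here are made up, and the heavy step is just a placeholder for the real gradient computation):

import functools
import numpy as np
import streamlit as st

@st.cache  # heavy, compute-once: the full grid of gradients/attention scores
def compute_all_scores(text):
    # placeholder standing in for the ~3s gradient computation over every layer and head
    return np.random.rand(12, 12, len(text))

@functools.lru_cache(maxsize=256)  # light, per-(layer, head) lookups driven by the sliders
def scores_for(text, layer, head):
    return compute_all_scores(text)[layer, head]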
You might have already seen the updated docs, but if not (or if anyone else comes across this thread), I wanted to give you all a quick update that the documentation @Jonathan_Rhone mentioned was released this month. Here are some helpful links: