Caching output of expensive function calls

Would it be accurate to say that streamlit.cache is designed more for caching data import activity than for the outputs of computation? I’m asking because, in the limited time I’ve spent with it, it has had problems hashing objects used inside function calls. For example, I am using rpy2 to call an R script inside a function, and it cannot deal with one or more of the objects that function uses:

Streamlit cannot hash an object of type <class 'rpy2.robjects.conversion.Converter'>.

Is there a way of making caching work with arbitrary function calls? I was hoping that only the output (in this case, a numpy array) was being cached, but it looks like the serialization goes deeper than that.
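Roughly, the pattern I have looks like this (a simplified sketch; rnorm just stands in for my actual R code):

import numpy as np
import rpy2.robjects as ro
import streamlit as st
from rpy2.robjects import numpy2ri
from rpy2.robjects.conversion import localconverter

@st.cache
def run_r(n):
    # numpy2ri.converter is a Converter object used inside the cached
    # function, and this seems to be what st.cache fails to hash.
    with localconverter(ro.default_converter + numpy2ri.converter):
        return np.asarray(ro.r['rnorm'](n))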


Hello! Thanks for the question.

Many types of objects can be cached, as long as Python can find a way to serialize them. It may be that the objects returned by the function you’re having trouble with don’t expose any serialization methods to Python.

Perhaps you can write a bit more of a wrapper function around the call to the R script, so that both the inputs to the function and the data it returns are converted to something more Streamlit-native, like a DataFrame or a Python dictionary? You should be able to @st.cache such a function without issue.
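For instance, something like this sketch, where only hashable inputs (a string and an int) cross the function boundary, and ‘simulate’ is a hypothetical R function:

import numpy as np
import rpy2.robjects as ro
import streamlit as st

@st.cache
def call_r_script(script_path, n_samples):
    # Hashable inputs in (str, int); a plain numpy array out.
    ro.r['source'](script_path)
    result = ro.r['simulate'](n_samples)  # 'simulate' is a hypothetical R function
    return np.asarray(result)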

If you’re already doing that, let us know; I’m just making guesses here without seeing your code.

Also, check out this very similar thread about caching with different types of objects.

st.cache is always improving; it’s actually one of our highest priorities right now. You can try out the latest Streamlit from the develop branch if you’re eager to find out whether recent updates solve your particular issue.


Thanks for the response; this is all very helpful.


Hi there - I just wanted to piggyback on this, as I had a very similar question and hacked together a workaround, but I suspect it could be handled better.

In my use case, I want to visualize gradients and attention scores from a large PyTorch model (a BERT model). I have sliders to select attention scores from particular layers and heads in the model, but the gradient calculation is very expensive, taking around 3s on CPU, and I obviously don’t want to recompute the gradients whenever I change the layer or head being visualized.

Refactoring my code to work with the @st.cache decorator would take ages (this is a proof of concept, and the function that does the calculation takes unhashable arguments), so my workaround was to check and add to st.caching._mem_cache directly within this function. Since the model is an NLP model, I can use text as cache keys.

Perhaps, if there is no convenient replacement for unhashable types (say, as in @fonnesbeck’s case), you could use the id of the object as a cache key instead? This should be fine as long as the object lives in the global scope. Explicitly, I mean something like:

import streamlit

def expensive_function(unhashable_type):
    obj_id = id(unhashable_type)
    if obj_id in streamlit.caching._mem_cache:
        return streamlit.caching._mem_cache[obj_id]
    else:
        # do whatever you want with unhashable_type here
        return_value = do_work(unhashable_type)  # placeholder for the real computation
        streamlit.caching._mem_cache[obj_id] = return_value
        return return_value

Presumably this has side effects I’m not aware of, but interacting with the cache dictionary explicitly could be useful in general. Are there plans to support this kind of thing in the future?


Hi @andrewPoulton,

I’ve got this big grin on my face as I read your code. :smiley: Clever…!

That _mem_cache object is basically just a dictionary into which we store a hash of a lot of different things: not just the input values, but also the cached function’s code itself. We do that in order to detect changes in your script that might invalidate the return value.
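Conceptually (and glossing over a lot), the key computation looks something like this sketch; the real hasher also folds in closures, referenced globals, and more:

import hashlib

def conceptual_cache_key(func, args):
    # Greatly simplified illustration, not the actual implementation.
    h = hashlib.md5()
    h.update(func.__code__.co_code)  # the function's bytecode
    h.update(repr(args).encode())    # the input values
    return h.hexdigest()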

I’m thinking it’s very unlikely that you’ll run into ill effects doing what you’re doing as of right now (Streamlit 0.51.0), but no one can promise that your solution won’t break in future versions. For example, we’re not doing any garbage collection at the moment, but… we might!

We’re currently working on improvements to st.cache that will help you not need to do this kind of thing to get what you need.

The use cases described in this thread are definitely primary use cases for Streamlit and thus important to fix, so please stay tuned!

Thanks @nthmost! From a user perspective, I’d argue there are two different kinds of caches desired: the kind @st.cache currently provides, for caching large data objects (or model weights or whatever), and something closer to an LRU cache for caching the outputs of computations.
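For the latter, Python’s built-in functools.lru_cache is roughly what I have in mind; here compute_gradients is a hypothetical stand-in for the expensive call:

import functools

@functools.lru_cache(maxsize=32)
def gradients_for(text, layer, head):
    # All arguments are hashable, so lru_cache works out of the box,
    # and least-recently-used entries are evicted automatically.
    return compute_gradients(text, layer, head)  # hypothetical expensive call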

Hi @andrewPoulton, @fonnesbeck,

You can set your own hash function for different types of objects.

By passing the hash_funcs param to your @st.cache decorator, for example:

@st.cache(hash_funcs={rpy2.robjects.conversion.Converter: id})

This feature is already included in v0.51.0, but we’re still updating the documentation.

Here’s a link to the docstring that mentions the new hash_funcs param and provides example usage.

https://github.com/streamlit/streamlit/blob/develop/lib/streamlit/caching.py#L437
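Applied to the case above, it might look roughly like this sketch (note that hashing by id means a newly created Converter counts as a cache miss; the R details are placeholders):

import numpy as np
import rpy2.robjects as ro
import streamlit as st

# Fall back to id() when the hasher encounters a Converter.
@st.cache(hash_funcs={ro.conversion.Converter: id})
def call_r(script_path, n_samples):
    ro.r['source'](script_path)
    return np.asarray(ro.r['simulate'](n_samples))  # 'simulate' is hypothetical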


Hey @fonnesbeck and @andrewPoulton :wave:,

You might have already seen the updated docs, but if not (or if anyone else comes across this thread), a quick update: the documentation @Jonathan_Rhone mentioned was released this month.

If you come across any issues or would like more context, here is a helpful topic.
