Caching the output of expensive function calls

Would it be accurate to say that streamlit.cache is designed more for caching data-import activity than the outputs of computation? I’m asking because, in the limited time I have spent with it, it has trouble hashing objects used inside function calls. For example, I am using rpy2 to call an R script inside a function, but it cannot handle one or more of the objects the function uses:

Streamlit cannot hash an object of type <class 'rpy2.robjects.conversion.Converter'>.

Is there a way to make caching work with arbitrary function calls? I was hoping that only the output (in this case, a numpy array) was being cached, but it looks like the serialization goes deeper than that.


Hello! Thanks for the question.

Many types of objects can be cached as long as Python can find a way to serialize them. It may be that the objects returned by the function you’re having trouble with don’t expose any serialization methods to Python.

Perhaps you can write a thin wrapper function around the call to the R script, so that both the inputs to the function and the data it returns are converted to something more Streamlit-native, like a DataFrame or a Python dictionary? You should be able to @st.cache such a function without issue.
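To make the suggestion concrete, here is a minimal sketch of that wrapper pattern. The names (`run_r_script`, `call_r_model`) are hypothetical, and the rpy2 call is replaced by a stand-in; the point is just that the wrapper's inputs and outputs are plain Python types, so a caching decorator like `@st.cache` never has to hash an rpy2 object:

```python
import json

# Stand-in for the expensive rpy2 call; in a real app this would invoke
# robjects.r(...) and return rpy2 objects that Streamlit cannot hash.
def run_r_script(params):
    return {"coefficients": [0.5, 1.25], "n": len(params)}

# Wrapper whose inputs and outputs are plain Python types. In a Streamlit
# app you would put  @st.cache  directly above this def.
def call_r_model(param_json):
    params = json.loads(param_json)   # plain string in
    raw = run_r_script(params)
    return dict(raw)                  # plain dict out

result = call_r_model('{"alpha": 0.1, "beta": 2}')
print(result["n"])
```

Because the cache key is derived only from `param_json` (a string) and the return value is a plain dict, the unhashable rpy2 machinery stays entirely inside the function body.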

If you’re already doing that, let us know – I’m totally just making guesses here without seeing your code.

Also, check out this very similar thread about caching with different types of objects.

st.cache is always improving – it’s actually one of our highest priorities right now. You can try the latest Streamlit from the develop branch if you’re eager to find out whether recent updates solve your particular issue.


Thanks for the response; this is all very helpful.


Hi there - I just wanted to piggyback on this as I had a very similar question and hacked a workaround, but suspect it could be handled better.

In my use case, I want to visualize gradients and attention scores from a large PyTorch model (a BERT model). I have sliders to select attention scores from particular layers and heads in the model, but the gradient calculation is very expensive, taking around 3s on CPU, and I obviously don’t want to recompute the gradients every time I change which layer/head I’m visualizing.

Refactoring my code to work with the @st.cache decorator would take ages (this is a proof of concept, and the function that does the calculation takes unhashable arguments), so my workaround was to check and add to st.caching._mem_cache directly within this function. As the model is an NLP model, I can use text as cache keys.

Perhaps, when there is no convenient replacement for an unhashable type (as in @fonnesbeck’s case, say), you could use the id of the object as a cache key instead? This should be fine as long as the object lives in the global scope. Explicitly, I mean something like:

import streamlit

def expensive_function(unhashable_type):
    obj_id = id(unhashable_type)
    if obj_id in streamlit.caching._mem_cache:
        return streamlit.caching._mem_cache[obj_id]
    return_value = ...  # whatever you want to do with unhashable_type
    streamlit.caching._mem_cache[obj_id] = return_value
    return return_value

Presumably this has side effects I’m not aware of, but interacting with the cache dictionary explicitly could be useful in general – are there plans to support this sort of thing in the future?


Hi @andrewPoulton,

I’ve got this big grin on my face as I read your code. :smiley: Clever…!

That _mem_cache object is basically just a dictionary into which we store a hash of a lot of different things, not just the input values but also the cached function code itself. We do that in order to detect changes in your script that might invalidate the return value.

I’m thinking it’s very unlikely that you’ll run into ill effects doing this as of right now (Streamlit 0.51.0), but no one can promise your solution won’t break in future versions. For example, we’re not doing any garbage collection at the moment, but… we might!

We’re currently working on improvements to st.cache that will help you not need to do this kind of thing to get what you need.

The use cases described in this thread are definitely primary use cases for Streamlit and thus important to fix, so please stay tuned!

Thanks @nthmost! From a user perspective, I’d argue there are two different types of cache desired – the kind @st.cache currently provides, for caching large data objects (or model weights or whatever), and something closer to an LRU cache for caching the outputs of computations.
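For reference, the standard library already offers the second kind of cache: `functools.lru_cache` evicts the least recently used entries once `maxsize` is reached, though it only accepts hashable arguments (the very limitation discussed above). A quick sketch, with a stand-in for the expensive attention computation:

```python
from functools import lru_cache

# An LRU cache suits this pattern well: keys (layer, head) are small and
# hashable, while the computation behind them is expensive.
@lru_cache(maxsize=32)
def attention_scores(layer, head):
    # Stand-in for the expensive gradient/attention computation.
    return [layer * head * i for i in range(3)]

first = attention_scores(2, 3)    # computed
second = attention_scores(2, 3)   # served from the cache
print(attention_scores.cache_info())
```

The trade-off versus @st.cache is that lru_cache is per-process and per-function, with no awareness of changes to the function's source code.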

Hi @andrewPoulton, @fonnesbeck,

You can set your own hash function for different types of objects.

By passing the hash_funcs param to your @st.cache decorator, for example:

@st.cache(hash_funcs={rpy2.robjects.conversion.Converter: id})

This feature is already included in v0.51.0, but we’re still updating the documentation.

Here’s a link to the docstring that mentions the new hash_funcs param and provides example usage.
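For anyone curious how a hash_funcs-style override behaves conceptually, here is a simplified stand-in – this is not Streamlit’s implementation, just an illustration. The memoizer looks up a key function by argument type (falling back to the built-in hash), so mapping an unhashable type to `id` gives it a usable cache key:

```python
# Simplified illustration of the hash_funcs idea (NOT Streamlit's code):
# a memoizer that derives cache keys per argument, using a user-supplied
# key function for types it cannot hash normally.
def memoize(hash_funcs=None):
    hash_funcs = hash_funcs or {}
    cache = {}

    def decorator(fn):
        def wrapper(*args):
            # For each argument, use the registered key function for its
            # type if there is one, otherwise the built-in hash.
            key = tuple(hash_funcs.get(type(a), hash)(a) for a in args)
            if key not in cache:
                cache[key] = fn(*args)
            return cache[key]
        return wrapper
    return decorator

class Unhashable:
    __hash__ = None  # simulate an object that cannot be hashed

@memoize(hash_funcs={Unhashable: id})
def expensive(obj, scale):
    return scale * 2

u = Unhashable()
print(expensive(u, 10))
print(expensive(u, 10))  # second call hits the cache
```

As the docstring notes for the real parameter, using `id` as the key means the cache is tied to one specific object instance, which is fine for module-level singletons like a Converter.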


Hey @fonnesbeck and @andrewPoulton :wave:,

You might have already seen the updated docs, but if not (or if anyone else comes across this thread), I wanted to give you all a quick update: the documentation @Jonathan_Rhone mentioned was released this month. Here are some helpful links:

If you come across any issues or would like more context, here is a helpful topic.
