Help us stress test Streamlit’s latest caching update

Hey Community :wave:,

When building Streamlit apps, it’s always a good idea to wrap your expensive computations and slow data fetches in @st.cache. But as well as st.cache works in many cases, we also recognize that it fails when it encounters certain objects, like TensorFlow sessions, spaCy objects, Lock objects, and so on.
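As a refresher, the basic pattern looks like this (a minimal sketch; fetch_data is just a stand-in for any slow call):

import time
import streamlit as st

@st.cache
def fetch_data(url):
    # Runs once per unique url; later calls with the same argument
    # return the cached result instead of re-running.
    time.sleep(3)  # stand-in for a slow network request
    return {"source": url}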

So over the past few months we’ve been slowly releasing several improvements to how st.cache works. These improvements fall into three categories:

  1. Improvements to the caching logic. For example, we now support caching custom classes out of the box, have better support for tuples, and more.

  2. Improvements to error messages and accompanying documentation.

  3. A new keyword argument called hash_funcs, which allows you to customize the behavior of st.cache for your specific use case. In particular, if you’ve ever encountered an object that st.cache couldn’t handle, hash_funcs now allows you to fix that yourself! (There’s a quick sketch after this list.)
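To give a flavor of hash_funcs, here is a minimal sketch (compile_rules is a made-up example): st.cache can’t hash re.Pattern objects by default, so we tell it to hash each compiled pattern by its source string instead.

import re
import streamlit as st

# Hash each re.Pattern by its source string so st.cache can handle it.
@st.cache(hash_funcs={re.Pattern: lambda pattern: pattern.pattern})
def compile_rules(rules):
    return [re.compile(rule) for rule in rules]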

You can find out more about all these changes in our docs.

We’re super excited to release all these changes, but we also realize they’re still very new and full of rough edges! So we would love some help tracking down issues so we can solve them ASAP.

If you encounter any problems with the latest st.cache updates, please post to this thread. Specifically, whenever you see the warning “Cannot hash object of type _______”, let us know the name of that object and provide a short code snippet if possible.

Thank you for your help in making Streamlit better, and we also welcome any other feedback or ideas you have on caching!


I’ll start!

I’m having an issue with caching a loaded TensorFlow Hub model. I get an UnhashableType error on the type ‘google.protobuf.pyext._message.RepeatedScalarContainer’. The error suggests using hash_funcs, but I can’t import that type, so it doesn’t work. I tried wrapping the model in a custom object and forcing id as the hashing function, but that doesn’t work either.

I don’t really know if I’m forgetting something obvious or not. Code sample below:

import streamlit as st
import tensorflow_hub as hub

@st.cache
def get_model():
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

And the error I get is:

UnhashableType: Cannot hash object of type google.protobuf.pyext._message.RepeatedScalarContainer

Thanks in advance and keep up the good work!


Hi @Snertie – Can you tell me what happens when you do something like this:

from google.protobuf.pyext._message import RepeatedScalarContainer 

[...your code...]  

@st.cache(hash_funcs={RepeatedScalarContainer: id})
def get_model():
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

?


Thanks @nthmost!

I had conflicting packages that wouldn’t let me import RepeatedScalarContainer, but that’s fixed now! However, I now get another error:

UnhashableType: Cannot hash object of type _thread.RLock

FYI the type of the loaded model (which I apparently can’t reach) is returned as:
tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject

I resolved the issue using allow_output_mutation=True.
Thanks for the help!
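For the record, the working version is essentially this (a sketch; the flag just tells Streamlit not to hash the returned model):

import streamlit as st
import tensorflow_hub as hub

# Skipping output hashing avoids the internal _thread.RLock entirely.
@st.cache(allow_output_mutation=True)
def get_model():
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")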


Hi, here is another one:

Cannot hash object of type CompiledFFI

It happens when trying to create a connection to a Snowflake database. I’ve included the code below for completeness, but I’m not sure it really helps, since the CompiledFFI class is not Snowflake-specific. The thing is, I don’t even know where to find this class to implement a custom hash_func… And it’s quite annoying, because I do need to cache the results of database queries.

Thanks for your great software and your assistance :slight_smile:

import streamlit as st
import snowflake.connector

@st.cache
def get_database_connection():
    return snowflake.connector.connect(
        user='XXXX',
        password='XXXX',
        account='XXXX'
    )

Hi @romeodespres,

In your case, it might work to use allow_output_mutation=True in your st.cache declaration. I.e.:

@st.cache(allow_output_mutation=True)
def get_database_connection():
    return snowflake.connector.connect(
        user='XXXX',
        password='XXXX',
        account='XXXX'
    )

The reason is that this prevents Streamlit from trying to hash the returned connection object at all.
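If you then want to cache individual query results on top of that connection, one pattern that should work (a sketch, not tested against Snowflake) is to key the cache on the SQL string alone and fetch the cached connection inside the function:

@st.cache
def run_query(sql):
    # get_database_connection() is the cached function above, so this
    # doesn't open a new connection for every query.
    cur = get_database_connection().cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()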

Let us know if that works!


It does work, thank you! Now that you say it, it seems obvious. Shouldn’t the error message suggest your solution? I believe one reason I didn’t think of it is that the message strongly pointed toward hash_funcs:

While caching some code, Streamlit encountered an object of type CompiledFFI. You’ll
need to help Streamlit understand how to hash that type with the hash_funcs argument.
For example:

@st.cache(hash_funcs={CompiledFFI: my_hash_func})
def my_func(...):
    ...

Please see the hash_funcs documentation for more details.

A short “You can also set allow_output_mutation=True to disable hashing” at the end would have helped me.


UnhashableType: Cannot hash object of type re.Pattern

The cached function is:

def get_config(filename=None, appname='your name'):

which returns a ConfigParser object, which I do want cached!

The hash_funcs no-op works:

@st.cache(hash_funcs={re.Pattern: lambda _: None})
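For context, here’s roughly how that looks end to end (a sketch; the body of get_config is a stand-in):

import re
import streamlit as st
from configparser import ConfigParser

# The no-op hasher makes st.cache treat every re.Pattern as equal, which
# is fine here: the patterns live inside ConfigParser internals and don't
# affect which config we want cached.
@st.cache(hash_funcs={re.Pattern: lambda _: None})
def get_config(filename=None, appname='your name'):
    parser = ConfigParser()
    if filename:
        parser.read(filename)
    return parser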


Hi @knorthover,

It sounds like you got your cached function working using hash_funcs. Just wanted to comment for the sake of the thread that yours is also a situation that could be fixed by using allow_output_mutation=True.
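For example, something like this should also work (a sketch; body omitted):

@st.cache(allow_output_mutation=True)
def get_config(filename=None, appname='your name'):
    ...  # build and return the ConfigParser as before

That flag skips hashing the returned ConfigParser entirely, so the re.Pattern objects inside it are never touched.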

Thanks for chiming in!