Help us stress test Streamlit’s latest caching update

Hey Community :wave:,

When building Streamlit apps, it’s always a good idea to wrap your expensive computations and slow data fetches in @st.cache. But as well as st.cache works in many cases, we also recognize that it fails when it encounters certain objects, like TensorFlow sessions, spaCy objects, Lock objects, and so on.
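As a refresher, the basic pattern looks like this (a minimal sketch; fetch_data is just a stand-in for any slow call):

import time
import streamlit as st

@st.cache
def fetch_data(url):
    # Runs once per unique url; later calls with the same argument
    # return the cached result instead of re-running.
    time.sleep(3)  # stand-in for a slow network request
    return {"source": url}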

So over the past few months we’ve been slowly releasing several improvements to how st.cache works. These improvements fall into three categories:

  1. Improvements to the caching logic. For example, we now support caching custom classes out of the box, have better support for tuples, and more.

  2. Improvements to error messages and accompanying documentation.

  3. A new keyword argument called hash_funcs, which allows you to customize the behavior of st.cache for your specific use case. In particular, if you’ve ever encountered an object that st.cache couldn’t handle, hash_funcs now allows you to fix that yourself! (There’s a quick sketch after this list.)
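To give a flavor of hash_funcs, here is a minimal sketch (compile_rules is a made-up example): st.cache can’t hash re.Pattern objects by default, so we tell it to hash each compiled pattern by its source string instead.

import re
import streamlit as st

# Hash each re.Pattern by its source string so st.cache can handle it.
@st.cache(hash_funcs={re.Pattern: lambda pattern: pattern.pattern})
def compile_rules(rules):
    return [re.compile(rule) for rule in rules]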

You can find out more about all these changes in our docs.

We’re super excited to release all these changes, but we also realize they’re still very new and full of rough edges! So we would love some help tracking down issues so we can solve them ASAP.

If you encounter any problems with the latest st.cache updates, please post to this thread. Specifically, whenever you see the warning “Cannot hash object of type _______”, let us know the name of that object and provide a short code snippet if possible.

Thank you for your help in making Streamlit better, and we also welcome any other feedback or ideas you have on caching!


I’ll start!

I’m having an issue with caching a loaded TensorFlow Hub model. I get an UnhashableType error on the type ‘google.protobuf.pyext._message.RepeatedScalarContainer’. The error suggests using hash_funcs, but I can’t import that type, so it doesn’t work. I tried wrapping the model in a custom object and forcing id as the hashing function, but that doesn’t work either.

I don’t really know if I’m forgetting something obvious or not. Code sample below:

import streamlit as st
import tensorflow_hub as hub

@st.cache
def get_model():
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

And the error I get is:

UnhashableType: Cannot hash object of type google.protobuf.pyext._message.RepeatedScalarContainer

Thanks in advance and keep up the good work!


Hi @Snertie – Can you tell me what happens when you do something like this:

from google.protobuf.pyext._message import RepeatedScalarContainer 

[...your code...]  

@st.cache(hash_funcs={RepeatedScalarContainer: id})
def get_model():
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

?


Thanks @nthmost!

I had conflicting packages that wouldn’t let me import RepeatedScalarContainer, but that’s fixed now! However, I now get another error:

UnhashableType: Cannot hash object of type _thread.RLock

FYI the type of the loaded model (which I apparently can’t reach) is returned as:
tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject

I resolved the issue using allow_output_mutation=True.
Thanks for the help!
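For the record, the working version is essentially this (a sketch; the flag just tells Streamlit not to hash the returned model):

import streamlit as st
import tensorflow_hub as hub

# Skipping output hashing avoids the internal _thread.RLock entirely.
@st.cache(allow_output_mutation=True)
def get_model():
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")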


Hi, here is another one:

Cannot hash object of type CompiledFFI

It happens when trying to create a connection to a Snowflake database. I’ve included the code below for completeness, but I’m not sure it really helps, since the CompiledFFI class is not Snowflake-specific. The thing is, I don’t even know where to find this class to implement a custom hash_func… And it’s quite annoying, because I do need to cache the results of database queries.

Thanks for your great software and your assistance :slight_smile:

import streamlit as st
import snowflake.connector

@st.cache
def get_database_connection():
    return snowflake.connector.connect(
        user='XXXX',
        password='XXXX',
        account='XXXX'
    )

Hi @romeodespres,

In your case, it might work to use allow_output_mutation=True in your st.cache declaration. I.e.:

@st.cache(allow_output_mutation=True)
def get_database_connection():
    return snowflake.connector.connect(
        user='XXXX',
        password='XXXX',
        account='XXXX'
    )

The reason is that this prevents Streamlit from trying to hash the returned connection object at all.
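If you then want to cache individual query results on top of that connection, one pattern that should work (a sketch, not tested against Snowflake) is to key the cache on the SQL string alone and fetch the cached connection inside the function:

@st.cache
def run_query(sql):
    # get_database_connection() is the cached function above, so this
    # doesn't open a new connection for every query.
    cur = get_database_connection().cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()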

Let us know if that works!


It does work, thank you! Now that you say it, it seems obvious. Shouldn’t the error message suggest your solution? I believe one reason I didn’t think of it is that the message strongly pointed toward hash_funcs:

While caching some code, Streamlit encountered an object of type CompiledFFI. You’ll
need to help Streamlit understand how to hash that type with the hash_funcs argument.
For example:

@st.cache(hash_funcs={CompiledFFI: my_hash_func})
def my_func(...):
    ...

Please see the hash_funcs documentation for more details.

A short “You can also set allow_output_mutation=True to disable hashing” at the end would have helped me.


UnhashableType: Cannot hash object of type re.Pattern

The cached function is:

def get_config(filename=None, appname='your name'):

which returns a ConfigParser object, which I do want cached!

The hash_funcs no-op works:

@st.cache(hash_funcs={re.Pattern: lambda _: None})
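For context, here’s roughly how that looks end to end (a sketch; the body of get_config is a stand-in):

import re
import streamlit as st
from configparser import ConfigParser

# The no-op hasher makes st.cache treat every re.Pattern as equal, which
# is fine here: the patterns live inside ConfigParser internals and don't
# affect which config we want cached.
@st.cache(hash_funcs={re.Pattern: lambda _: None})
def get_config(filename=None, appname='your name'):
    parser = ConfigParser()
    if filename:
        parser.read(filename)
    return parser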


Hi @knorthover,

It sounds like you got your cached function working using hash_funcs. Just wanted to comment for the sake of the thread that yours is also a situation that could be fixed by using allow_output_mutation=True.
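For example, something like this should also work (a sketch; body omitted):

@st.cache(allow_output_mutation=True)
def get_config(filename=None, appname='your name'):
    ...  # build and return the ConfigParser as before

That flag skips hashing the returned ConfigParser entirely, so the re.Pattern objects inside it are never touched.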

Thanks for chiming in!