Help us stress test Streamlit’s latest caching update

Thanks @nthmost!

I had conflicting packages that wouldn't let me import RepeatedScalarContainer, but that's fixed now! However, I now get another error:

UnhashableType: Cannot hash object of type _thread.RLock

FYI the type of the loaded model (which I apparently can’t reach) is returned as:
tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject

I resolved the issue using allow_output_mutation=True.
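
For anyone hitting the same _thread.RLock error, this is roughly what the fix looks like (the loader below is a stand-in sketch, not my exact code):

import streamlit as st
import tensorflow as tf

# allow_output_mutation=True stops Streamlit from hashing the returned
# model, which contains unhashable internals such as _thread.RLock.
@st.cache(allow_output_mutation=True)
def load_model(path):
    return tf.saved_model.load(path)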
Thanks for the help!


Hi, here is another one:

Cannot hash object of type CompiledFFI

It happens when trying to create a connection to a Snowflake database. I provide the code below for completeness, but I'm not sure it really helps, since the CompiledFFI class is not Snowflake-specific. The thing is, I don't even know where to find this class to implement a custom hash_func. And it's quite annoying, because I do need to cache the results of database queries.

Thanks for your great software and your assistance :slight_smile:

import snowflake.connector

@st.cache
def get_database_connection():
    return snowflake.connector.connect(
       user='XXXX',
       password='XXXX',
       account='XXXX'
    )

Hi @romeodespres,

In your case, it might work to use allow_output_mutation=True in your st.cache declaration. I.e.:

@st.cache(allow_output_mutation=True)
def get_database_connection():
    return snowflake.connector.connect(
       user='XXXX',
       password='XXXX',
       account='XXXX'
    )

The reason is that allow_output_mutation=True prevents Streamlit from trying to hash the returned connection object as part of its cache key.

Let us know if that works!


It does work, thank you! Now that you say it, it seems obvious. Shouldn't the error message suggest your solution? I believe one reason I didn't think of it is that the message strongly pointed toward hash_funcs.

While caching some code, Streamlit encountered an object of type CompiledFFI. You’ll
need to help Streamlit understand how to hash that type with the hash_funcs argument.
For example:

@st.cache(hash_funcs={CompiledFFI: my_hash_func})
def my_func(...):
    ...

Please see the hash_funcs documentation for more details.

A short “You can also set allow_output_mutation=True to disable hashing” at the end would have helped me.


UnhashableType: Cannot hash object of type re.Pattern

The cached function is:

def get_config(filename=None, appname='your name'):

which returns a ConfigParser object, which I do want cached!

The hash_funcs no-op works:
@st.cache(hash_funcs={re.Pattern: lambda _: None})
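
For completeness, a runnable sketch of the whole thing (the configparser body is an assumption for illustration; only the signature matches my real code):

import configparser
import re

import streamlit as st

# ConfigParser objects hold compiled regexes internally, so a no-op
# hash for re.Pattern lets Streamlit cache the returned object.
@st.cache(hash_funcs={re.Pattern: lambda _: None})
def get_config(filename=None, appname='your name'):
    config = configparser.ConfigParser()
    if filename:
        config.read(filename)
    return config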


Hi @knorthover,

It sounds like you got your cached function working using hash_funcs. Just wanted to comment for the sake of the thread that yours is also a situation that could be fixed with allow_output_mutation=True.

Thanks for chiming in!


It seems that I cannot use super() in a class declared inside a cached function.
I am trying to use an object whose imports take a long time, so I want to place the imports and the class declaration inside the cached function. However, as soon as I use super() in the subclass, I get the following error:

UserHashError: ' class

I made the following code to highlight the issue:

import streamlit as st


@st.cache()
def state():
    class Parent:
        def test(self):
            return "parent"

    class Child(Parent):
        def test(self):
            par = super().test()  # the super() call here triggers the error
            return "hello"

    test = Child()
    return test.test()

st.text(state())

Resulting in the error:

UserHashError: ' class

Error in C:\Users\xxxxx\Devel\RICS\rics-gui-web\st_test_class.py near line 11:


If you think this is actually a Streamlit bug, please file a bug report here.

Traceback:

  File "C:\Users\xxxxx\st_test_class.py", line 19, in <module>
    st.text(state())

If we remove the super() line, everything runs as expected.
Is this a bug or am I missing something?

Hey @hcoohb - this looks like a bug! Are you able to move your class declaration out of the cached function, or does it rely on values from within that scope?

In the meantime, I’ve filed a bug, because this shouldn’t be happening (or at the very least, we should have a better error message)!

@tim, thanks for creating the bug report!
For now I can move the class declaration outside the cached function, but it would be much neater to move it back inside, so I will monitor the bug tracker :wink:
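
For the thread, this is roughly what the interim workaround looks like (using the toy classes from my example above):

import streamlit as st

# Declaring the classes at module scope means st.cache never has to
# hash the class bodies (or the super() cell) inside the function.
class Parent:
    def test(self):
        return "parent"

class Child(Parent):
    def test(self):
        return "hello from " + super().test()

@st.cache()
def state():
    return Child().test()

st.text(state())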

Quick update that we’re tracking the mentioned “Cannot hash object of type _______” issues in the following GitHub issues:

Thanks all for helping to track these down :heart:


Hi,

I am using Dask to handle large data in the backend while showing a handful of rows in the UI.

As we know, if any widget state changes, Streamlit reruns the app from the start.

Dask uses async tasks to send and receive large amounts of data in its library calls.

I need to hash the Dask dataframe, but it gives the error "Cannot hash object of type _asyncio.Task"
and asks me to create a hash function for the type _asyncio.Task.


import streamlit as st
import dask.dataframe as dd

@st.cache()
def get_head(dataframe):
    head = dataframe.head()
    return head

data = dd.read_csv("abcd.csv")
head = get_head(data)  # causes "Cannot hash object of type _asyncio.Task"


It gives the error below.


UnhashableType: Cannot hash object of type _asyncio.Task

While caching some code, Streamlit encountered an object of type _asyncio.Task. 
You’ll need to help Streamlit understand how to hash that type with the hash_funcs argument. For example:


@st.cache(hash_funcs={_asyncio.Task: my_hash_func})
def my_func(...):
    ...

The error only appears when I put the get_head() function in library code (a Python package).
If I use the function from the same file, it runs without any error.

In general I need a hash function for the type _asyncio.Task.
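
The closest thing I have found so far is a no-op hash registered by the type's string name, as in other posts in this thread (a sketch only; it assumes the task state is irrelevant to the cache key, and I have not confirmed it fixes the library-code case):

import streamlit as st
import dask.dataframe as dd

# String keys in hash_funcs let you name types you cannot import directly;
# the no-op hash makes Streamlit skip them when building the cache key.
@st.cache(hash_funcs={"_asyncio.Task": lambda _: None})
def get_head(dataframe):
    return dataframe.head()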

Any help would be appreciated.

Thanks


Hey @pavansanghavi and welcome to the community :wave:,

Thanks for reporting this, we're now tracking it as GitHub issue 1253. We'll update the thread when we have more info on it, but feel free to comment on or track the GitHub issue if you'd like as well!


Hey all :wave:,

0.57.0 was released yesterday evening, and it now gives more detailed st.cache error messages to help with debugging. Also, as of 0.57.0, Streamlit natively supports the types re.Pattern @knorthover and BytesIO/StringIO :partying_face:.

Going forward, if anyone comes across a “Cannot hash object of type _____” error message and needs help, please provide the full error message available on 0.57.0. Feel free to let us know if you have any questions and we’ll message the thread when we have more updates!

Hi @pavansanghavi, could you explain what you mean by this? I’m trying to reproduce but having issues.

I’m trying to cache the results for the following function:

@st.cache()
def load_lunch_tasks(rider_ids,df_tasks):
    all_lunch_tasks = np.array([np.mean(ins.get_lunch_tasks(rider_id, df_tasks)) for rider_id in rider_ids])
    return all_lunch_tasks

but I get the following error:

KeyError: 'workday'

Streamlit encountered an error while caching the body of load_lunch_tasks(). This is likely due to a bug in codebase/insights.py near line 127:

  if arrived.day == workday and dt.time(10,30) <= arrived.time() <= dt.time(12,30)] )  # and completed.time()
               for workday in days_worked]
lunch_tasks = list(filter(lambda ts: ts != 0, lunch_tasks))

Below is the full function that seems to be the problem. Do you have any idea what the issue might be?

def get_lunch_tasks(rider_id, df=None):
    rider_jobs = np.unique(df.query("FleetId==@rider_id")['bookingId'].values)
    jobs_start_end = pd.DataFrame([get_job_start_end(job_id, df) for job_id in rider_jobs if get_job_start_end(job_id, df) is not None])
    days_worked = np.unique(jobs_start_end.start.dt.day)
    lunch_tasks = [len([arrived for arrived, completed in zip(jobs_start_end.start,jobs_start_end.finish)
      if arrived.day == workday and dt.time(10,30) <= arrived.time() <= dt.time(12,30)] )  # and completed.time()
                   for workday in days_worked]
    lunch_tasks = list(filter(lambda ts: ts != 0, lunch_tasks))
    return lunch_tasks
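
One thing that may be worth trying (a sketch only, assuming the hasher is tripping over the nested comprehension variable rather than your data; np, pd, dt, and get_job_start_end come from your own module): rewrite the nested comprehension as explicit loops, so there is no comprehension-scoped workday cell for st.cache to hash.

def get_lunch_tasks(rider_id, df=None):
    rider_jobs = np.unique(df.query("FleetId==@rider_id")['bookingId'].values)
    jobs_start_end = pd.DataFrame([get_job_start_end(job_id, df)
                                   for job_id in rider_jobs
                                   if get_job_start_end(job_id, df) is not None])
    days_worked = np.unique(jobs_start_end.start.dt.day)
    lunch_tasks = []
    for workday in days_worked:
        # Count jobs that arrived during the lunch window on this workday.
        count = 0
        for arrived in jobs_start_end.start:
            if arrived.day == workday and dt.time(10, 30) <= arrived.time() <= dt.time(12, 30):
                count += 1
        if count:
            lunch_tasks.append(count)
    return lunch_tasks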

Hi - Posting this here in case it helps anyone looking to resolve this issue:
I ran into a number of hashing problems writing a class that included the use of Keras. Here is how I worked around it, including all of the objects that caused hashing errors:

hash_funcs = {
    '_thread.RLock': lambda _: None,
    '_thread.lock': lambda _: None,
    'builtins.PyCapsule': lambda _: None,
    '_io.TextIOWrapper': lambda _: None,
    'builtins.weakref': lambda _: None,
    'builtins.dict': lambda _: None,
}

and then before every cached function:

@st.cache(hash_funcs=hash_funcs)
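
In context it looks roughly like this (the loader and model path are illustrative assumptions, not my actual class; hash_funcs is the dict defined above):

import streamlit as st
from tensorflow import keras

# Keras models carry locks, weakrefs, capsules, etc.; the no-op hashes
# in hash_funcs tell Streamlit to skip those internals in the cache key.
@st.cache(hash_funcs=hash_funcs)
def load_model(path='model.h5'):  # illustrative path
    return keras.models.load_model(path)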

Hey all :wave:,

A few quick updates.

As of 0.58.0, type tf.Session is now natively supported in Streamlit.

As of 0.59.0 the following are now natively supported in Streamlit:

As of 0.60.0 the following are now natively supported in Streamlit:

As of 0.61.0 the following are now natively supported in Streamlit:

We’ll update the thread when we have a few more on nightly or in a general release :hearts:


Similar bug to Issue #1181 mentioned above in comment #11.

Decorating a method that calls super() raises streamlit.hashing.InternalHashError: Cell is empty.

Specifically:

    
# from a file that I'm hesitant to import streamlit into,
# because it's a shared dependency reused elsewhere
import pandas as pd

class Dataset:

    def load_master_dataset(self, csv_path):
        self.master_df = pd.read_csv(csv_path)
        self.master_df.rename(columns={v: k for k, v in self.label_map.items()}, inplace=True)
        self.master_df.drop_duplicates(subset=['catalog_number'], keep='first', inplace=True)
        self.master_df.set_index('catalog_number', inplace=True)

    ...

# in another file
import streamlit as st

class CacheDataset(Dataset):

    @st.cache
    def load_master_dataset(self, csv_path):
        super().load_master_dataset(csv_path)

    ...

Raises:

streamlit.hashing.InternalHashError: Cell is empty

While caching the body of load_master_dataset(), Streamlit encountered an
object of type builtins.function, which it does not know how to hash.

In this specific case, it's very likely you found a Streamlit bug, so please
file a bug report here.

In the meantime, you can try bypassing this error by registering a custom
hash function via the hash_funcs keyword in @st.cache(). For example:

@st.cache(hash_funcs={builtins.function: my_hash_func})
def my_func(...):
    ...

If you don’t know where the object of type builtins.function is coming
from, try looking at the hash chain below for an object that you do recognize,
then pass that to hash_funcs instead:

Object of type builtins.function: <function CacheDataset.load_master_dataset at 0x12351fca0>

Please see the hash_funcs documentation (https://streamlit.io/docs/caching.html)
for more details.

I’m pretty sure the built-in function in question is super().

  1. I tried decorating the base class's load_master_dataset with @st.cache and then directly importing that (cutting out CacheDataset), which works just fine. I can make this change for my use case, but it isn't super elegant.
  2. super's MRO includes <class 'object'>.

So am I misusing @st.cache, or should I figure out how to hash super()?
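
For what it's worth, a possible interim workaround (a sketch, assuming the load can be restructured; the helper name is made up): cache a plain module-level function instead of the method, so st.cache never has to hash a bound method or its super() cell.

import pandas as pd
import streamlit as st

@st.cache
def _load_master_df(csv_path):  # hypothetical module-level helper
    return pd.read_csv(csv_path)

class CacheDataset(Dataset):  # Dataset as defined earlier in the thread

    def load_master_dataset(self, csv_path):
        # Only the raw read is cached; the rename/dedupe/index steps
        # from Dataset can run here or be folded into the helper.
        self.master_df = _load_master_df(csv_path)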