Streamlit cannot hash an object of type <class 'sqlalchemy.engine.base.Engine'>

I would like to cash the output of a function reading data from a database and returning a panda dataframe. I keep getting the message Streamlit cannot hash an object of type <class ‘sqlalchemy.engine.base.Engine’>. I tried the suggestion shown in the error message and added the argument allow_output_mutation=True to the st.cache decorator but it did not help. Can this be done or is it just not possible? I am on Windows 10 and Streamlit version 0.52.0. I am currently adapting my code to read most data from a database instead of csv files so I can better prefilter the data. However, if cashing is not possible with databases I would probably try another route.

@st.cache(allow_output_mutation=True)
def get_data_Collection_id(criteria):
    global sql_engine

    query = "Select * from v_data_collections"
    if criteria > '':
        query += ' where ' + criteria
    result = pd.read_sql_query(query, sql_engine)
    return result

Hi @godot63,

Sounds like the newly released hash_funcs would solve your problem.

https://streamlit.io/docs/api.html#streamlit.cache

hash_funcs ( dict or None ) – Mapping of types to hash functions. This is used to override the behavior of the hasher inside Streamlit’s caching mechanism: when the hasher encounters an object, it will first check to see if its type matches a key in this dict and, if so, will use the provided function to generate a hash for it. See below for an example of how this can be used.

@st.cache(hash_funcs={sqlalchemy.engine.base.Engine: id})
def get_data_Collection_id(criteria):

We’re actively working on new documentation around Caching to explain best practices around this feature.

thanks for your fast reply. I tried your suggestion but was not successful so far. I think I have not understood very well the concept around cashing and will have to wrap my head around it once the doc is updated. Looking forward to that! In the meantime I try to program data access more in a on-demand-style: instead of fetching all data when starting the app and then cashing it, I only fetch the data I need for the current menu item. Works fine so far. I’ll worry about fine tuning with cashing when the solution is in a more definitive state.

Hey @godot63 :wave:,

Good chance you’ve already seen the updated docs, but we wanted to update some of the older threads that documentation around advance caching and hash_funcs was released this month. Here are some helpful links:

If you come across any issues or would like more context, here is a helpful topic.

hi @tc1,
thanks for the heads-up. I was not successful so far, but have not tried very hard either. I have two types of applications: apps with a huge amount of data and multiple tables and those that have fewer data and ideally just a single table. For the latter, I can read the data from files and caching works like a charm. I read all data and cache it, then query the pandas data frame to extract data for plotting or tables.

I keep larger datasets in a database and query the data for every operation. You can see an example here. I find it personally much easier to query the data using SQL (I am more familiar with it than pandas) and I find it very fast too (if you want to try the application, that is still under construction. If you do, you should go a little down in the sidebar and select calcium, chloride or sodium as the y-parameter, otherwise some plots will stay blank. Anyways, for now nearly every click in the application changes a data filter, so caching would not add any speed. However, in the next step, I want to allow the user to change plot settings (size, markers, colors etc.). This is when caching will come handy and I will give it another try.

1 Like

Hey @godot63 :wave:,

This is all helpful info, thanks for sharing :heart:. We’re now tracking this on this GitHub issue.