Streamlit cannot hash an object of type <class 'sqlalchemy.engine.base.Engine'>

I would like to cache the output of a function that reads data from a database and returns a pandas DataFrame. I keep getting the message Streamlit cannot hash an object of type <class 'sqlalchemy.engine.base.Engine'>. I tried the suggestion shown in the error message and added the argument allow_output_mutation=True to the st.cache decorator, but it did not help. Can this be done, or is it just not possible? I am on Windows 10 with Streamlit version 0.52.0. I am currently adapting my code to read most data from a database instead of CSV files so I can better prefilter the data. However, if caching is not possible with databases, I would probably try another route.

import pandas as pd
import streamlit as st

@st.cache(allow_output_mutation=True)
def get_data_Collection_id(criteria):
    global sql_engine  # SQLAlchemy engine created elsewhere

    query = "Select * from v_data_collections"
    if criteria:  # append the optional WHERE clause
        query += ' where ' + criteria
    result = pd.read_sql_query(query, sql_engine)
    return result

Hi @godot63,

Sounds like the newly released hash_funcs would solve your problem.

https://streamlit.io/docs/api.html#streamlit.cache

hash_funcs ( dict or None ) – Mapping of types to hash functions. This is used to override the behavior of the hasher inside Streamlit’s caching mechanism: when the hasher encounters an object, it will first check to see if its type matches a key in this dict and, if so, will use the provided function to generate a hash for it. See below for an example of how this can be used.

import sqlalchemy

@st.cache(hash_funcs={sqlalchemy.engine.base.Engine: id})
def get_data_Collection_id(criteria):
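For reference, the effect of hash_funcs={sqlalchemy.engine.base.Engine: id} is that Streamlit hashes the engine argument by object identity instead of trying to hash its internals. Here is a minimal stand-in cache (plain Python, no Streamlit; all names are hypothetical) that illustrates the idea:

```python
class Engine:
    """Stand-in for sqlalchemy.engine.base.Engine (not hashable by content)."""

def cache_with_hash_funcs(hash_funcs):
    """Tiny cache decorator: arguments whose type appears in hash_funcs are
    keyed by the supplied function (here: id), everything else by value."""
    def decorator(func):
        store = {}
        def wrapper(*args):
            key = tuple(
                hash_funcs[type(a)](a) if type(a) in hash_funcs else a
                for a in args
            )
            if key not in store:
                store[key] = func(*args)
            return store[key]
        return wrapper
    return decorator

engine = Engine()

@cache_with_hash_funcs({Engine: id})
def get_data(engine, criteria):
    # Imagine pd.read_sql_query(...) here; we count calls instead.
    get_data.calls = getattr(get_data, "calls", 0) + 1
    return "rows where " + criteria

get_data(engine, "year = 2020")   # executes the "query"
get_data(engine, "year = 2020")   # same engine identity + criteria: cache hit
```

Because the engine is keyed by id(), a different Engine instance would trigger a fresh computation, which is exactly the behavior the real hash_funcs override gives you.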

We’re actively working on new documentation around Caching to explain best practices around this feature.

Thanks for your fast reply. I tried your suggestion but have not been successful so far. I think I have not understood the concept around caching very well and will have to wrap my head around it once the docs are updated. Looking forward to that! In the meantime, I am trying to program data access in a more on-demand style: instead of fetching all data when starting the app and then caching it, I only fetch the data I need for the current menu item. It works fine so far. I'll worry about fine-tuning with caching when the solution is in a more definitive state.

Hey @godot63 :wave:,

Good chance you’ve already seen the updated docs, but we wanted to update some of the older threads now that documentation around advanced caching and hash_funcs was released this month. Here are some helpful links:

If you come across any issues or would like more context, here is a helpful topic.

Hi @tc1,
thanks for the heads-up. I have not been successful so far, but have not tried very hard either. I have two types of applications: apps with a huge amount of data and multiple tables, and apps with less data and ideally just a single table. For the latter, I can read the data from files, and caching works like a charm: I read all the data and cache it, then query the pandas DataFrame to extract data for plotting or tables.

I keep larger datasets in a database and query the data for every operation. You can see an example here. I personally find it much easier to query the data using SQL (I am more familiar with it than with pandas), and I find it very fast too. (If you want to try the application, note that it is still under construction. If you do, go a little way down the sidebar and select calcium, chloride or sodium as the y-parameter, otherwise some plots will stay blank.) Anyway, for now nearly every click in the application changes a data filter, so caching would not add any speed. However, in the next step I want to allow the user to change plot settings (size, markers, colors, etc.). This is when caching will come in handy, and I will give it another try.


Hey @godot63 :wave:,

This is all helpful info, thanks for sharing :heart:. We’re now tracking this on this GitHub issue.

Hey @godot63,

The SQLAlchemy Engine is now natively supported as of 0.59.0 :+1: via nightly. It should be in a general release soon, and we’ll update the thread when it is!


Awesome news! Sorry it took me a while to try this out; in the meantime I have reorganized my code so that caching is less important. However, I have now tried it and it works like a charm.

I see the following dilemma with accessing large datasets: if you read all the data directly from files into pandas, you clog your memory very fast when there are multiple users. When using a database, you can load just the chunk of data required for each operation, but then caching matters less, since you generally load different data each time. However, there is a lot of relatively static data, such as lists and codes, that must be read as well to build the GUI, and this data can very well be cached to make that process faster. Also, some changes in the GUI, such as changing plot dimensions or axis settings, don’t require a data reload, so caching can be effective here too. I am very glad this works now.
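For those relatively static lists and code tables, even a plain functools.lru_cache works well, since such lookups take no varying arguments and only need to hit the database once. A self-contained sketch (sqlite3 in-memory; the parameter_codes table is hypothetical):

```python
import sqlite3
from functools import lru_cache

# In-memory stand-in for the real database of lookup tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parameter_codes (code TEXT, label TEXT)")
conn.executemany("INSERT INTO parameter_codes VALUES (?, ?)",
                 [("ca", "calcium"), ("cl", "chloride"), ("na", "sodium")])

@lru_cache(maxsize=None)            # read once, reuse on every GUI rebuild
def get_parameter_labels():
    rows = conn.execute("SELECT code, label FROM parameter_codes").fetchall()
    return dict(rows)

labels = get_parameter_labels()     # hits the database
labels = get_parameter_labels()     # cached: no query on later reruns
```

In a Streamlit app the same function would simply carry st.cache (or, in current releases, st.cache_data) instead of lru_cache, but the principle is the same: static reference data is the ideal caching candidate.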
