St.cache with sqlalchemy

Hello,

I find streamlit to be an incredible app development library - thanks for the awesomeness!

I am developing a streamlit app which uses sqlalchemy. This turns out to create a challenge for st.cache, as I am thrown the common caching error (Streamlit cannot hash an object of type <class 'sqlalchemy.engine.base.Engine'>).

In my search for a solution I have stumbled upon several posts that suggest using the hash_funcs={} parameter of st.cache. However, I can't seem to understand the examples provided, so I hope someone can help me understand how to use the hash_funcs parameter together with sqlalchemy.

Fundamentally my Python code is pretty simple - it looks something like this:


import streamlit as st
import pandas as pd
from sqlalchemy import create_engine
import time

conn = create_engine('some random connection string')

SQL_script = st.text_area(label='SQL Input', value='SELECT * FROM TableA')

@st.cache
def load_data():
    with st.spinner('Loading Data...'):
        time.sleep(0.5)
        df = pd.read_sql_query(SQL_script, conn)
    return df

raw_data = load_data()
raw_data

By looking at the example above, how should I use hash_funcs to overcome the caching error?

Thanks for your time

I don't have a sqlalchemy example, but you can take a look at my postgresql example. Try placing your connection object inside your load_data() and decorating it with

@st.cache(allow_output_mutation=True)

Just to expand on what @pybokeh proposed, something like this should work:

SQL_script = st.text_area(label='SQL Input', value='SELECT * FROM TableA')

@st.cache(allow_output_mutation=True)
def get_connection():
    # allow_output_mutation=True tells Streamlit not to hash the returned Engine
    return create_engine('some random connection string')

@st.cache
def load_data():
    with st.spinner('Loading Data...'):
        time.sleep(0.5)
        df = pd.read_sql_query(SQL_script, get_connection())
    return df

raw_data = load_data()
raw_data

Somewhat related, we’re working on several improvements to st.cache. Some are aimed at making caching more powerful / customizable, and some are aimed at making it easier to understand.

In that last category, we’ll likely be removing the ability for st.cache to “watch” variables that live outside of the cached function’s scope (see Github issue). In your case that’s the SQL_script variable.

So to future-proof your script, I recommend passing that variable as an explicit argument to load_data(), like this:

SQL_script = st.text_area(label='SQL Input', value='SELECT * FROM TableA')

@st.cache(allow_output_mutation=True)
def get_connection():
    return create_engine('some random connection string')

@st.cache
def load_data(SQL_script):
    with st.spinner('Loading Data...'):
        time.sleep(0.5)
        df = pd.read_sql_query(SQL_script, get_connection())
    return df

raw_data = load_data(SQL_script)
raw_data
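
For reference, the hash_funcs parameter you asked about is the other way to get around this: instead of hiding the engine inside its own cached function, you tell st.cache how to hash an Engine when it shows up as an argument. A rough, untested sketch, assuming you're happy to key the cache on the engine's connection URL:

from sqlalchemy.engine.base import Engine

# Hash Engine objects by their connection URL so st.cache can handle them as arguments
@st.cache(hash_funcs={Engine: lambda engine: str(engine.url)})
def load_data(SQL_script, engine):
    return pd.read_sql_query(SQL_script, engine)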

In my opinion, the query and the connection URI should be the parameters of the function you’re trying to cache, and you should avoid using variables from the outer scope inside that function. Something like:

@st.cache
def load_data(query, uri):
    # initialize the engine here and do all the rest
    engine = create_engine(uri)
    return pd.read_sql_query(query, engine)
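
Calling it would then look something like this (reusing the placeholder connection string from the original post; both strings become part of the cache key):

query = st.text_area(label='SQL Input', value='SELECT * FROM TableA')
raw_data = load_data(query, 'some random connection string')
raw_data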