UnhashableType: Cannot hash object of type _thread._local

I keep getting the same error returned when I use st.cache before a function doing a complex query & pandas manipulation. The original function queries an API for JSON data twice, then a SQL database, puts them in pandas, does a few manipulations, and then joins all three into one dataframe. Several of the columns contain lists, which seems like it might be the problem… But when I separated the SQL query out into its own function (which I'm also attempting to cache), now it seems to error on the SQL function. I've looked all over for documentation to resolve this and am having trouble finding the solution. Screenshot of the error attached.

Hello @ccoggins, welcome to the forums!

The documentation for Streamlit has just been updated with a page on Advanced caching, most notably a bit on caching a database connection :slight_smile:

It's hard to say without your original code, but my guess would be that the code you are trying to cache, which involves the SQL function, opens a thread, and this thread cannot be hashed by Streamlit by default. You can configure the way Streamlit deals with caching this thread, though. As a first try, does it work with @st.cache(allow_output_mutation=True)?


Thanks! I just got to sit down with the new documentation, and I updated a few things that seem to have been beneficial to my Streamlit dashboard. I had stumbled onto allow_output_mutation=True prior to my post, and it didn't fix the issue. What I did do was separate the SQL engine connection into its own function, cache it with allow_output_mutation=True, and then pass that into the function that holds the SQL query and pandas manipulations. Even after that, the function containing the SQL query and pandas manipulations still returned the same error above.

What I also tried was turning off caching for this function, and the entire dashboard worked fine. I also tried passing in some of the arguments mentioned in the new documentation, and that didn't resolve anything (some of the concepts are new to me, so it's likely I was doing them wrong). When I tried hash_funcs={DBConnection: id}, it returned NameError: name 'DBConnection' is not defined. After changing DBConnection to my engine connection object, it returns the same error I originally posted about. It's probably a rookie mistake, but I can't tell whether DBConnection is a variable/object I should pass in or an argument Streamlit looks for.

Here's the function I'm having trouble caching, for context:

def get_songs(engine):
    sql = """
    SELECT *
    FROM "public"."table"
    LIMIT 10;
    """
    df = pd.read_sql(sql, con=engine)
    df.id = df.id.astype(str)
    df.col5 = [','.join(map(str, i)) for i in df.col5]
    df.col5 = df.col5.replace({"text1": "text1 edit", "text2": "text2 edit"}, regex=True)
    # col2, col3, and col4 are list objects; col5 is a string
    df['col6'] = (df.col2 + df.col3 + df.col4 + df.col5.apply(lambda x: [x])).map(list).apply(lambda x: sorted(x))
    df = df[["id", "col6"]]
    return df

Just to be sure, I put together a minimal example with sqlite3 so we can debug together! Here is what I understood of your problem:


load_sqlite.py

import sqlite3

conn = sqlite3.connect("test.db")
conn.execute(
    """CREATE TABLE COMPANY
         (ID INT PRIMARY KEY     NOT NULL,
         NAME           TEXT    NOT NULL,
         AGE            INT     NOT NULL,
         ADDRESS        CHAR(50),
         SALARY         REAL);"""
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (1, 'Paul', 32, 'California', 20000.00 )"
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (2, 'Allen', 25, 'Texas', 15000.00 )"
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (3, 'Teddy', 23, 'Norway', 20000.00 )"
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (4, 'Mark', 25, 'Rich-Mond ', 65000.00 )"
)
conn.commit()
print("Records created successfully")

conn.close()

You can run python load_sqlite.py to create a local test.db which contains a table.

app.py

import sqlite3
from sqlite3 import Connection

import pandas as pd
import streamlit as st


@st.cache(allow_output_mutation=True)
def get_connection(path):
    """Put the connection in cache to reuse if path does not change."""
    return sqlite3.connect(path)


@st.cache(hash_funcs={Connection: id})
def get_data(engine):
    """If the Connection.id is the same as before, use the cached dataframe"""
    sql = """
    SELECT * FROM COMPANY;
    """
    df = pd.read_sql(sql, con=engine)
    return df


engine = get_connection("test.db")
data = get_data(engine)
st.write(data)

Run with streamlit run app.py


On NameError: name 'DBConnection' is not defined: you need to import it in your Python script, it's an object/class.

  • @st.cache(hash_funcs={Connection: id}) means: Streamlit, each time you come across a Connection object in the cached function, when you compute the cache key, use the id value of the Connection and don't try to use the rest of the Connection object as a cache key, because it might change dynamically given other factors (for example, if there's a Thread object within the Connection that we don't know how to hash). Feel free to replace Connection with DBConnection in your case :slight_smile:

    In my app.py, if I remove from sqlite3 import Connection, I get NameError: name 'Connection' is not defined.

  • You need to look for this DBConnection class in the SQL client library you are using under SQLAlchemy. I only looked for from psycopg2.engine import DBConnection for PostgreSQL and could not find it. What SQL client + SQL database are you using?

Wow, @andfanilo. This is such an amazing post, and that example you gave perfectly captures the new hash_funcs workflow for database connections.


@ccoggins: The suggestion from @andfanilo is exactly correct. Please let us know if hash_funcs={Connection: id} works for you. If so, we will consider making this the default behavior for sqlite3.Connection.

If this suggestion doesn't work, please know that the next version of Streamlit will include additional information so that we can diagnose and fix this problem.

Thank you for using Streamlit. We're excited to get this use-case working for you! :balloon:


Sorry for the delayed response! @andfanilo your explanation of hash functions is great.

Being fully transparent, I don't remember when I did it, but I did get it working. It's still a little over my head, so I could have it wrong… but I believe I'm mirroring your implementation using sqlalchemy. Here's a toy version of my implementation:

import pandas as pd
import sqlalchemy
import streamlit as st

@st.cache(allow_output_mutation=True)
def connect_engine(url):
    engine = sqlalchemy.create_engine(url)
    return engine

@st.cache(hash_funcs={sqlalchemy.engine.base.Engine: id})
def get_data(engine):
    sql = """
    SELECT * FROM table;
    """
    df = pd.read_sql(sql, con=engine)
    return df

connection_url = "db_connection_string"
engine = connect_engine(connection_url)
data = get_data(engine)
st.write(data)

Great, glad you got it to work :slight_smile:

Good :grinning: now I know I can reuse that example everywhere! :airplane:


Great, @ccoggins! This is a very common use-case and we'd like to simplify it further in future releases. In the meantime though, that's exactly the right approach, and I'm so glad this worked for you.

Thanks for the kind words, @andfanilo!


Hey @ccoggins :wave:,

Wanted to give you a quick update that we created a new GitHub issue to update the default behavior Adrien mentioned and to create a hash_func for type _thread._local.
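For anyone curious why _thread._local resists content-based hashing in the first place: its visible attributes depend on which thread is looking at it, so there is no stable content to hash, and identity (id) is the only sensible key. A small stdlib demo of that behavior, independent of Streamlit:

```python
import threading

loc = threading.local()
loc.value = "set in main thread"   # visible only to the main thread

seen = {}

def worker():
    # A different thread gets a fresh, empty view of the same object.
    seen["worker_sees_value"] = hasattr(loc, "value")

t = threading.Thread(target=worker)
t.start()
t.join()

print(hasattr(loc, "value"))        # True in the main thread
print(seen["worker_sees_value"])    # False in the worker thread
```

The same object reports different contents to different threads, which is why Streamlit's default deep-hasher refuses to hash it.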

This has been bothering me for so long. Thanks for helping me solve my problem!

Thanks so much for complementing the super helpful @andfanilo example with the sqlalchemy variant; between the two of you, you really helped me move beyond the hash challenge of the db connection as a Streamlit newbie. :heart:

Well, almost a year later, let me mark this as the solution :slight_smile: Happy Streamlit-ing™ :balloon:

Just chiming in with another data point. In my case, for sqlite3, it works after I replace

@st.cache(hash_funcs={DBConnection: id})

with

@st.cache(hash_funcs={sqlite3.Connection: id})
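A quick way to confirm that sqlite3.Connection is the right key for hash_funcs is to check the type of the object sqlite3.connect returns; this sanity check uses only the standard library, no Streamlit needed:

```python
import sqlite3

# sqlite3.connect returns a sqlite3.Connection, which is the class to use
# as the hash_funcs key (there is no "DBConnection" in the sqlite3 module).
conn = sqlite3.connect(":memory:")             # in-memory DB, no file needed
print(type(conn))                              # <class 'sqlite3.Connection'>
print(isinstance(conn, sqlite3.Connection))    # True
conn.close()
```

The same `type(obj)` trick works for any client library when you are unsure which class to register.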


@ccoggins. You saved my life! This unhashable error was killing me. My company, which shall remain nameless to save it from public humiliation, is using ColdFusion to develop its web apps, and I simply can't abide by it! Streamlit is my only hope and you have helped keep hope alive, my friend!

For psycopg2 you can use
@st.cache(hash_funcs={psycopg2.extensions.connection: id})

Thanks! That worked for me as well


For psycopg2 you can use -

import streamlit as st
import psycopg2
import pandas as pd

# Initialize connection.
# Uses st.experimental_singleton to only run once.
@st.experimental_singleton()
def get_connection():
    return psycopg2.connect(**st.secrets["postgres"])

@st.cache(hash_funcs={psycopg2.extensions.connection: id})
def get_data(engine):
    """If the Connection.id is the same as before, use the cached dataframe"""
    query = 'select "Email" from emails_only;'
    data = pd.read_sql(query, engine)
    return data

def main():
    engine = get_connection()
    data = get_data(engine)
    st.write(data)

if __name__ == "__main__":
    main()

Hope it helps.
