UnhashableType: Cannot hash object of type _thread._local

I keep getting the same error when I use st.cache on a function that does a complex query and pandas manipulation. The original function queries an API for JSON data twice, then a SQL database, puts the results into pandas, does a few manipulations, and joins all three into one dataframe. Several of the columns contain lists, which seems like it might be the problem… But when I separated the SQL query into its own function (which I'm also attempting to cache), it now seems to error on the SQL function. I've looked all over for documentation to resolve this and am having trouble finding a solution. Screenshot of the error attached.

Hello @ccoggins, welcome to the forums!

The documentation for Streamlit has just been updated with a page on Advanced caching, most notably a bit on caching a database connection :slight_smile:

It's hard to say without your original code, but my guess would be that the code you are trying to cache, which involves the SQL function, opens a thread, and this thread cannot be hashed by Streamlit by default. You can configure how Streamlit deals with caching this, though. As a first try, does it work with @st.cache(allow_output_mutation=True)?
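To illustrate why allow_output_mutation=True can sidestep this class of error, here is a rough pure-Python analogy (not Streamlit's actual implementation; all names below are invented for the sketch). Streamlit's cache normally re-hashes the stored output on each hit to detect mutation, which fails on objects it cannot hash; skipping that check avoids the failure.

```python
import functools

def simple_cache(check_output=True):
    """Toy memoizer mimicking the allow_output_mutation idea (illustrative only)."""
    def deco(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(*args):
            if args not in store:
                store[args] = fn(*args)
            elif check_output:
                # Stand-in for the cache's output-hashing step:
                # raises TypeError for unhashable outputs.
                hash(store[args])
            return store[args]
        return wrapper
    return deco

# check_output=False plays the role of allow_output_mutation=True
@simple_cache(check_output=False)
def get_resource(name):
    # a dict is unhashable, like a live connection in Streamlit's hasher
    return {"name": name}

r1 = get_resource("db")
r2 = get_resource("db")
assert r1 is r2  # cached object reused, output never re-hashed
```

With check_output=True, the second call would raise TypeError because the dict output cannot be hashed; that is the analogue of the UnhashableType error.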

1 Like

Thanks! I just got to sit down with the new documentation, and I updated a few things that seem to have benefited my Streamlit dashboard. I had stumbled onto allow_output_mutation=True before my post, and it didn't fix the issue. What I did do was separate the SQL engine connection into its own function, cache it with allow_output_mutation=True, and then pass it into the function that holds the SQL query and pandas manipulations. Even after that, the function containing the SQL query and pandas manipulations still returned the same error as above.

What I also tried was turning off caching for this function, and the entire dashboard worked fine. I also tried passing in some of the arguments mentioned in the new documentation, and that didn't resolve anything (some of the concepts are new to me, so it's likely I was doing them wrong). When I tried hash_funcs={DBConnection: id}, it returned NameError: name 'DBConnection' is not defined. After changing DBConnection to my engine connection object, it returned the same error I originally posted about. It's probably a rookie mistake; I can't tell if DBConnection is a variable/object I should pass in, or an argument Streamlit looks for.

Here’s the function I’m having trouble caching, for context:

def get_songs(engine):
    sql = """
    SELECT *
    FROM "public"."table"
    LIMIT 10;
    """
    df = pd.read_sql(sql, con=engine)
    df.id = df.id.astype(str)
    df.col5 = [','.join(map(str, i)) for i in df.col5]
    df.col5 = df.col5.replace({"text1": "text1 edit", "text2": "text2 edit"}, regex=True)
    # col2, col3, and col4 are list objects; col5 is a string
    df['col6'] = (df.col2 + df.col3 + df.col4 + df.col5.apply(lambda x: [x])).map(list).apply(lambda x: sorted(x))
    df = df[["id", "col6"]]
    return df

Just to be sure, I made a minimal example with sqlite3 so we can debug together! Here is what I understood of your problem:


load_sqlite.py

import sqlite3

conn = sqlite3.connect("test.db")
conn.execute(
    """CREATE TABLE COMPANY
         (ID INT PRIMARY KEY     NOT NULL,
         NAME           TEXT    NOT NULL,
         AGE            INT     NOT NULL,
         ADDRESS        CHAR(50),
         SALARY         REAL);"""
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (1, 'Paul', 32, 'California', 20000.00 )"
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (2, 'Allen', 25, 'Texas', 15000.00 )"
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (3, 'Teddy', 23, 'Norway', 20000.00 )"
)
conn.execute(
    "INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \
      VALUES (4, 'Mark', 25, 'Rich-Mond ', 65000.00 )"
)
conn.commit()
print("Records created successfully")

conn.close()

You can run python load_sqlite.py to create a local test.db which contains a table.

app.py

import sqlite3
from sqlite3 import Connection

import pandas as pd
import streamlit as st


@st.cache(allow_output_mutation=True)
def get_connection(path):
    """Put the connection in cache to reuse if path does not change."""
    return sqlite3.connect(path)


@st.cache(hash_funcs={Connection: id})
def get_data(engine):
    """If the Connection.id is the same as before, use the cached dataframe"""
    sql = """
    SELECT * FROM COMPANY;
    """
    df = pd.read_sql(sql, con=engine)
    return df


engine = get_connection("test.db")
data = get_data(engine)
st.write(data)

Run with streamlit run app.py


  • On NameError: name 'DBConnection' is not defined: you need to import DBConnection in your Python script; it's an object/class, not an argument Streamlit looks for.

  • The @st.cache(hash_funcs={Connection: id}) means: Streamlit, each time you come across a Connection object in the cached method while computing the cache key, use the id value of the Connection, and don't try to use the rest of the Connection object as a cache key, because it might change dynamically for reasons beyond your control, like a Thread object inside the Connection that we don't know how to hash. Feel free to replace Connection with DBConnection in your case :slight_smile:
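Here is a pure-Python sketch of that idea (illustrative only, not Streamlit's real code; FakeConnection and make_cache_key are invented names): objects of a type listed in hash_funcs are reduced to their id() when the cache key is computed, instead of being deep-hashed.

```python
class FakeConnection:
    """Stands in for a DB connection wrapping a thread/socket."""
    __hash__ = None  # pretend it cannot be hashed

def make_cache_key(args, hash_funcs):
    """Build a cache key, using a custom hasher for registered types."""
    parts = []
    for arg in args:
        custom = hash_funcs.get(type(arg))
        parts.append(custom(arg) if custom else hash(arg))
    return tuple(parts)

conn = FakeConnection()
key_a = make_cache_key((conn, "SELECT * FROM COMPANY"), {FakeConnection: id})
key_b = make_cache_key((conn, "SELECT * FROM COMPANY"), {FakeConnection: id})
assert key_a == key_b  # same connection object -> same key -> cache hit
```

Without the {FakeConnection: id} entry, building the key would raise a TypeError, which is the shape of the original UnhashableType failure.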

In my app.py if I remove from sqlite3 import Connection I get NameError: name 'Connection' is not defined.

  • You need to look for this DBConnection class in the SQL client library you are using under SQLAlchemy. I only looked for from psycopg2.engine import DBConnection for PostgreSQL and could not find it. What SQL client + SQL DB are you using?
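One hedged way to find the exact class to register in hash_funcs, rather than hunting through library docs, is to inspect the runtime type of the object you actually pass into the cached function:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The printed type is exactly the class to use as the hash_funcs key.
print(type(conn))  # <class 'sqlite3.Connection'>
conn.close()
```

For a SQLAlchemy engine, type(engine) should similarly report the class to register (e.g. something under sqlalchemy.engine).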
2 Likes

Wow, @andfanilo. This is such an amazing post, and that example you gave perfectly captures the new hash_funcs workflow for database connections.

2 Likes

@ccoggins: The suggestion from @andfanilo is exactly correct. Please let us know if hash_funcs={Connection: id} works for you. If so, we will consider making this the default behavior for sqlite3.Connection.

If this suggestion doesn’t work, please know that the next version of Streamlit will include additional information so that we can diagnose and fix this problem.

Thank you for using Streamlit. We’re excited to get this use-case working for you! :balloon:

1 Like

Sorry for the delayed response! @andfanilo your explanation of hash functions is great.

Being fully transparent, I don't remember when I did it, but I did get it working. It's still a little over my head, so I could have it wrong… but I believe I'm mirroring your implementation using sqlalchemy. Here's a toy version of my implementation:

import pandas as pd
import sqlalchemy
import streamlit as st

@st.cache(allow_output_mutation=True)
def connect_engine(url):
    engine = sqlalchemy.create_engine(url)
    return engine

@st.cache(hash_funcs={sqlalchemy.engine.base.Engine: id})
def get_data(engine):
    sql = """
    SELECT * FROM table;
    """
    df = pd.read_sql(sql, con=engine)
    return df

connection_url = "db_connection_string"
engine = connect_engine(connection_url)
data = get_data(engine)
st.write(data)

1 Like

Great, glad you got it to work :slight_smile:

Good :grinning: now I know I can reuse that example everywhere! :airplane:

1 Like

Great, @ccoggins! This is a very common use-case and we’d like to simplify it further in future releases. In the meantime though, that’s the exact right approach, and I’m so glad this worked for you.

Thanks for the kind words, @andfanilo!

1 Like