How to use st.cache with sqlalchemy.orm objects

Hi community!

I need some advice on how to use caching with sqlalchemy.orm.
I am writing an app for data exploration. The app uses sqlalchemy to query data from a database. Some of the queries can run for quite a bit of time but the result is reusable at different points in the app. Hence the need for caching.

I decorated my query function with @st.cache but I got a long list of unhashable types from sqlalchemy.orm. I tried to disable hashing for some of those but the list keeps on growing. Here are some examples:
UnhashableTypeError: Cannot hash object of type sqlalchemy.orm.relationships.RelationshipProperty.Comparator
UnhashableTypeError: Cannot hash object of type sqlalchemy.orm.properties.ColumnProperty

My function executes the query via pandas.read_sql() which returns a pandas.DataFrame with the result. This is also the return value of my function.

Does anyone have an idea?

Hi @simtsc, welcome to the forum !

Thanks this is interesting, I don’t remember we had people caching SQLAlchemy ORM queries on the forum… I guess you already tried @st.cache(allow_output_mutation=True) then as a workaround.

Are you able to produce a toy example with that pandas.read_sql and SQL queries that generate those hashing errors ?

Hi @andfanilo, thanks for your quick response. Indeed, I tried @st.cache(allow_output_mutation=True) but to no avail; the result was the same.

The following is a simple example of a query which raises the same exception:

    @st.cache
    def get_data(self, num: int = 1):
        session = self.get_session()
        mytable_query = session.query(MyTable.id).filter(MyTable.col0 == num)
        df = pd.read_sql(mytable_query.statement, session.bind)
        return df

Sorry, didn’t get the code formatting to work here, it’s a python function though

Hi, I ran into the same issue here:

I tried various solutions with no success as you can see in the post :frowning:
I did create a small example in the post, but I guess not small enough since no one answered.

Hi @amitg1 and @simtsc

I’ve built a small fully reproducible example so anyone can try :

import pandas as pd
import sqlite3
import sqlalchemy
from sqlalchemy import create_engine, Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Session
from sqlalchemy.orm.properties import ColumnProperty
from sqlalchemy.orm.query import Query
import streamlit as st


Base = declarative_base()
class Company(Base):
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    age = Column(Integer, nullable=False)


@st.cache(allow_output_mutation=True)
def get_session(path: str):
    e = create_engine(f'sqlite:///{path}')
    Session = sessionmaker(bind=e)
    session = Session()
    return session


def create_test_db(session: Session):
    """Should create test.db file with "company" table
    Maybe should delete test.db if rerunning the script
    """
    Base.metadata.create_all(session.bind)
    session.add(Company(name="Streamlit", age=42))
    session.add(Company(name="is", age=24))
    session.add(Company(name="awesome", age=3))
    session.commit()


def check_sqlite_table(path: str):
    """Use sqlite3 library to read test.db content. 
    Just to check table creation went well.
    """
    e = sqlite3.connect(path)
    sql = """
    SELECT * FROM company;
    """
    df = pd.read_sql(sql, con=e)
    st.dataframe(df)


@st.cache(hash_funcs={Session: id})
def run_orm_query(q: Query, session: Session):
    df = pd.read_sql(q.statement, session.bind)
    st.dataframe(df)


session: Session = get_session("test.db")
if st.button("Create test table"):
    create_test_db(session)
if st.checkbox("Show sqlite test table using raw sqlite3"):
    check_sqlite_table("test.db")
if st.checkbox("Show sqlite test table using ORM query"):
    q: Query = session.query(Company.id).filter(Company.name == "Streamlit")
    run_orm_query(q, session)

With that we get the sqlalchemy.orm.properties.ColumnProperty and also can experience a little bit the slowness of the caching.

Now we can play … we want to use hash_funcs to map a python data type like this orm.query.Query to a particular value that is easily compared and stored.

For a SQL ORM query, my first idea is to compile and generate the SQL string statement, which looks possible, so when I get the same query multiple times, the generated query statement should be in the cache with the result and I should just retrieve its result. so the following works for me :

def serialize_orm_query(q: Query):
    """Build SQL statement with inlined parameters
    https://stackoverflow.com/questions/5631078/sqlalchemy-print-the-actual-query
    """
    return q.statement.compile(compile_kwargs={"literal_binds": True}).string

let’s use that function to compare ORM queries in input of function and in cache :

import sqlite3

import pandas as pd
import sqlalchemy
import streamlit as st
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.query import Query
from sqlalchemy.sql.compiler import StrSQLCompiler

# ...

def serialize_orm_query(q: Query):
    """Build SQL statement with inlined parameters
    https://stackoverflow.com/questions/5631078/sqlalchemy-print-the-actual-query
    """
    return q.statement.compile(compile_kwargs={"literal_binds": True}).string


@st.cache(
    hash_funcs={
        Session: id,
        Query: serialize_orm_query,  # if we get a query, stringify it with inbound params
        StrSQLCompiler: lambda _: None,  # we don't really care about the compiler
    },
    suppress_st_warning=True,
)
def run_orm_query(q: Query, session: Session):
    st.warning("CACHE MISS")
    return pd.read_sql(q.statement, session.bind)


session: Session = get_session("test.db")
if st.button("Create test table"):
    create_test_db(session)
if st.checkbox("Show sqlite test table using raw sqlite3"):
    check_sqlite_table("test.db")
if st.checkbox("Show sqlite test table using ORM query"):
    param = st.text_input("Input name of company to look for :")
    q: Query = session.query(Company.age).filter(Company.name == param)
    st.write(serialize_orm_query(q))
    df = run_orm_query(q, session)
    st.dataframe(df)

Now…I don’t use a lot of ORM, so there may be other ways to serialize the Query statement, and maybe it doesn’t work for every bit of ORM function or custom functions, so I’ll let you come up with more examples that break this (lie the select() instead of query() that I did not try but should be similar).

Also this is generic so maybe you need to inject your own dialect instead, this is pretty well explained in the SO answer.

Works almost perfect :slight_smile:
only thing is this:

its cause by a class method in the table object:

    return Query(cls).filter(cls.test_id == test_id,
                             cls.event_date.between(start_date, end_date))

ill try to work around it

Thank you!!

ok, solved.
change the column type from sqlalchemy Date type to to sqlalchemy_utils import ArrowType and input to func is arrow.get(end_date)
caching is a bit slow but now its bearable.
thanks again.

1 Like