How to use st.cache with sqlalchemy.orm objects

Hi community!

I need some advice on how to use caching with sqlalchemy.orm.
I am writing an app for data exploration. The app uses sqlalchemy to query data from a database. Some of the queries can run for quite a bit of time but the result is reusable at different points in the app. Hence the need for caching.

I decorated my query function with @st.cache but I got a long list of unhashable types from sqlalchemy.orm. I tried to disable hashing for some of those but the list keeps on growing. Here are some examples:
UnhashableTypeError: Cannot hash object of type sqlalchemy.orm.relationships.RelationshipProperty.Comparator
UnhashableTypeError: Cannot hash object of type sqlalchemy.orm.properties.ColumnProperty

My function executes the query via pandas.read_sql() which returns a pandas.DataFrame with the result. This is also the return value of my function.

Does anyone have an idea?

Hi @simtsc, welcome to the forum !

Thanks this is interesting, I donā€™t remember we had people caching SQLAlchemy ORM queries on the forumā€¦ I guess you already tried @st.cache(allow_output_mutation=True) then as a workaround.

Are you able to produce a toy example with that pandas.read_sql and SQL queries that generate those hashing errors ?

Hi @andfanilo, thanks for your quick response. Indeed, I tried @st.cache(allow_output_mutation=True) but to no avail; the result was the same.

The following is a simple example of a query which raises the same exception:

    @st.cache
    def get_data(self, num: int = 1):
        session = self.get_session()
        mytable_query = session.query(MyTable.id).filter(MyTable.col0 == num)
        df = pd.read_sql(mytable_query.statement, session.bind)
        return df

Sorry, didnā€™t get the code formatting to work here, itā€™s a python function though

Hi, I ran into the same issue here:

I tried various solutions with no success as you can see in the post :frowning:
I did create a small example in the post, but I guess not small enough since no one answered.

Hi @amitg1 and @simtsc

Iā€™ve built a small fully reproducible example so anyone can try :

import pandas as pd
import sqlite3
import sqlalchemy
from sqlalchemy import create_engine, Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Session
from sqlalchemy.orm.properties import ColumnProperty
from sqlalchemy.orm.query import Query
import streamlit as st


Base = declarative_base()
class Company(Base):
    __tablename__ = "company"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    age = Column(Integer, nullable=False)


@st.cache(allow_output_mutation=True)
def get_session(path: str):
    e = create_engine(f'sqlite:///{path}')
    Session = sessionmaker(bind=e)
    session = Session()
    return session


def create_test_db(session: Session):
    """Should create test.db file with "company" table
    Maybe should delete test.db if rerunning the script
    """
    Base.metadata.create_all(session.bind)
    session.add(Company(name="Streamlit", age=42))
    session.add(Company(name="is", age=24))
    session.add(Company(name="awesome", age=3))
    session.commit()


def check_sqlite_table(path: str):
    """Use sqlite3 library to read test.db content. 
    Just to check table creation went well.
    """
    e = sqlite3.connect(path)
    sql = """
    SELECT * FROM company;
    """
    df = pd.read_sql(sql, con=e)
    st.dataframe(df)


@st.cache(hash_funcs={Session: id})
def run_orm_query(q: Query, session: Session):
    df = pd.read_sql(q.statement, session.bind)
    st.dataframe(df)


session: Session = get_session("test.db")
if st.button("Create test table"):
    create_test_db(session)
if st.checkbox("Show sqlite test table using raw sqlite3"):
    check_sqlite_table("test.db")
if st.checkbox("Show sqlite test table using ORM query"):
    q: Query = session.query(Company.id).filter(Company.name == "Streamlit")
    run_orm_query(q, session)

With that we get the sqlalchemy.orm.properties.ColumnProperty and also can experience a little bit the slowness of the caching.

Now we can play ā€¦ we want to use hash_funcs to map a python data type like this orm.query.Query to a particular value that is easily compared and stored.

For a SQL ORM query, my first idea is to compile and generate the SQL string statement, which looks possible, so when I get the same query multiple times, the generated query statement should be in the cache with the result and I should just retrieve its result. so the following works for me :

def serialize_orm_query(q: Query):
    """Build SQL statement with inlined parameters
    https://stackoverflow.com/questions/5631078/sqlalchemy-print-the-actual-query
    """
    return q.statement.compile(compile_kwargs={"literal_binds": True}).string

letā€™s use that function to compare ORM queries in input of function and in cache :

import sqlite3

import pandas as pd
import sqlalchemy
import streamlit as st
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.query import Query
from sqlalchemy.sql.compiler import StrSQLCompiler

# ...

def serialize_orm_query(q: Query):
    """Build SQL statement with inlined parameters
    https://stackoverflow.com/questions/5631078/sqlalchemy-print-the-actual-query
    """
    return q.statement.compile(compile_kwargs={"literal_binds": True}).string


@st.cache(
    hash_funcs={
        Session: id,
        Query: serialize_orm_query,  # if we get a query, stringify it with inbound params
        StrSQLCompiler: lambda _: None,  # we don't really care about the compiler
    },
    suppress_st_warning=True,
)
def run_orm_query(q: Query, session: Session):
    st.warning("CACHE MISS")
    return pd.read_sql(q.statement, session.bind)


session: Session = get_session("test.db")
if st.button("Create test table"):
    create_test_db(session)
if st.checkbox("Show sqlite test table using raw sqlite3"):
    check_sqlite_table("test.db")
if st.checkbox("Show sqlite test table using ORM query"):
    param = st.text_input("Input name of company to look for :")
    q: Query = session.query(Company.age).filter(Company.name == param)
    st.write(serialize_orm_query(q))
    df = run_orm_query(q, session)
    st.dataframe(df)

Nowā€¦I donā€™t use a lot of ORM, so there may be other ways to serialize the Query statement, and maybe it doesnā€™t work for every bit of ORM function or custom functions, so Iā€™ll let you come up with more examples that break this (lie the select() instead of query() that I did not try but should be similar).

Also this is generic so maybe you need to inject your own dialect instead, this is pretty well explained in the SO answer.

1 Like

Works almost perfect :slight_smile:
only thing is this:

its cause by a class method in the table object:

    return Query(cls).filter(cls.test_id == test_id,
                             cls.event_date.between(start_date, end_date))

ill try to work around it

Thank you!!

ok, solved.
change the column type from sqlalchemy Date type to to sqlalchemy_utils import ArrowType and input to func is arrow.get(end_date)
caching is a bit slow but now its bearable.
thanks again.

1 Like