Caching database connection and passing it along

Following the docs on advanced caching with Streamlit, as to how create database connections and pass them along other functions, I’m using sqlite3 and trying to create one connection at run and the pass the ‘cursor’, by doing:

import streamlit as st

# create database connection
@st.cache(allow_output_mutation=True)
def get_database_connection():
    conn = sqlite3.connect('collect/Data/fpl.db')
    c = conn.cursor()   
    return c

Here I try to pass my connection id via hash_funcs , returning the cursor created:

# pass connections around
@st.cache(allow_output_mutation=True, hash_funcs={sqlite3.Cursor:id})
def get_teams_basic(c):
    df_teams = sql('SELECT * FROM TeamsBasic', c)
    return df_teams

@st.cache(allow_output_mutation=True, hash_funcs={sqlite3.Cursor:id})
def get_players_basic(c):   
    df_player_basic = sql('SELECT * FROM PlayersBasic', c)
    return df_player_basic

# and so on...

This is my sql() :

def sql(query, cursor):
    '''
    Takes an SQL query string, and outputs a
    dataframe representation of the query result.
    '''
    # Execute the sql query
    cursor.execute(query)

    # Get the query into a dataframe and set columns
    df_temp = pd.DataFrame(cursor.fetchall())
    df_temp.columns = [x[0] for x in cursor.description]

    # Set the sql id as the dataframe index
    index_column = df_temp.columns[0]
    df_temp.set_index(index_column, drop=True, inplace=True)

    return df_temp

Then, at main() , I instantiate the database connection and pass the cursor along to my functions:

def main():
    c = get_database_connection()
    teams(c)
    players(c)

if __name__ == '__main__':
  main()

But when code reaches the second connection, players() I get the following error:

ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 123145444151296 and this is thread id 123145454661632.

Traceback:

File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/streamlit/ScriptRunner.py", line 322, in _run_script
    exec(code, module.__dict__)
File "/Volumes/Dados/Documents/Code/Apps/app.py", line 280, in <module>
    main()
File "/Volumes/Dados/Documents/Code/Apps/app.py", line 273, in main
    managers(c)
File "/Volumes/Dados/Documents/Code/Apps/app.py", line 149, in managers
    df_players = get_players_basic(c)
File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/streamlit/caching.py", line 594, in wrapped_func
    return get_or_create_cached_value()
File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/streamlit/caching.py", line 578, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
File "/Volumes/Dados/Documents/Code/Apps/app.py", line 64, in get_players_basic
    df_player_basic = sql('SELECT * FROM PlayersBasic', c)
File "/Volumes/Dados/Documents/Code/Apps/collect/PythonFunctions/sqlfunctions.py", line 111, in sql
    cursor.execute(query)

Can someone point me into the right direction here?

PS:
This is also an open question at StackOverflow at the link below:

SO question

1 Like

Hi @patalanov,

I’m having trouble reproducing. Could you provide a single code snippet that I can copy/paste and run to reproduce the error?

Regardless, there’s a check_same_thread arg to the connect method which one needs to be careful about using, but might work for you depending on your use case.

By default, check_same_thread is True and only the creating thread may use the connection. If set False , the returned connection may be shared across multiple threads. When using multiple threads with the same connection writing operations should be serialized by the user to avoid data corruption.

However, given that streamlit runs each session in a new thread, perhaps the best option is to create a fresh database connection for each session. If sqlite has some connection pooling it could help to mitigate the overhead of doing so. I’m not so familiar with concurrency in sqlite. You could use SessionState to store the connection object.

Hello,

I’m fairly inexperienced when it comes to programming, is there a tutorial available to be able to use session state to create the multiple connections between streamlit apps? I opened two apps side by side and it looks like by default it uses the same session between users.

@st.cache(allow_output_mutation=True)
def get_connection():
    """
    Put the connection in cache to reuse if path does not change between Streamlit reruns.

    """
    return sqlite3.connect("./database/solar_projects.db", check_same_thread=False)