Scaling Streamlit to hundreds of users with a heavy data object

I have a function that runs a SQL query and stores the output of this query in a pandas dataframe.
The dataframe can get very large (hundreds of MBs).

How do I share this dataframe across all of the users who visit the website, without having each user load the data into memory? I don't expect to have many concurrent users, just a lot of users over time.

According to Experimental cache primitives - Streamlit Docs, I should not use st.experimental_singleton to store data. But how else can I put this in a global session state?

Example code

import pandas as pd
import streamlit as st
from boto3.session import Session  # assuming a boto3 session for the assumed role

def load_data(_role_session: Session) -> pd.DataFrame:
    """
    :param _role_session: Assumed-role session used to query Athena
    """
    df = query_athena(_role_session)
    df['time_col'] = pd.to_datetime(df['time_col'])
    # other df computations here
    return df

def main():
    st.title("Visualizing Data")
    if "data" not in st.session_state:
        st.write("Loading data for the first time")
        st.session_state["data"] = load_data(role_session)
        st.write("Data loading has been completed!")

    data = st.session_state["data"]
    # Do stuff with data

Should I decorate the load_data function with st.experimental_singleton? How do I access this from the "global" Streamlit session state?

All of the changes to the pandas df happen inside the load_data function - everything else just plots what is already in the dataframe.

Hi @AditSanghvi94 -

You can decorate load_data with st.experimental_memo. If you want to cache the data for a while but have it update periodically, you can use the ttl parameter to specify how long the data should be kept before it's refreshed.
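A minimal sketch of what that could look like, reusing the query_athena helper and assumed-role session from your example (the leading underscore on _role_session tells the cache to skip hashing that argument):

import pandas as pd
import streamlit as st

@st.experimental_memo(ttl=3600)  # cached result is refreshed after one hour
def load_data(_role_session) -> pd.DataFrame:
    # _role_session starts with "_", so st.experimental_memo does not try to hash it
    df = query_athena(_role_session)
    df['time_col'] = pd.to_datetime(df['time_col'])
    return df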

Caching is currently global across sessions, so once one user has loaded the data, every subsequent user gets the same cached dataframe without reloading it into memory.
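That also means you don't need to stash the dataframe in st.session_state yourself; main() can just call the memoized function, and repeated calls across sessions hit the same cache entry (a sketch, assuming role_session is defined elsewhere in your app):

def main():
    st.title("Visualizing Data")
    data = load_data(role_session)  # first caller populates the cache; later sessions reuse it
    # Do stuff with data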


Great, thanks!