Scaling Streamlit to hundreds of users with a heavy data object

AditSanghvi94 · February 4, 2022, 2:19am

Hi,
I have a function that runs a SQL query and stores the output of this query in a pandas dataframe.
The DataFrame can get very large (hundreds of MBs).

How do I share this DataFrame across all of the users who visit the website without each user loading the data into memory? I don't expect to have many concurrent users, just a lot of users over time.

According to the "Experimental cache primitives" page of the Streamlit docs, I should not use st.experimental_singleton to store data. But how else can I keep it in a global session state?
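The docs' advice comes down to what each decorator hands back. As I understand the experimental cache docs, `st.experimental_memo` stores the return value in pickled form and gives each caller its own copy, while `st.experimental_singleton` returns the one shared object, which is why it is recommended for things like database connections rather than data a caller might mutate. The plain-Python sketch below mimics only that return behaviour; `memo_like`, `singleton_like`, `load_memo` and `load_singleton` are hypothetical names, not Streamlit APIs.

```python
import pickle
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-ins for the two Streamlit decorators. They mimic only the
# documented return semantics, not Streamlit's real implementation.


def memo_like(func: Callable[..., Any]) -> Callable[..., Any]:
    """Cache a pickled result; every caller gets its own fresh copy."""
    store: Dict[Tuple[Any, ...], bytes] = {}

    def wrapper(*args: Any) -> Any:
        if args not in store:
            store[args] = pickle.dumps(func(*args))
        return pickle.loads(store[args])  # a new object on every call

    return wrapper


def singleton_like(func: Callable[..., Any]) -> Callable[..., Any]:
    """Cache the object itself; every caller gets the very same object."""
    store: Dict[Tuple[Any, ...], Any] = {}

    def wrapper(*args: Any) -> Any:
        if args not in store:
            store[args] = func(*args)
        return store[args]  # the same object on every call

    return wrapper


@memo_like
def load_memo(n: int) -> list:
    return list(range(n))


@singleton_like
def load_singleton(n: int) -> list:
    return list(range(n))


a, b = load_memo(3), load_memo(3)
print(a == b, a is b)  # True False: equal data, separate copies
c, d = load_singleton(3), load_singleton(3)
print(c == d, c is d)  # True True: one shared object
```

For a read-only DataFrame of hundreds of MBs, that per-call copy is the memory cost memo pays in exchange for one session never seeing another session's in-place changes.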

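Example code:

```python
import logging

import pandas as pd
import streamlit as st
from boto3 import Session  # assumed: role_session is a boto3 session

logger = logging.getLogger(__name__)

# query_athena() and role_session are defined elsewhere in this app.


@st.experimental_memo
def load_data(_role_session: Session) -> pd.DataFrame:
    """
    :param _role_session: Assumed role (the leading underscore tells
        Streamlit not to hash this argument)
    :return: DataFrame with the query results
    """
    df = query_athena(_role_session)
    df['time_col'] = pd.to_datetime(df['time_col'])
    # other df computations here
    return df


def main():
    st.title("Visualizing Data")
    if "data" not in st.session_state:
        logger.info("Loading data for the first time")
        st.session_state["data"] = load_data(role_session)
        logger.info("Data loading has been completed!")

    data = st.session_state["data"]
    # Do stuff with data
```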

Should I decorate the load_data function with st.experimental_singleton instead? How do I access it from the "global" Streamlit session state?

All of the changes to the pandas DataFrame happen inside the load_data function; everything else just plots what is already in the DataFrame.

randyzwitch · February 7, 2022, 1:58pm

Hi @AditSanghvi94 -

You can use st.experimental_memo as your code already does. If you want to cache the data for a while but have it refresh periodically, use the ttl parameter to set how long (in seconds) an entry stays in the cache before it is refreshed.
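In Streamlit the refresh is just a decorator argument, e.g. `@st.experimental_memo(ttl=600)` to re-run the query at most every ten minutes. To make the expiry rule concrete, here is a minimal plain-Python sketch of TTL memoisation; `ttl_cache`, the injectable `clock`, and the fake-time demo are hypothetical illustrations, not Streamlit's implementation.

```python
import time
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl: float, clock: Callable[[], float] = time.monotonic):
    """Hypothetical stand-in for a memo decorator with a ``ttl`` in seconds."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Maps call arguments -> (time stored, cached value).
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        def wrapper(*args: Any) -> Any:
            now = clock()
            hit = store.get(args)
            # Recompute when missing, or once the entry is at least ttl old.
            if hit is None or now - hit[0] >= ttl:
                store[args] = (now, func(*args))
            return store[args][1]

        return wrapper

    return decorator


calls = []          # records every time the "query" really runs
fake_now = [0.0]    # a controllable clock for the demo


@ttl_cache(ttl=600, clock=lambda: fake_now[0])
def load_data(query: str) -> str:
    calls.append(query)
    return f"result #{len(calls)}"


print(load_data("q"))  # result #1 (runs the query)
fake_now[0] = 599
print(load_data("q"))  # result #1 (still cached)
fake_now[0] = 600
print(load_data("q"))  # result #2 (entry expired, query re-runs)
```

Between refreshes every caller is served the stored value, so the database is hit at most once per ttl window per distinct argument set.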

Otherwise, caching is currently global, so once one user has loaded the data, every user will have access to the same data.
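Because the cache belongs to the server process rather than to a browser session, the `if "data" not in st.session_state` wrapper in the example code is not needed for sharing: each script run can simply call `load_data(role_session)`. The sketch below imitates that with a module-level dict; `_CACHE`, `render_page`, and the list of users are hypothetical, and it deliberately ignores memo's per-call copying to show only that the expensive query runs once for many visitors.

```python
from typing import Dict, List

# Module-level state lives as long as the server process, so every session
# served by that process sees the same dictionary.
_CACHE: Dict[str, List[int]] = {}
query_runs: List[str] = []


def load_data(query: str) -> List[int]:
    """Hypothetical cached loader: runs the expensive query only on a miss."""
    if query not in _CACHE:
        query_runs.append(query)      # stands in for query_athena(...)
        _CACHE[query] = [10, 20, 30]  # stands in for the DataFrame
    return _CACHE[query]


def render_page(user: str) -> str:
    """One script run for one visitor; no session_state bookkeeping."""
    data = load_data("daily_snapshot")
    return f"{user} sees {len(data)} rows"


for user in ["alice", "bob", "carol"]:
    print(render_page(user))
print("query executed", len(query_runs), "time(s)")  # query executed 1 time(s)
```

One caveat that follows from this design: the cache is per process, so if the app runs as several server replicas, each replica loads the data once on its own.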

Best,
Randy

AditSanghvi94 · February 12, 2022, 1:58am

Great, thanks!

system · February 12, 2023, 1:58am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
