I have a function that runs a SQL query and stores the result in a pandas DataFrame. The DataFrame can get very large (hundreds of MBs).
How do I share this DataFrame across all of the users who visit the website, without having each user load the data into memory? I don't expect many concurrent users, just a lot of users over time.
According to Experimental cache primitives - Streamlit Docs, I should not use st.experimental_singleton to store data. But how else can I keep this in a global session state?
```python
@st.experimental_memo
def load_data(_role_session: Session) -> pd.DataFrame:
    """
    :param _role_session: Assumed role
    :return: DataFrame of query results
    """
    df = query_athena(_role_session)
    df['time_col'] = pd.to_datetime(df['time_col'])
    # other df computations here
    return df


def main():
    st.title("Visualizing Data")
    if "data" not in st.session_state:
        logger.info("Loading data for the first time")
        st.session_state["data"] = load_data(role_session)
        logger.info("Data loading has been completed!")
    data = st.session_state["data"]
    # Do stuff with data
```
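To make the sharing semantics I'm after concrete, here's a plain-Python sketch with no Streamlit involved. `functools.lru_cache` stands in for a process-wide singleton cache, and `load_data_stub` is a made-up stand-in for my real loader:

```python
import functools

@functools.lru_cache(maxsize=1)
def load_data_stub() -> list:
    # Stand-in for the expensive Athena query; the body runs once per process.
    return [1, 2, 3]

# Every caller (i.e. every user session) gets back the *same* object,
# so the data is held in memory once, not once per user.
a = load_data_stub()
b = load_data_stub()
assert a is b
```

That identity (`a is b`) is the behavior I want: one copy of the data in the server process, handed out to every visitor.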
Should I decorate the load_data function with st.experimental_singleton instead? And how would I access the result from a "global" Streamlit session state?
All of the changes to the pandas DataFrame happen inside the load_data function; everything else just plots what is already in the DataFrame.
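Since all mutation is confined to the loader, the access pattern I have in mind looks roughly like this (again plain Python rather than Streamlit; `plot_data` is a hypothetical read-only consumer, and freezing the result is just one way to keep readers from mutating the shared copy):

```python
import functools

@functools.lru_cache(maxsize=1)
def load_data() -> tuple:
    # All mutation happens here, exactly once; the result is frozen
    # before caching so downstream readers can't change the shared copy.
    raw = [3, 1, 2]      # stand-in for the Athena query result
    raw.sort()           # stand-in for the df computations
    return tuple(raw)    # immutable view handed to every session

def plot_data(data: tuple) -> int:
    # Read-only consumer: aggregates/plots, never mutates.
    return sum(data)
```

Everything after load_data only reads, which is why sharing a single in-memory copy across users seems like it should be safe.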