Scaling Streamlit to hundreds of users with a heavy data object

I have a function that runs a SQL query and stores the output of this query in a pandas dataframe.
The dataframe can get very large (hundreds of MBs).

How do I share this dataframe across all of the users who visit the website, without having each user load the data into memory? I don't expect to have many concurrent users, just a lot of users over time.

According to Experimental cache primitives - Streamlit Docs, I should not use st.experimental_singleton to store data. But how else can I put this in a global session state?

Example code

import pandas as pd
import streamlit as st
from boto3.session import Session  # assuming a boto3 session for the assumed role

def load_data(_role_session: Session) -> pd.DataFrame:
    """
    :param _role_session: Assumed-role session used to query Athena
    """
    df = query_athena(_role_session)
    df['time_col'] = pd.to_datetime(df['time_col'])
    # other df computations here
    return df

def main():
    st.title("Visualizing Data")
    if "data" not in st.session_state:
        st.write("Loading data for the first time")
        st.session_state["data"] = load_data(role_session)
        st.write("Data loading has been completed!")

    data = st.session_state["data"]
    # Do stuff with data

Should I decorate the load_data function with st.experimental_singleton? How do I access this from the "global" Streamlit session state?

All of the changes to the pandas df happen inside the load_data function - everything else just plots what is already in the dataframe.

Hi @AditSanghvi94 -

You can decorate load_data with st.experimental_memo. If you want to cache the data for a while but have it update periodically, you can use the ttl parameter to specify how long the data should be kept before it's refreshed.
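A minimal sketch of what that could look like, reusing the query_athena helper and assumed-role session from your example (the leading underscore on _role_session tells the cache to skip hashing that argument):

import pandas as pd
import streamlit as st

@st.experimental_memo(ttl=3600)  # cached result is refreshed after one hour
def load_data(_role_session) -> pd.DataFrame:
    # _role_session starts with "_", so st.experimental_memo does not try to hash it
    df = query_athena(_role_session)
    df['time_col'] = pd.to_datetime(df['time_col'])
    return df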

Caching is currently global across sessions, so once one user has loaded the data, every subsequent user gets the same cached dataframe without reloading it into memory.
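That also means you don't need to stash the dataframe in st.session_state yourself; main() can just call the memoized function, and repeated calls across sessions hit the same cache entry (a sketch, assuming role_session is defined elsewhere in your app):

def main():
    st.title("Visualizing Data")
    data = load_data(role_session)  # first caller populates the cache; later sessions reuse it
    # Do stuff with data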


Great, thanks!