Using cache_resource() for large dataframes

Summary

I’m building an app that lets users look up the schedule for a sports tournament. The data is read-only (i.e., no transformations are applied to it) and it’s gathered from Google Sheets.

Right now I have one function that gets the Google Sheets connection (using cache_resource()) and another function that gets the data (using cache_data()).

import gspread
import logging
import streamlit as st

logger = logging.getLogger(__name__)

@st.cache_resource(show_spinner=False)
def get_google_sheet_connection():
    logger.info("Getting google connection!")
    gc = gspread.service_account_from_dict(credentials)  # credentials: service-account info dict defined elsewhere
    sh = gc.open('spreadsheet')
    return sh

@st.cache_data(ttl=30, show_spinner=False)
def get_data():
    <code here...>
    return df

The problem is that I have a lot of concurrent users. When 50-100 people use the app at the same time, even though the connection is cached, get_data still gets called at least once from every new session, and that is using too many Google Cloud API resources.

Is it good practice to use cache_resource() for a dataframe, so that I have a singleton of that dataset shared across all sessions, users, and reruns?

@st.cache_resource(ttl=30, show_spinner=False)
def get_data():
    <code here...>
    return df

Thank you so much!

Hi @Ricardo_Recarey,

Thanks for posting!

You can definitely use st.cache_resource for large data. Because it returns the cached object itself rather than a copy, it is faster and lighter on memory than st.cache_data, but you must make sure the cached object is safe to share across sessions and threads.
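
As a minimal sketch of that pattern (reusing your get_google_sheet_connection() from above and assuming a hypothetical worksheet named "schedule"):

import pandas as pd
import streamlit as st

@st.cache_resource(ttl=30, show_spinner=False)
def get_data() -> pd.DataFrame:
    # Computed at most once per TTL window for the whole server process;
    # every session and rerun gets the same DataFrame object back.
    sh = get_google_sheet_connection()
    records = sh.worksheet("schedule").get_all_records()  # hypothetical worksheet name
    return pd.DataFrame(records)

df = get_data()    # all sessions share this exact object
st.dataframe(df)   # read-only display is safe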

Also, because every session gets the same object, any session that mutates it concurrently changes what all other sessions see and can corrupt the data, so treat the cached dataframe as read-only. You can read more on this in our caching docs.
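
If a page does need to filter or otherwise modify the data, have it work on its own copy so the shared object is never touched. A sketch, assuming the get_data() above and a hypothetical "team" column:

import streamlit as st

df = get_data()

# Boolean indexing plus .copy() gives this session its own frame;
# in-place operations on df itself (df.sort_values(..., inplace=True),
# df["new_col"] = ..., etc.) would be visible to every other session.
selected_team = st.selectbox("Team", sorted(df["team"].unique()))  # hypothetical "team" column
schedule = df[df["team"] == selected_team].copy()
st.dataframe(schedule)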