Summary
I’m building an app that lets users get the schedule for a sports tournament. The data is read-only (i.e., no transformations are applied to it) and it is gathered from Google Sheets.
Right now I have a function to get the Google Sheets connection (using cache_resource()), and then a function to get the data (using cache_data()).
@st.cache_resource(show_spinner=False)
def get_google_sheet_connection():
    logger.info("Getting google connection!")
    gc = gspread.service_account_from_dict(credentials)
    sh = gc.open('spreadsheet')
    return sh
@st.cache_data(ttl=30, show_spinner=False)
def get_data():
    <code here...>
    return df
The problem is that I have a lot of concurrent users. When 50-100 people use the app at the same time, get_data gets called at least once from every new session even though the connection is cached, and that uses too many resources of the Google Cloud APIs.
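For what it's worth, this is how I've been counting how often the API actually gets hit: the cached body only runs on a cache miss, so each log line corresponds to one real Sheets read. A minimal sketch (the sheet1/get_all_records() layout is just an assumption about my sheet):

import logging

import pandas as pd
import streamlit as st

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@st.cache_data(ttl=30, show_spinner=False)
def get_data():
    # This body runs only on a cache miss, so each log line below
    # maps to one real read against the Google Sheets API.
    logger.info("Cache miss: fetching from Google Sheets")
    sh = get_google_sheet_connection()
    records = sh.sheet1.get_all_records()  # assumed worksheet layout
    return pd.DataFrame(records)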
Is it good practice to use cache_resource() for a DataFrame, so that I have a singleton of that dataset which is shared between all sessions, users, and reruns?
@st.cache_resource(ttl=30, show_spinner=False)
def get_data():
    <code here...>
    return df
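Concretely, the pattern I have in mind looks roughly like this. Since cache_resource hands every session the same object, I would return a copy from a thin wrapper so no session can accidentally mutate the shared DataFrame (the _get_data_shared name and the .copy() call are just my sketch, again assuming get_all_records() fits the sheet):

import pandas as pd
import streamlit as st

@st.cache_resource(ttl=30, show_spinner=False)
def _get_data_shared():
    # One DataFrame instance shared by all sessions, users, and reruns.
    sh = get_google_sheet_connection()
    records = sh.sheet1.get_all_records()  # assumed worksheet layout
    return pd.DataFrame(records)

def get_data():
    # .copy() keeps any one session from mutating the shared singleton.
    return _get_data_shared().copy()

Would that be the recommended approach, or is returning the cached DataFrame directly fine given that the data is read-only?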
Thank you so much!