I’m building an app that lets users view the schedule for a sports tournament. The data is read-only (i.e., no transformations are applied to it) and it’s pulled from Google Sheets.
Right now I have a function that gets the Google Sheets connection (cached with @st.cache_resource()) and a function that gets the data (cached with @st.cache_data()):
```python
@st.cache_resource(show_spinner=False)
def get_google_sheet_connection():
    logger.info("Getting google connection!")
    gc = gspread.service_account_from_dict(credentials)
    sh = gc.open('spreadsheet')
    return sh

@st.cache_data(ttl=30, show_spinner=False)
def get_data():
    # <code here...>
    return df
```
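For context, get_data does roughly this — a minimal sketch, assuming the schedule sits on the first worksheet and is read with get_all_records(); the worksheet and column details are simplified here:

```python
import pandas as pd
import streamlit as st

@st.cache_data(ttl=30, show_spinner=False)
def get_data():
    # Reuse the cached spreadsheet handle from above.
    # Assumption: the schedule lives on the first worksheet.
    sh = get_google_sheet_connection()
    records = sh.sheet1.get_all_records()  # one read call against the Sheets API
    return pd.DataFrame(records)
```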
The problem is that I have a lot of concurrent users. When 50–100 people use the app at the same time, even though the connection is cached, get_data gets called at least once from every new session, and that uses up too much of my Google Cloud API quota.
Is it good practice to use st.cache_resource() for a dataframe, so that I have a singleton of that dataset which is shared between all sessions, users, and reruns?
```python
@st.cache_resource(ttl=30, show_spinner=False)
def get_data():
    # <code here...>
    return df
```
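The app side would stay the same — a minimal sketch of how a session would consume it; since the data is read-only, nothing in the app mutates the shared object:

```python
import streamlit as st

# Every session, user, and rerun would receive the same cached DataFrame
# object; the app only reads/displays it, never modifies it in place.
df = get_data()
st.dataframe(df)
```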
Thank you so much!