Invalidating and rebuilding caches during the night without user interaction

hitbyfrozenfire · November 22, 2023, 10:35am

Hi, currently we run a streamlit v1.28 and python 3.11 based tool on our company’s local OpenShift platform. Our tool allows users to work with a large dataset that is cached with st.cache_data. Once the data is cached, the performance is fine but the initial load of the data takes several minutes (and only loading part of the data unfortunately is not an option).

Once a day the cache is invalidated (ttl is set to ‘24h’) in order to reload the dataset as it will most likely have changed significantly on the db and users need that latest state. Currently the rebuilding of the cache happens each day when the first user interacts with the tool, which results in them having to wait for a few minutes until they can use it.

Now my question: is it possible to configure Streamlit in a way to trigger that cache invalidation and – that’s the important bit – reloading of cached data say during the night without any user having to interact with the UI? If not, does anybody have a best practice solution how this can be achieved otherwise? Is it possible to e.g. fire a corresponding API call against the streamlit server’s URL?

Thank you

tonykip · November 22, 2023, 3:30pm

Hi @hitbyfrozenfire,

Thanks for posting!

I think, theoretically, you can use a cron job in OpenShift to run the data-loading script after the cache expires at midnight. This could be an interesting idea to try. If you can share some dummy code that mirrors your current implementation, that would be great for us to hack around a solution as well.
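One timing detail is worth checking before scheduling such a job. The sketch below assumes that a st.cache_data entry expires ttl after it was stored and that reading it does not extend its lifetime. The function name and the timestamps are made up for illustration. If the nightly job starts at 02:00 and the load takes five minutes, the entry is stored at about 02:05, so the next job at 02:00 still finds a valid cache when ttl is a full 24 hours:

```python
from datetime import datetime, timedelta


def cache_is_expired(stored_at, now, ttl):
    """Hypothetical helper: True once at least `ttl` has passed since the entry was stored."""
    return now - stored_at >= ttl


# Illustrative timestamps: the job started at 02:00 and the load finished at 02:05.
stored_at = datetime(2023, 11, 23, 2, 5, 0)
# The next nightly job starts at 02:00 the following day.
next_job = datetime(2023, 11, 24, 2, 0, 0)

print(cache_is_expired(stored_at, next_job, timedelta(hours=24)))  # False
print(cache_is_expired(stored_at, next_job, timedelta(hours=23)))  # True
```

Under these assumptions, a ttl somewhat shorter than the job interval (23h here is just an example) makes sure the nightly visit actually triggers the reload instead of hitting the old entry a few minutes before it expires.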

hitbyfrozenfire · November 29, 2023, 9:43am

Thanks for your reply @tonykip

Locally (i.e. the streamlit server was http://localhost:8501) I tried running a GET call against that server with requests.get. That didn’t work, as Streamlit pages are created dynamically in the browser (by JavaScript or TypeScript I assume), so a plain HTTP request does not run the app.

Then I tried emulating a user accessing the app by making a “headless browser” GET call in the background, using the selenium package combined with chrome/chromedriver (version numbers must match exactly) or firefox/geckodriver (there seems to be a bit more tolerance wrt versioning here). This worked: the caches got invalidated and the data was reloaded.

The python code for this is rather simple, here for Firefox (Chrome works analogously):

from selenium import webdriver
import time

service = webdriver.FirefoxService(executable_path='[path_to_geckodriver]')
options = webdriver.FirefoxOptions()
options.add_argument('--headless')
with webdriver.Firefox(service=service, options=options) as driver:
    driver.get('[url_of_your_streamlit_app]')
    time.sleep(3) # time required depends on how long data load takes

Maybe the time.sleep could be replaced by a command that actually checks whether the page has fully loaded, but I haven’t figured that out yet. I also haven’t been able to build a Docker image or try it on OpenShift yet. From that experience, though, any such GET call that can deal with dynamically created websites and can be fired at the Streamlit server seems to do the trick, at least as a workaround hack (I guess something like pyppeteer or playwright should also work, but I haven’t tried them).
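One possible replacement for the fixed time.sleep is to poll the page until the app looks idle. This is only a sketch and untested against a real deployment. The two data-testid selectors are assumptions about Streamlit’s DOM and should be checked in the browser’s dev tools for the version in use, and the helper names and default timings are made up. Because the “running” indicator can show up with a short delay, the loop only returns once the page has looked idle for a few seconds in a row.

```python
import time

# Assumed selectors; verify them against the DOM of your Streamlit version.
APP_SELECTOR = '[data-testid="stAppViewContainer"]'
RUNNING_SELECTOR = '[data-testid="stStatusWidget"]'
CSS = "css selector"  # same value as selenium's By.CSS_SELECTOR


def app_is_idle(driver):
    """True if the app container exists and no 'running' indicator is present."""
    if not driver.find_elements(CSS, APP_SELECTOR):
        return False
    return not driver.find_elements(CSS, RUNNING_SELECTOR)


def wait_until_idle(driver, timeout=900.0, settle=5.0, poll=1.0):
    """Poll until the app has looked idle for `settle` seconds; False on timeout."""
    deadline = time.monotonic() + timeout
    idle_since = None
    while time.monotonic() < deadline:
        if app_is_idle(driver):
            if idle_since is None:
                idle_since = time.monotonic()
            if time.monotonic() - idle_since >= settle:
                return True
        else:
            idle_since = None  # the app became busy again, restart the settle timer
        time.sleep(poll)
    return False


def warm_cache(url, geckodriver_path, timeout=900.0):
    """Open the app in headless Firefox and wait for the data load to finish."""
    from selenium import webdriver  # imported here so the helpers above need no selenium

    service = webdriver.FirefoxService(executable_path=geckodriver_path)
    options = webdriver.FirefoxOptions()
    options.add_argument('--headless')
    with webdriver.Firefox(service=service, options=options) as driver:
        driver.get(url)
        return wait_until_idle(driver, timeout=timeout)
```

Calling warm_cache('[url_of_your_streamlit_app]', '[path_to_geckodriver]') from a scheduled job would then return True once the page has settled, or False if the timeout was reached first, which the job could log or turn into a non-zero exit code.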

tonykip · November 29, 2023, 9:18pm

Oh cool. I think you’re much further along in getting to a better implementation than I am at the moment.

Let me know if you find an optimal way.

system · May 27, 2024, 9:18pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
