Auto Refresh Streamlit Application With Many Pages

Hello! I have a Streamlit application deployed on a local machine that others can view when connected to the company VPN, using the IP and port number, such as 10.5.5.5:5050. The application has 1 main page and 30 sub pages. These pages run many queries that take a while to run. I use Streamlit's cache functionality, but the underlying data updates 1 or 2 times a day, so the cached results aren’t always up to date. When the cache is stale, users have to wait for many queries to complete, which makes for a poor user experience.

How can I make it so the application refreshes/reloads automatically? I’d prefer to do it at a certain time such as middle of the night. Currently I am using streamlit autorefresh which works for the main page but not the sub pages. How can I get it to work on the sub pages?

Other things I have tried include a runOnSave trick, such as the one described here, with a cron job that edits a dummy file, but that didn’t work either. It seemed to refresh if I had the page open, but not in the middle of the night when it wasn’t open. I also tried using Selenium to open the pages, but that seemed to only work on the main page. The other pages would not open properly unless they were clicked from the sidebar on the main page.

Any other suggestions? I thought about using a timer, but didn’t want the application to be constantly running.

One solution, which requires other tools, is to have the queries re-run separately from the app and save the results to a DB, then load from the DB when the app runs.

For example, if you wanted the data to update a couple of times a day, you could use Google Cloud's Cloud Run functions with a trigger, which also uses their Eventarc and Pub/Sub products (it’s one of the default trigger options, so not too hard to set up). The function could access the DB, run the query, and save the results to another DB (such as BigQuery). Then the app can load the results from that DB. You add some time connecting to a DB, but then don’t have to run each query on the fly.

I’ve done this in some cases where pages with live queries (+ python logic) would take a minute to 10 mins and the DB pull-version takes maybe 5-10 seconds
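
To make the idea concrete, here is a minimal sketch of the two halves: a scheduled job that runs the slow query and saves the rows, and an app-side loader that only reads them. It uses SQLite from the standard library as a stand-in for whatever DB you pick (BigQuery, DuckDB, etc.). The table name `results`, the function names, and `slow_sales_query` are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def refresh_results(conn, run_query, name):
    """Scheduled job: run the slow query once and store its rows."""
    rows = run_query()  # the slow part, executed outside the app
    stamp = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(name TEXT, label TEXT, value REAL, saved_at TEXT)"
        )
        # Replace the previous snapshot for this query.
        conn.execute("DELETE FROM results WHERE name = ?", (name,))
        conn.executemany(
            "INSERT INTO results VALUES (?, ?, ?, ?)",
            [(name, label, value, stamp) for label, value in rows],
        )


def load_results(conn, name):
    """App side: read the precomputed rows instead of running the query."""
    cur = conn.execute(
        "SELECT label, value FROM results WHERE name = ? ORDER BY label",
        (name,),
    )
    return cur.fetchall()


def slow_sales_query():
    # Hypothetical placeholder for a real long-running query.
    return [("east", 120.0), ("west", 95.5)]


conn = sqlite3.connect(":memory:")  # use a file or server DB in practice
refresh_results(conn, slow_sales_query, "sales_by_region")
rows = load_results(conn, "sales_by_region")
```

In a real setup the `refresh_results` call would live in the scheduled job (cron, Cloud Run function, etc.) and the Streamlit page would only call `load_results`.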


Thanks for the suggestion! I think that could work, but would require a lot of rework at this time. I’d still prefer to auto refresh the pages if possible.

You may be able to create a cron job that uses a headless browser to loop through and hit each of your Streamlit page URLs, prompting the caches to refresh. I haven’t tried this but am curious whether it works.
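
A rough, untested sketch of what that cron script might look like, assuming Selenium with headless Chrome is installed and that each sub page is reachable at the base URL plus the page name. The page names and the 60-second wait are placeholders, and whether direct sub page URLs load correctly may depend on your setup.

```python
import time


def page_urls(base_url, page_names):
    """Build one URL per page; the main page is the bare base URL."""
    base = base_url.rstrip("/")
    return [base] + [f"{base}/{name}" for name in page_names]


def warm_pages(urls, wait_seconds=60):
    """Open each URL in headless Chrome and wait for the page to run."""
    # Imported here so the URL helper works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        for url in urls:
            driver.get(url)
            # Streamlit runs the page script after the browser connects,
            # so give the queries time to finish before moving on.
            time.sleep(wait_seconds)
    finally:
        driver.quit()


# Hypothetical page names; replace with your own.
urls = page_urls("http://10.5.5.5:5050", ["Sales", "Inventory"])
# warm_pages(urls)  # uncomment in the script that cron runs
```

A fixed sleep is the crude part here. Waiting for a specific element to appear on each page would be more reliable, but that depends on what your pages render.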

Otherwise, @msquaredds’s suggestion is the most robust and widely used solution. We use this approach with a DuckDB caching layer, plus a “Refresh Data” button that triggers the data layer refresh so users can get the most up-to-date data if they need it. Otherwise, it’s scheduled to update each morning.


Understandable about not wanting the extra work, and I would be interested in how @cgage1’s solution works too.

If you do end up going with the DB version, just a couple additional design options:

  • You can have the queries re-run when the original database is updated, instead of on a schedule, if the data updates aren’t regularly scheduled
  • There’s a version that doesn’t require scheduling or checking data events, but also more load time once in a while. Basically when the app runs, you check the datetime of the latest data load and the datetime of the latest query save - if the query was run after the data load, use the results; otherwise, run the query, use the results and save them too. Then any other users will have less load time (until the data is updated again)
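
The second option above could be sketched roughly like this. A plain dict stands in for the table where the saved results and their timestamp would live, and `data_loaded_at` is assumed to come from wherever your source DB records its latest load. All names are made up for illustration.

```python
from datetime import datetime


def get_results(store, run_query, data_loaded_at, now):
    """Reuse saved results if they were saved after the latest data load."""
    saved = store.get("results")
    if saved is not None and saved["saved_at"] > data_loaded_at:
        return saved["rows"], "cached"
    # Saved results are missing or older than the data: re-run and save.
    rows = run_query()
    store["results"] = {"rows": rows, "saved_at": now}
    return rows, "recomputed"


calls = []


def slow_query():
    # Hypothetical stand-in for a long-running query.
    calls.append(1)
    return [("east", 120.0)]


store = {}  # stand-in for a results table in the DB
night_load = datetime(2024, 1, 1, 2, 0)
midday_load = datetime(2024, 1, 1, 14, 0)

first = get_results(store, slow_query, night_load, now=datetime(2024, 1, 1, 8, 0))
second = get_results(store, slow_query, night_load, now=datetime(2024, 1, 1, 9, 0))
third = get_results(store, slow_query, midday_load, now=datetime(2024, 1, 1, 15, 0))
```

Here the first and third calls run the query (no saved results yet, then a newer data load), while the second one reuses what the first call saved. In the app you would pass `datetime.now()` for `now`.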

Thanks for the suggestions! I’ll give the headless browser a try to see if that works. The DB version seems nice and might be an option in the future.

Is the streamlit autorefresh I linked earlier still a valid method for this? I see it hasn’t been updated in a while. Or is there a trick to get it to work on multiple pages?