Background cache refresh to avoid users waiting

We are running some slow queries with the results cached for 24 hours. How can we run some kind of background process to call the cached query function every hour, so that the cache is more likely to be fresh when a user loads the dashboard?

We can’t just curl localhost because the dashboard doesn’t work unless JS is enabled. Is there a command like streamlit run that executes a module to refresh the cache?

Hey @MikeHowells,

We don’t have a built-in solution for this right now, but it is on our roadmap. I also haven’t seen a good hack for this yet. You might be able to use the streamlit-autorefresh component by @kmcgrady to hack something together. If you do, please let me know, I would love to see how it works!


Please do share any workarounds you find! Also would love to hear from other community members – is this a key feature to you? How would you plan to use it?

Hi @MikeHowells

I can think of three possibilities, but I haven’t been able to test or validate whether any of them works:

  1. Use APScheduler. There are examples of how to configure it in Flask applications, but I am not sure how to adapt them to the Tornado framework that Streamlit runs on.
  2. Use PeriodicCallback. Tornado can invoke a callback periodically on a schedule. The only problem is that I am not sure whether, or how, this can be configured directly in Tornado rather than through Streamlit.
  3. Use a hash or table property to avoid an automatic cache refresh. I don’t know whether you are using a simple database (like PostgreSQL) or a cloud data warehouse (like BigQuery), or what your application’s requirements and limitations are. The idea is to decorate a function with st.cache(ttl=3600); the function first checks whether the cached data is still current, for example by checking whether a hash of the table, or some table/cache parameter, has changed. If the data has changed, the function runs the query to get the new data and saves the result to local persistence or in memory. If the data has not changed, the function reuses the persisted or in-memory data to renew the cache for another hour. This approach makes the first request after cache expiration and a data change slower, but it could solve your problem with less code complexity.

Personally, if your database query performance is not a bottleneck and it is acceptable for only the first request to be slow, the third option seems viable. It also makes fewer unnecessary requests to your database, because the server only verifies whether the data has changed when the cache has expired and someone actually loads the dashboard, which is good for nights and weekends when no one is using it. But I don’t know the limitations and requirements of your app.


This is complicated by the fact that we’re running Streamlit on our micro-services platform, which auto-scales by creating/destroying instances to adapt to traffic load. The platform assumes that services are stateless, so any instance can handle any request. This means we might have several instances of our Streamlit app, with front-end requests load-balanced between them.

To warm up a cache we’d need to be able to use a Redis resource that’s shared by all instances, so that to the frontend it appears there’s a single cache regardless of auto-scaling in the backend.
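That shared-cache pattern might look something like the sketch below. To keep it self-contained, an in-memory dict with TTL stands in for the Redis instance; with the real thing you would replace `store_get`/`store_setex` with `redis.Redis().get(key)` and `redis.Redis().setex(key, ttl, value)`. All names here are illustrative:

```python
import json
import time

# Stand-in for a shared Redis instance, so the sketch runs anywhere.
_shared_store = {}

def store_get(key):
    entry = _shared_store.get(key)
    if entry is None or time.time() >= entry["expires_at"]:
        return None  # missing or expired, like an elapsed Redis TTL
    return entry["value"]

def store_setex(key, ttl_seconds, value):
    _shared_store[key] = {
        "value": value,
        "expires_at": time.time() + ttl_seconds,
    }

def cached_query(key, ttl_seconds, run_query):
    """Results live in the shared store, so whichever instance (or
    background warm-up job) last ran the query benefits every other
    instance behind the load balancer."""
    cached = store_get(key)
    if cached is not None:
        return json.loads(cached)
    result = run_query()
    store_setex(key, ttl_seconds, json.dumps(result))
    return result
```

A separate warm-up job could then call `cached_query` on a schedule, and any auto-scaled Streamlit instance would see the fresh result.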