Caching expensive database query with concurrent users

Hello everyone,

The setting

I have an application where Streamlit is querying a database for lots of data on a daily basis and caching these queries. Executing the query and reading data into memory takes ~20 minutes.

The issue

Now, let’s imagine that the cache just expired and 20 users simultaneously choose to load the app (could also be one clueless user trying to refresh the page again and again)… Since it takes 20 minutes to complete the function call which is cached, and thus 20 minutes to “populate the cache”, 20 identical queries will be made, and instead of running this function once, it will be run 20 times, which causes memory-overflow. I want to avoid this, but I’m not sure what the best approach is…

Possible solutions

  1. In the ideal scenario, Streamlit’s caching functionality would check whether this function call is currently in process of being cached, and wait for this to be complete. Would the new caching primitive st.experimental_singleton() potentially solve this?

  2. As soon as the function is hit, an entry is made in a dedicated table in the database indicating that this function call is currently being cached. The function should check this table in the database before executing, and if it finds that the value is currently being cached, it should provide an info message telling that data is currently being loaded (and then call st.stop()) and that the user should refresh the app to check if the data has been fully loaded.
    When the function completes, the entry in the database indicating whether the cache is being populated should then be set as “completed” or something similar. Issues with this approach is that it’s some work to implement, and there needs to be some mechanisms in place if the app dies while in the middle of the function call, which would leave the entry in the database as forever “loading into cache”.

  3. Other good ideas very welcome :nerd_face:

Thanks a lot for any help :slight_smile: