Caching expensive database query with concurrent users

PeterT · September 29, 2021, 1:23pm

Hello everyone,

The setting

I have an application where Streamlit is querying a database for lots of data on a daily basis and caching these queries. Executing the query and reading data into memory takes ~20 minutes.

The issue

Now, let’s imagine that the cache has just expired and 20 users load the app simultaneously (it could also be one clueless user refreshing the page again and again). Since the cached function call takes 20 minutes to complete, and thus 20 minutes to “populate the cache”, 20 identical queries will be made: instead of running once, the function runs 20 times, and the app runs out of memory. I want to avoid this, but I’m not sure what the best approach is.

Possible solutions

1. In the ideal scenario, Streamlit’s caching functionality would check whether this function call is currently in the process of being cached, and wait for it to complete. Would the new caching primitive st.experimental_singleton() potentially solve this?
2. As soon as the function is hit, an entry is made in a dedicated database table indicating that this function call is currently being cached. The function checks this table before executing, and if it finds that the value is currently being cached, it shows an info message saying that the data is being loaded and that the user should refresh the app to check whether loading has finished, and then calls st.stop(). When the function completes, the entry is set to “completed” or something similar. The issues with this approach are that it takes some work to implement, and that there needs to be a mechanism for the case where the app dies in the middle of the function call, which would otherwise leave the entry stuck as “loading into cache” forever.
3. Other good ideas are very welcome.
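
To make the first idea concrete, here is a minimal sketch of "wait for the in-progress call instead of starting a new one", using one lock per cache key and a second cache check after the lock is acquired. It is not Streamlit's caching implementation and not the patch discussed later in this thread. The names get_or_compute, _cache and _key_locks are hypothetical, and the sketch assumes all user sessions run as threads inside a single server process, so it would not help across several processes or containers.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Tuple

# Hypothetical module-level state shared by all threads in the process.
_cache: Dict[Hashable, Tuple[float, Any]] = {}   # key -> (expiry time, value)
_key_locks: Dict[Hashable, threading.Lock] = {}  # key -> lock guarding its computation
_registry_lock = threading.Lock()                # guards _key_locks itself


def _fresh_value(key: Hashable) -> Tuple[bool, Any]:
    """Return (True, value) if key has an unexpired entry, else (False, None)."""
    entry = _cache.get(key)
    if entry is not None and entry[0] > time.monotonic():
        return True, entry[1]
    return False, None


def get_or_compute(key: Hashable, compute: Callable[[], Any], ttl_seconds: float) -> Any:
    """Return the cached value for key, letting only one caller run compute()."""
    found, value = _fresh_value(key)
    if found:
        return value  # fast path: nothing to compute, no per-key lock taken

    # Fetch (or create) the lock dedicated to this key.
    with _registry_lock:
        lock = _key_locks.setdefault(key, threading.Lock())

    with lock:
        # Re-check: another thread may have filled the cache while we waited.
        found, value = _fresh_value(key)
        if found:
            return value
        value = compute()  # only the first caller reaches this line
        _cache[key] = (time.monotonic() + ttl_seconds, value)
        return value
```

With this shape, 20 simultaneous callers for the same key block on the same lock, the first one runs the query, and the other 19 pick up its result from the cache once the lock is released. If compute() raises, nothing is stored and the next waiting caller retries.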

Thanks a lot for any help

Kareem_Rasheed_babat · October 4, 2021, 4:29am

Great

manepal · March 2, 2022, 7:44am

@Kareem_Rasheed_babat
If you still need to protect a heavy computation from multiple users: we have implemented a “hack” for this by modifying streamlit/caching/cache_utils.py

https://gist.github.com/Mane-Pal/91010fc874c9109f1a45f35777d12b56


The file in the gist is modified from Streamlit v1.6.0, and the modified section spans lines 100-167.
We copy this file into the Streamlit library during a Docker build.
If you use this for anything important, I would suggest writing a test that checks the Streamlit version number.

marduk · May 24, 2022, 6:58pm

Hi there @PeterT ,

Did you find a solution to this? I have the same challenge and wanted to know whether your possible solutions (1) and (2) turned out to be feasible.

I'm also curious whether anyone else has successfully dealt with concurrent users/open tabs (@manepal thanks for sharing yours; I'm trying to avoid modifying the cache_utils.py file but will review it anyway).

manepal · May 25, 2022, 4:52am

@marduk - Peter used to work at the same company as I do.
We currently use the modified cache_utils.py in our setup by simply copying it into our container.
We have written a test that checks compatibility whenever we update the Streamlit version.
The patched cache_utils.py works on versions 1.6.0 to 1.9.0; we have not tested it on earlier versions.
We had another version of this patch for Streamlit ~0.89, before we updated to 1.6.0 and the new cache functions.
Hope this helps

marduk · May 25, 2022, 12:44pm

Thanks so much @manepal for clarifying, appreciate it. I will give it a shot in the next few days (I might reach out with questions, hope that’s OK).

fdubinski · July 20, 2022, 12:02am

This is a great fix for a very real problem. Does Streamlit plan on adopting this solution in a future version?

system · July 20, 2023, 12:02am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
