Cache_data or cache_resource?

Hey guys. Hope you are doing fine 🙂

I’m trying to understand if I need st.cache_data or st.cache_resource. For the moment it seems to me that I need st.cache_data, but I’m not sure. Let me explain.

I have a module (data_analysis.py) that connects to my DB and analyses the data. There are lots of rows in my DB (= lots of data to analyse). The data may (or may not) change every 10 seconds. My script can know whether the DB will update or not, but it doesn't know when the update happens. I can “predict” it’ll update in around 5 seconds, but it could take longer: 7.5, 10 or 12.5 seconds (never more). Each method of data_analysis takes around a second to process the data from the DB, and tbh, that feels like a lot for some reason.

What’s the right way to handle these “expensive” computations? I wouldn’t like to repeat the same computation if it’s expected to give the same result.

PS: Since I can know whether the DB will be updated or not, I think I could reuse what the data_analysis methods returned (based on the parameters passed to them) every time. I say “every time” because whenever my DB changes, my script reruns, since it needs to show the user the updated data. Too much text, sorry. I hope you understood.

Hey @Ivan_Schuster,

If the database is constantly updating/changing, I’m afraid that caching isn’t going to help a ton for this use case. Caching is great for the case in which passing the same input to a function multiple times yields the same exact result (because we can put that result in the cache and avoid all the work of the actual function).

If passing the same input to a function multiple times yields different results each time (i.e. when your database has updated), you’d end up returning a cached value that isn’t accurate if the database has been updated since that value was cached.
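A small illustration of that staleness problem, using functools.lru_cache and a hypothetical list db_rows standing in for the database: the cache only sees the argument, not the data behind it.

```python
from functools import lru_cache

# Hypothetical stand-in for the database: a mutable list of row values.
db_rows = [10, 20, 30]


@lru_cache(maxsize=None)
def total(limit: int) -> int:
    # Cached purely on `limit`; the cache cannot see that db_rows changed.
    return sum(v for v in db_rows if v <= limit)


first = total(100)   # computed from the current rows
db_rows.append(40)   # "the database updates"
second = total(100)  # same argument, so the cached (now stale) value comes back
total.cache_clear()  # manual invalidation; a ttl does this per entry on expiry
third = total(100)   # recomputed from the updated rows
```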

What you could do is set the ttl to 5 seconds, for example, if you know that the database will update after 5 seconds. With a ttl of 5 seconds, any value that has been in the cache for longer than 5 seconds is deleted, so the next call recomputes it. You can learn more about this parameter in this doc.
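In Streamlit this is written as @st.cache_data(ttl=5) (ttl is in seconds; a datetime.timedelta also works). To show the expiry behaviour without a running Streamlit app, here is a hand-rolled sketch of a per-entry TTL cache. It is not Streamlit's implementation; ttl_cache, analyse and the injectable clock are hypothetical names used only so the timing can be tested deterministically.

```python
import functools
import time
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl: float, clock: Callable[[], float] = time.monotonic):
    """Cache results per argument tuple and expire each entry after `ttl` seconds."""

    def decorator(func):
        # Maps argument tuple -> (time stored, result).
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)  # missing or expired: recompute
            store[args] = (now, value)
            return value

        return wrapper

    return decorator
```

With ttl=5 and an update every 5 to 12.5 seconds, a cached result is never more than about 5 seconds old, at the cost of one recompute per argument set every 5 seconds even when the DB did not change.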

Yup, seems like @st.cache_data was the way to go. Thanks!

Execution time of get_combinations: 0.059890 seconds
Execution time of get_combinations: 0.052372 seconds
Execution time of get_combinations: 0.056846 seconds
Execution time of get_combinations: 0.058014 seconds
Execution time of get_combinations: 0.068260 seconds
Execution time of get_combinations: 0.055948 seconds
Execution time of get_combinations: 0.058843 seconds
Execution time of get_combinations: 0.066333 seconds
Execution time of get_combinations: 0.051861 seconds
Execution time of get_combinations: 0.064797 seconds
Execution time of get_combinations: 0.053856 seconds
Execution time of get_combinations: 0.045447 seconds
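The thread doesn't show how these timings were measured. A minimal sketch of a decorator that prints lines in the same format, using time.perf_counter; the body of get_combinations here is hypothetical.

```python
import functools
import time
from itertools import combinations


def timed(func):
    """Print how long each call to `func` takes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"Execution time of {func.__name__}: {elapsed:.6f} seconds")

    return wrapper


@timed
def get_combinations(items):
    # Hypothetical body: all unordered pairs from `items`.
    return list(combinations(items, 2))
```

If it is combined with caching, placing @timed above @st.cache_data times the call as the app sees it, including cache hits.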
