@st.cache_data VS @st.cache_resource - small issues

TomJohn · February 13, 2023, 4:43pm

Hi! Now that Streamlit 1.18.1 is released, it’s time to move on and use the new caching functions. This should be simple and smooth, but I ran into some trouble:

1. The official tutorial “Connect Streamlit to a public Google Sheet” now uses @st.cache_data; however, using it causes an error:

UnserializableReturnValueError: Cannot serialize the return value (of type list) in run_query(). st.experimental_memo uses pickle to serialize the function’s return value and safely store it in the cache without mutating the original object. Please convert the return value to a pickle-serializable type. If you want to cache unserializable objects such as database connections or Tensorflow sessions, use st.experimental_singleton instead (see our docs for differences).

@st.cache_resource works OK. Live example: https://flashcards.streamlit.app/

2. Both @st.cache_data and @st.cache_resource work well for pandas read_csv:

import pandas as pd
import streamlit as st

@st.cache_data
def fetch_data(level_name):
    # level_name is the path to the level's CSV file
    df = pd.read_csv(level_name, sep=",", header=None)
    return df

What is the benefit of using st.cache_data? There are moments when I have the feeling that the app works better with “@st.cache_resource” (this, however, may only be my impression). Live example: https://dungeon.streamlit.app/

jcarroll · February 13, 2023, 6:19pm

Thanks for reporting this @TomJohn. Folks on Streamlit engineering are looking into it now.

jcarroll · February 13, 2023, 6:28pm

Oh, to your #2 question: for df = read_csv(), unless you have a VERY large data set, it’s definitely more canonical to use st.cache_data(). One of the main reasons is that with cache_resource, any mutation to the function output (like a column transform, or adding, editing, or removing data) persists across app runs and across sessions. For a lot of use cases this is not desired. With cache_data, the function result for a given input is cached and a new, clean copy is provided on every run.

Does this make sense? Some more info at Caching - Streamlit Docs and we have a blog coming out about it tomorrow too.
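
To make the copy-versus-shared distinction concrete, here is a minimal sketch in plain Python. It is only a stand-in for the behaviour described above, not Streamlit's implementation: the two dict-based caches and the names load_level, cached_as_data and cached_as_resource are made up for illustration. The "data" variant round-trips through pickle (as the error message in the first post says the data cache does), so each caller gets a fresh copy; the "resource" variant hands out the same object every time.

```python
import pickle

_data_cache = {}      # holds pickled bytes, standing in for a "data" cache
_resource_cache = {}  # holds the live object, standing in for a "resource" cache


def load_level(name):
    """Stand-in for an expensive load; returns a mutable list of rows."""
    return [[name, 0], [name, 1]]


def cached_as_data(name):
    """Return a fresh copy on every call by round-tripping through pickle."""
    if name not in _data_cache:
        _data_cache[name] = pickle.dumps(load_level(name))
    return pickle.loads(_data_cache[name])


def cached_as_resource(name):
    """Return the one shared object on every call."""
    if name not in _resource_cache:
        _resource_cache[name] = load_level(name)
    return _resource_cache[name]


# "Session A" mutates whatever it received.
cached_as_data("level1.csv").append(["mutated", 99])
cached_as_resource("level1.csv").append(["mutated", 99])

# "Session B" asks for the same level afterwards.
data_rows = cached_as_data("level1.csv")
resource_rows = cached_as_resource("level1.csv")

print(len(data_rows))      # 2: the mutation never reached the cached value
print(len(resource_rows))  # 3: the mutation is visible to every later caller
```

The trade-off is the one discussed in this thread: copying costs time and memory on every call but keeps callers isolated, while sharing is cheap but only safe if nobody mutates the object.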

TomJohn · February 13, 2023, 6:36pm

Hi @jcarroll Thank you! I think it’s a bit clearer now. In “The Dungeon,” I always want to load level design without any changes, so using “st.cache_data()” is a good choice.

Wally · February 13, 2023, 6:58pm

Honestly, I am not sure about this. Assuming you have some kind of “default level data” that ships with your app (that is, every user of the app uses the same data), I’d argue you’d also be fine using st.cache_resource.
The advantage is that st.cache_resource will not create copies of the same object across sessions. However, you as the developer would have to make sure that the data is not mutated, as @jcarroll already pointed out.

Correct me if I’m wrong guys.

Edit: Even though the use case is different, what I meant is similar to this section in the docs: Caching - Streamlit Docs

TomJohn · February 13, 2023, 7:37pm

Interesting points. Thank you @Wally! I certainly must test what would happen if I used st.cache_resource and

User 1 would trigger fetch_data("level1.csv")
User 2 would trigger fetch_data("level2.csv")

…assuming that I will add new levels soon
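
A rough sketch of what to expect in that scenario, assuming the cache is keyed on the function's arguments (which matches "the function result for a given input is cached" above). This is a plain dict stand-in, not Streamlit code, and fetch_level, loads and the returned dict shape are hypothetical: each distinct file name gets its own entry, and a repeated name reuses the existing object without loading again.

```python
loads = []   # records every real load so cache hits are visible
_cache = {}  # one shared entry per distinct argument


def fetch_level(level_name):
    """Load a level once per distinct name; later calls reuse the same object."""
    if level_name not in _cache:
        loads.append(level_name)
        _cache[level_name] = {"name": level_name, "tiles": []}
    return _cache[level_name]


user1 = fetch_level("level1.csv")
user2 = fetch_level("level2.csv")
user3 = fetch_level("level1.csv")  # a later session asking for level 1 again

print(loads)           # ['level1.csv', 'level2.csv']
print(user1 is user3)  # True: same shared object for the same argument
print(user1 is user2)  # False: different arguments, different entries
```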

jcarroll · February 14, 2023, 2:48am

@TomJohn for the first issue with the Google Sheet example: I found that this was due to the gsheetsdb Rows object that is returned not being serializable. I tested a few other DB API implementations (PostgreSQL, SQLite) and they did not have this issue. It also seems like gsheetsdb is a bit stale, so it may not be the best choice for our example.
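
For anyone hitting the same error with another library, the sketch below shows one way a list of row objects can fail to pickle, and how converting to built-in types avoids it. It uses only the standard library. The make_rows function and its Row class are hypothetical stand-ins, not gsheetsdb's actual classes, and the failure mode shown (a class created at call time that pickle cannot look up by name) is an assumption chosen for illustration.

```python
import pickle
from collections import namedtuple


def make_rows():
    """Hypothetical query result: rows whose class is created at call time."""
    Row = namedtuple("Row", ["Question", "Answer"])
    return [Row("2 + 2?", "4"), Row("Capital of France?", "Paris")]


def is_picklable(value):
    """Return True if pickle can serialize the value, False otherwise."""
    try:
        pickle.dumps(value)
        return True
    except Exception:
        return False


rows = make_rows()
# Convert each row to a plain dict of built-in types before caching.
plain_rows = [row._asdict() for row in rows]

print(is_picklable(rows))        # False: pickle cannot find the Row class by name
print(is_picklable(plain_rows))  # True: dicts of strings serialize fine
```

A quick check like is_picklable can tell you whether a function's return value is a candidate for a pickle-based cache before you add the decorator.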

I pulled down your flashcards app and was able to get it working with a much simpler pandas pd.read_csv() approach:

import pandas as pd
import streamlit as st

@st.cache_data(ttl=600)  # cached result expires after 600 seconds
def load_data(sheets_url):
    # Rewrite the sheet's edit link into its CSV export link
    csv_url = sheets_url.replace('/edit#gid=', '/export?format=csv&gid=')
    return pd.read_csv(csv_url)


# ok let's load the data
questions_df = load_data(st.secrets["public_gsheets_url"])

With this approach you retrieve the values a little differently but it’s pretty close. Use questions_df.iloc[st.session_state.q_no].Question instead of rows[st.session_state.q_no].Question, for example.
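
The URL rewrite inside load_data can be checked on its own without any network access. This is just a sketch: to_csv_export_url is a hypothetical helper name, and the URLs are placeholders in which SHEET_ID does not refer to a real document.

```python
def to_csv_export_url(sheets_url):
    """Turn a Google Sheets edit link into its CSV export link."""
    return sheets_url.replace("/edit#gid=", "/export?format=csv&gid=")


# Placeholder link; SHEET_ID is not a real document.
edit_url = "https://docs.google.com/spreadsheets/d/SHEET_ID/edit#gid=0"
csv_url = to_csv_export_url(edit_url)
print(csv_url)
# https://docs.google.com/spreadsheets/d/SHEET_ID/export?format=csv&gid=0

# A link in a different shape contains no "/edit#gid=" and comes back unchanged.
share_url = "https://docs.google.com/spreadsheets/d/SHEET_ID/edit?usp=sharing"
print(to_csv_export_url(share_url) == share_url)  # True
```

Since the replace does nothing when the pattern is missing, it is worth confirming that the URL stored in your secrets has the /edit#gid= form.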

I filed a bug to fix the tutorial: Example code in public google sheets tutorial is broken · Issue #589 · streamlit/docs · GitHub

TomJohn · February 14, 2023, 6:34pm

Hi! @jcarroll thanks! Definitely exceeding expectations

Wally · February 16, 2023, 12:55am

Regarding gsheetsdb: It seems to be deprecated and was superseded by shillelagh: GitHub - betodealmeida/shillelagh: Making it easy to query APIs via SQL

Not sure if shillelagh solves the problem though. But might be worthwhile to update the example in the docs.

jcarroll · February 16, 2023, 3:09am

Thanks! I saw this and spent a few minutes trying to install shillelagh in the example app and was unable to get it working - seems like it installed many more dependencies and made the install / usage more complex. Since there was a quick solution with no new dependencies required using pandas, I proposed we just update the example to use that.

snehankekre · February 16, 2023, 9:52am

Thanks, all! There’s a PR out to fix the issue in the public Google Sheets tutorial. We still have to update the private Google Sheets tutorial to use a gsheetsdb alternative. If you have suggestions in addition to shillelagh, please let me know.

Kareem_Rasheed_babat · February 17, 2023, 3:30pm

I just updated my code to use the new caching: st.cache_data

system · February 17, 2024, 3:30pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
