Hi! Now that Streamlit 1.18.1 is released, it’s time to move on to the new caching functions. This should be simple and smooth, but I ran into some trouble:
UnserializableReturnValueError: Cannot serialize the return value (of type list) in run_query(). st.experimental_memo uses pickle to serialize the function’s return value and safely store it in the cache without mutating the original object. Please convert the return value to a pickle-serializable type. If you want to cache unserializable objects such as database connections or Tensorflow sessions, use st.experimental_singleton instead (see our docs for differences).
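The error boils down to the return value not surviving a `pickle` round trip. A quick way to check whether a value is cacheable this way is to try pickling it yourself (a plain-Python sketch; `is_picklable` is just a helper name I made up, not a Streamlit API):

```python
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps, i.e. is cacheable by st.cache_data."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False

# plain containers of simple values pickle fine
assert is_picklable([{"question": "What is 2+2?", "answer": 4}])

# objects wrapping live state (here, a generator) do not
assert not is_picklable((x for x in range(3)))
```

If your query result fails this check, convert it to a plain list/dict/DataFrame before returning it from the cached function.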
What is the benefit of using st.cache_data? There are moments when I have a feeling that apps with @st.cache_resource work better (this, however, may be only my impression). Live example: https://dungeon.streamlit.app/
Oh, to your #2 question - For df = read_csv() unless you have a VERY large data set, it’s definitely more canonical to use st.cache_data(). One of the main reasons is that for cache_resource, any mutation to the function output (like a column transform, or add/edit/remove data) is persisted across app runs and across sessions. For a lot of use cases this is not desired. With cache_data, the function result for a given input is cached and a new, clean copy is provided on every run.
Does this make sense? Some more info at Caching - Streamlit Docs and we have a blog coming out about it tomorrow too.
Hi @jcarroll, thank you! I think it’s a bit clearer now. In “The Dungeon,” I always want to load the level design without any changes, so st.cache_data() is a good choice.
Honestly, I am not sure about this. Assuming you have some kind of “default level data” that ships with your app, i.e. every user of the app will use this data, I’d argue you’d also be fine using st.cache_resource.
The advantage is that st.cache_resource will not create copies of the same object across sessions. However, you as the developer would have to make sure that the data is not mutated, as @jcarroll already pointed out.
Correct me if I’m wrong guys.
Edit: Even though the use case is different, what I meant is similar to this section in the docs: Caching - Streamlit Docs
@TomJohn for the first issue with the Google Sheet example - I found that this was due to the Rows object returned by gsheetsdb not being serializable. I tested a few other DB API implementations (PostgreSQL, SQLite) and they did not have this issue. It also seems like gsheetsdb is a bit stale, so it’s maybe not the best choice for our example.
I pulled down your flashcards app and was able to get it working with a much simpler pd.read_csv() approach:
```python
import pandas as pd
import streamlit as st

@st.cache_data(ttl=600)
def load_data(sheets_url):
    # turn the sheet's "edit" URL into a CSV export URL
    csv_url = sheets_url.replace('/edit#gid=', '/export?format=csv&gid=')
    return pd.read_csv(csv_url)

# ok let's load the data
questions_df = load_data(st.secrets["public_gsheets_url"])
```
With this approach you retrieve the values a little differently, but it’s pretty close. For example, use questions_df.iloc[st.session_state.q_no].Question instead of rows[st.session_state.q_no].Question.
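For reference, the one-line URL rewrite in load_data is a plain string replace that turns a sheet’s shareable edit link into its CSV export link (the sheet ID below is made up for illustration):

```python
# a hypothetical public-sheet edit URL, as copied from the browser
sheets_url = "https://docs.google.com/spreadsheets/d/FAKE_SHEET_ID/edit#gid=0"

# the same rewrite used inside load_data above
csv_url = sheets_url.replace('/edit#gid=', '/export?format=csv&gid=')

assert csv_url == "https://docs.google.com/spreadsheets/d/FAKE_SHEET_ID/export?format=csv&gid=0"
```

Note this keeps the gid, so it works for sheets with multiple tabs as long as the link points at the tab you want.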
Thanks! I saw this and spent a few minutes trying to install shillelagh in the example app and was unable to get it working - seems like it installed many more dependencies and made the install / usage more complex. Since there was a quick solution with no new dependencies required using pandas, I proposed we just update the example to use that.
Thanks, all! There’s a PR out to fix the issue in the public Google Sheets tutorial. We still have to update the private Google Sheets tutorial to use a gsheetsdb alternative. If you have suggestions in addition to shillelagh, please let me know.