Using `st.cache()` and getting `CachedObjectMutationWarning`

Hello! I am using st.cache() when pulling data from the web, and it is making my app perform 100x better than it did without it. I am using the following function to get the data:

```python
import pandas as pd
import streamlit as st


@st.cache
def get_data():
    col_list = [...]  # list of columns here
    data = pd.read_csv('url', low_memory=False, usecols=col_list)
    return data


data = get_data()
```

When I run this, I get the warning:

```
CachedObjectMutationWarning: Return value of get_data() was mutated between runs.

By default, Streamlit's cache should be treated as immutable, or it may behave in unexpected ways. You received this warning because Streamlit detected that an object returned by get_data() was mutated outside of get_data().
```
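The warning text describes the check: Streamlit noticed that the object `get_data()` returned was changed after it left the function. The toy model below is plain Python, not Streamlit's implementation; `toy_cache`, `_digest`, and the dict standing in for the DataFrame are all hypothetical. It assumes the check works roughly like this: fingerprint the return value when it is first cached, then re-fingerprint it on every cache hit and warn if the two differ.

```python
import functools
import hashlib
import pickle
import warnings


def _digest(obj):
    # Fingerprint of the object's current contents (assumes obj is picklable).
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


def toy_cache(func):
    """Toy memoizer that warns if a cached return value changed after it was returned."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args in store:
            value, saved = store[args]
            if _digest(value) != saved:
                warnings.warn(f"Return value of {func.__name__}() was mutated between runs.")
            return value  # same object every time, mutated or not
        value = func(*args)
        store[args] = (value, _digest(value))
        return value

    return wrapper


@toy_cache
def get_data():
    # Hypothetical stand-in for the CSV download.
    return {"cases": [1, 2, 3]}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = get_data()         # first run: computed and fingerprinted
    get_data()                # untouched cache hit: no warning
    data["cases"].append(4)   # in-place change to the cached object
    get_data()                # cache hit: fingerprint differs, so it warns
```

Under this model, any in-place change to the returned object between runs is enough to trigger the warning.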

The warning also suggests using @st.cache(allow_output_mutation=True) to allow this; however, I don't want my data to get messed up somehow.

I believe I am getting this warning because I modify this data (mainly with pandas functions) after pulling it, based on certain user inputs. I just want to make sure this is okay and that I am not at risk of messing up my data somewhere. The app works wonderfully with @st.cache(), but I'm not sure it is worth the potential errors or incorrectness it may introduce into my data. I even tried creating a copy of the fetched data like so:

```python
...
data = get_data()
data1 = data.copy()
```

But I still get the warning. If I use @st.cache(allow_output_mutation=True), is it okay to alter my data after pulling it? If not, how are we supposed to alter data we pull from the web without messing it up?
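On the `allow_output_mutation=True` question: assuming that flag simply turns the check off (the cached object is still shared between runs), any in-place edit carries over into the next run. A minimal sketch in plain Python, using `functools.lru_cache` as a stand-in for a cache with no mutation check and a hypothetical dict in place of the DataFrame, contrasting an in-place transform with one that works on a private copy:

```python
import copy
from functools import lru_cache


@lru_cache(maxsize=None)
def get_data():
    # Hypothetical stand-in for pd.read_csv(...); lru_cache hands back this same dict every call.
    return {"cases": [10, 20, 30]}


def scale_in_place(factor):
    data = get_data()
    data["cases"] = [c * factor for c in data["cases"]]  # overwrites a key on the cached dict
    return data


def scale_on_copy(factor):
    data = copy.deepcopy(get_data())  # private copy, like DataFrame.copy() (deep by default)
    data["cases"] = [c * factor for c in data["cases"]]
    return data


clean_1 = scale_on_copy(2)
clean_2 = scale_on_copy(2)
cached_after_copies = copy.deepcopy(get_data())  # snapshot: still the original values

scale_in_place(2)
drifted = scale_in_place(2)  # second "run" starts from already-doubled data
```

Calling `scale_in_place(2)` twice yields values four times the original, which is the kind of silent drift the warning guards against; `scale_on_copy` gives the same answer every time.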

To be clear, I just want to pull this data once, store it in the cache, and then do certain things to it based on user input. Without the cache decorator, the app seems to pull the data again after every user input, which takes a couple of seconds and isn't desirable.
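For the goal described here (download once, then derive different results per user input), one pattern that sidesteps the question is to treat the cached object as read-only and have every user-driven step build a new object from it. A small plain-Python sketch, with a hypothetical `load_rows` standing in for `get_data()` and `functools.lru_cache` standing in for the Streamlit cache:

```python
from functools import lru_cache

calls = {"load": 0}


@lru_cache(maxsize=1)
def load_rows():
    # Hypothetical stand-in for the one-time download; returns a tuple of rows.
    calls["load"] += 1
    return (
        {"region": "north", "cases": 5},
        {"region": "south", "cases": 8},
        {"region": "north", "cases": 2},
    )


def rows_for(region):
    # Builds a new list from the cached rows; nothing cached is modified.
    return [row for row in load_rows() if row["region"] == region]


def total_for(region):
    return sum(row["cases"] for row in rows_for(region))


north = total_for("north")
south = total_for("south")
```

In pandas terms, boolean filtering such as `data[data['col'] == value]` and `data.assign(...)` return new DataFrames, whereas `inplace=True` calls or assigning to a column of `data` itself change the cached object (`'col'` and `value` are placeholders).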

Thanks!

Hey @bismo,

When you say you're pulling data from the web, what do you mean? I ask because when I read this, I have an inkling that each time you "pull data" from the web it has the potential to change (depending on what you're scraping).

The way @st.cache works is that it remembers the output of the function you're running so that you don't have to actually run the function again. It seems that when you read this CSV file from that URL, the data itself is different each time, and that's what is throwing this warning.

I wouldn't expect mutations you make to the data further down in your app to cause this.

Can you check that nothing on this website is changing somehow?

Thanks!
Marisa

Hey @Marisa_Smith! I'm actually pulling data from two links: CSV files in a GitHub repository. The host updates one weekly and the other daily, so the data does change over time as new data is added to the files (which I am aware of and expect). Is that okay? Sorry for the confusing terminology in my initial post :stuck_out_tongue:

EDIT: Some more info: one of the files is updated overnight, so the data rarely differs between runs of the app.
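Since one file only changes overnight, one option is to let the cached result expire after a fixed time instead of caching it forever. The `st.cache` decorator also takes a `ttl` argument (in seconds) for this; check the docs for the Streamlit version you are on. The sketch below shows the same idea in plain Python, with a hypothetical `ttl_cache` decorator and an injectable clock so the behavior can be checked without waiting:

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Hypothetical decorator: reuse a zero-argument function's result until it is `seconds` old."""
    def decorator(func):
        entry = {}

        @wraps(func)
        def wrapper():
            now = clock()
            if "value" not in entry or now - entry["at"] >= seconds:
                entry["value"] = func()  # expired or never fetched: fetch again
                entry["at"] = now
            return entry["value"]

        return wrapper
    return decorator


fake_now = [0.0]
fetches = []


@ttl_cache(seconds=6 * 60 * 60, clock=lambda: fake_now[0])
def get_data():
    fetches.append(fake_now[0])  # record each (hypothetical) download
    return f"snapshot@{fake_now[0]:.0f}"


get_data()                 # first call downloads
fake_now[0] = 3600.0       # one hour later: still cached
get_data()
fake_now[0] = 7 * 3600.0   # seven hours later: expired, downloads again
latest = get_data()
```

A six-hour window here is an arbitrary example; any value shorter than the upstream update interval keeps the app reasonably fresh while still skipping most downloads.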

Thanks!