Using cache functionality + hashing

georgi · December 26, 2022, 9:06pm

Summary

I created a function to read a csv. That csv is updated with no defined frequency, and I would like to reload it only if the last modified date of that csv has changed.

Steps to reproduce

So far I got something like this:

def get_last_modified(bucket):
    s3 = S3FileSystem(anon=False)
    last_modified = s3.modified(bucket)
    return last_modified


@st.cache(hash_funcs={StringIO: get_last_modified})
def load_data(bucket):
    df = pd.read_parquet(bucket)
    return df

Expected behavior:

Not sure how to do it, but I would like the load_data function to rerun whenever the last modified date changes.

Actual behavior:

I get the last modified date correctly, but I cannot get the load_data function to rerun.

Debug info

Streamlit version: 1.11.0
Python version: 3.9
Using Conda

Any help will be appreciated, thanks in advance.

Goyo · December 26, 2022, 10:33pm

How are you calling load_data? I would expect bucket to be a str, but then defining a hash function for StringIO would do nothing.
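To illustrate the point about hash_funcs: it maps a type to a hash function, and the function is only consulted when an argument of that type is encountered. The snippet below is a toy model of that lookup, not Streamlit's actual implementation; make_cache_key, stringio_hasher, and the bucket path are hypothetical names made up for the illustration.

```python
from io import StringIO


def make_cache_key(args, hash_funcs):
    """Toy model: use a custom hasher only for arguments whose type is registered."""
    parts = []
    for arg in args:
        custom = hash_funcs.get(type(arg))
        # Fall back to the built-in hash when no custom hasher matches the type.
        parts.append(custom(arg) if custom is not None else hash(arg))
    return tuple(parts)


calls = []  # records every object the custom hasher was asked to hash


def stringio_hasher(obj):
    # A hash function receives the object being hashed (here a StringIO),
    # not the bucket path.
    calls.append(obj)
    return obj.getvalue()


hash_funcs = {StringIO: stringio_hasher}

# A str argument never reaches the StringIO hasher (hypothetical path).
key_for_str = make_cache_key(("s3://my-bucket/data.csv",), hash_funcs)

# Only a StringIO argument triggers it.
key_for_stringio = make_cache_key((StringIO("a,b"),), hash_funcs)
```

In this model the custom hasher runs once, for the StringIO argument only, which is why registering it has no effect when load_data is called with a plain string.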

I think just passing last_modified as a parameter to load_data should work.

georgi · December 27, 2022, 3:25pm

Hi Goyo! Thanks for the answer, it worked! I added “last_modified” as a parameter to the load function, and I even deleted the hash function as you suggested. This is extremely weird (and, fortunately, beautifully easy). So all I need to make load_data rerun is the parameter (last_modified) that indicates whether it has to be rerun, like this:

@st.cache()
def load_data(bucket, last_modified):
    df = pd.read_csv(bucket + "data.csv")
    return df

df = load_data(bucket, last_modified)

I don’t quite understand how it works, though!

Goyo · December 27, 2022, 4:56pm

The data will be stored in the cache associated with the values of bucket and last_modified. So both values are used to decide whether there is a cache hit or a cache miss, even if the function body only needs one of them.
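The same idea can be reproduced outside Streamlit. The sketch below swaps in functools.lru_cache for st.cache and a local file's modification time for the S3 one, so it only illustrates the principle; load_text, load_if_changed, and call_count are hypothetical names for the example.

```python
import functools
import os
import tempfile

call_count = 0  # counts real file reads, for illustration only


@functools.lru_cache(maxsize=None)
def load_text(path, last_modified):
    """Read the file. last_modified is unused here but is part of the cache key."""
    global call_count
    call_count += 1
    with open(path) as f:
        return f.read()


def load_if_changed(path):
    # Passing the current modification time makes a changed file a cache miss.
    return load_text(path, os.path.getmtime(path))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as f:
        f.write("a,b\n1,2\n")

    first = load_if_changed(path)   # cache miss: the file is read
    second = load_if_changed(path)  # same path and mtime: cache hit

    with open(path, "w") as f:
        f.write("a,b\n3,4\n")
    # Push the mtime forward explicitly so the change is visible even on
    # filesystems with coarse timestamps.
    new_mtime = os.path.getmtime(path) + 10
    os.utime(path, (new_mtime, new_mtime))

    third = load_if_changed(path)   # new mtime: cache miss, the file is reread
```

One consequence of this design is that every distinct last_modified value leaves its own entry in the cache, so old versions stay around until they are evicted or the cache is cleared.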

georgi · December 27, 2022, 5:46pm

Excellent, that makes sense. Thanks a lot!

system · December 27, 2023, 5:46pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
