I created a function to read a CSV. That CSV is updated at no fixed frequency, and I would like to reload it only if its last modified date has changed.
Steps to reproduce
So far I got something like this:
import pandas as pd
from s3fs import S3FileSystem  # assuming S3FileSystem comes from s3fs

s3 = S3FileSystem(anon=False)
last_modified = s3.modified(bucket)
df = pd.read_parquet(bucket)
Not sure how to do it, but I would like the load_data function to run when the last modified date is updated.
I get the last modified date correctly, but I cannot make the load_data function rerun.
- Streamlit version: 1.11.0
- Python version: 3.9
- Using Conda
Any help will be appreciated, thanks in advance.
How are you calling load_data? I would expect bucket to be a str, but then defining a hash function for StringIO would do nothing.
I think just passing last_modified as a parameter to load_data should work.
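A minimal sketch of that idea, not tied to Streamlit or S3. Here functools.lru_cache stands in for Streamlit's cache, and os.path.getmtime on a local file stands in for s3.modified. The temporary data.csv file, the write_rows helper, and the calls counter are assumptions made up for illustration.

```python
import csv
import functools
import os
import tempfile

calls = {"load_data": 0}  # hypothetical counter, only to show when the body runs


@functools.lru_cache(maxsize=None)
def load_data(path, last_modified):
    # last_modified is not used in the body; it only takes part in the cache key.
    calls["load_data"] += 1
    with open(path, newline="") as f:
        return tuple(tuple(row) for row in csv.reader(f))


def write_rows(path, rows, mtime):
    # Hypothetical helper: write the CSV and set its modification time explicitly.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.utime(path, (mtime, mtime))


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.csv")

write_rows(path, [["a", "b"], ["1", "2"]], mtime=1_000_000_000)
first = load_data(path, os.path.getmtime(path))
again = load_data(path, os.path.getmtime(path))    # same arguments: cache hit

write_rows(path, [["a", "b"], ["3", "4"]], mtime=1_000_000_060)
updated = load_data(path, os.path.getmtime(path))  # new timestamp: body reruns
```

The body of load_data runs twice here: once for the first timestamp and once after the file changes. The second call with the unchanged timestamp is served from the cache.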
Hi Goyo! Thanks for answering, it worked! I added “last_modified” as a parameter to the load function. I even deleted the hash function as you suggested. This is extremely weird (and beautifully easy, fortunately). So all I need to rerun load_data is the parameter (last_modified) that tells whether it has to be rerun, like this:
@st.cache
def load_data(bucket, last_modified):
    df = pd.read_csv(bucket + "data.csv")
    return df

df = load_data(bucket, last_modified)
I don’t quite understand how it works, though!
The data will be stored in the cache associated with the values of bucket and last_modified. So both values will be used to decide whether there is a cache hit or a cache miss, even if the function only needs one of them.
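To make the hit/miss logic concrete, here is a simplified picture of a cache keyed on the function arguments. This is only a sketch: the cache dict, the log list, cached_load, and the bucket names and timestamps are made up for illustration, and Streamlit's real cache does more than this.

```python
cache = {}  # hypothetical stand-in for the cache storage
log = []    # records "hit" or "miss" for each call


def cached_load(bucket, last_modified):
    key = (bucket, last_modified)  # every argument is part of the key
    if key in cache:
        log.append("hit")
        return cache[key]
    log.append("miss")
    value = f"contents of {bucket}data.csv"  # placeholder for the real read
    cache[key] = value
    return value


cached_load("s3://my-bucket/", "2022-08-01 10:00")     # miss: first call
cached_load("s3://my-bucket/", "2022-08-01 10:00")     # hit: same arguments
cached_load("s3://my-bucket/", "2022-08-02 09:30")     # miss: last_modified changed
cached_load("s3://other-bucket/", "2022-08-02 09:30")  # miss: bucket changed
```

Changing either argument produces a new key, so the function body runs again even though only bucket is used to read the data.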
Excellent, that would make it. Thanks a lot!