Refresh cache when pandas data file changes

I’m reading/caching a dataframe from a .pkl file and refreshing it every hour using the ttl option:

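import pandas as pd
import streamlit as st
@st.cache_data(ttl=60*60)  # assuming Streamlit's cache decorator; ttl is in seconds
def load_data():
    df = pd.read_pickle('./results/alldata.pkl')
    return df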

Instead of TTL, is it possible to clear/reload the cache upon a file change event (file size, date, or last-modified time)? I have a separate script collecting data and overwriting the .pkl file at relatively random times.

You can use a different pattern.

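Have two functions, where the outer one is not cached and the inner one is:

def load_data():
    # Not cached: looks up the file's change time on every run
    update_timestamp = get_file_change_time(fname)  # e.g. os.path.getmtime(fname)
    return cached_data_load(update_timestamp)

@st.cache_data(ttl=60*60)  # assuming Streamlit's cache decorator, as in the question
def cached_data_load(timestamp):
    # Cached: the body re-runs only when timestamp has a new value
    df = ...
    return df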

The input argument changes when the timestamp changes, so it will not use a previously cached version. I still left the TTL on the cached function in the hope that it clears old cached data out. Not sure if there’s an LRU cache setting; that would be the preferred approach.

This worked perfectly! Thanks!

Any way it could be improved to refresh automatically when the timestamp is different (which implies some kind of monitor of the timestamp running all the time…)?

Glad that worked for you :) You might be able to do a kind of polling with a custom component to get the page to refresh, or maybe something funky with threads, but I think it might go a little against how Streamlit is set up.
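
The polling half on its own is simple in plain Python; hooking it up to a page refresh is the part I am unsure about. A rough sketch, where wait_for_change is a made-up helper name and the one-second default interval is arbitrary:

```python
import os
import time


def wait_for_change(path, last_mtime, interval=1.0, timeout=None):
    """Block until path's mtime differs from last_mtime; return the new mtime.

    Returns None if timeout (in seconds) elapses first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            return mtime
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A background thread or a separate script could call this in a loop and trigger whatever refresh mechanism is available each time it returns a new value.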

Perhaps that’s a feature request: if there were a component with its own HTTP endpoint, you could manage this entirely outside Streamlit and just POST when the file has been updated.