Refresh cache when pandas data file changes

I’m reading and caching a dataframe from a .pkl file and refreshing it every hour using the ttl option:

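import pandas as pd
import streamlit as st

# ttl belongs on the decorator, not on the function signature
@st.cache(ttl=60*60)
def load_data():
    df = pd.read_pickle('./results/alldata.pkl')
    return df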

Instead of a TTL, is it possible to clear/reload the cache upon a file change event (file size, date, or last-modified time)? I have a separate script collecting data and overwriting the .pkl file at relatively random times.

You can use a different pattern.

Have two functions, the outer one is not cached. Written on a phone so forgive the capitalisation.

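def load_data():
    # Not cached: looks up the file's change time on every run.
    # get_file_change_time and fname are placeholders, e.g. os.path.getmtime(fname)
    update_timestamp = get_file_change_time(fname)
    return cached_data_load(update_timestamp)

@st.cache(ttl=60*60)
def cached_data_load(timestamp):
    # Cached per timestamp value, so a new timestamp means a fresh load
    df = ...
    return df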

The input argument changes when the timestamp changes, so it will not use a previously cached version. I still put the TTL there in the hope that it clears old cached data out. I’m not sure if there’s an LRU cache setting; that would be the preferred approach.

This worked perfectly! Thanks!

Is there any way it could be improved to refresh automatically when the timestamp is different (which implies some kind of monitor of the timestamp running all the time…)?

Glad that worked for you 🙂 You might be able to do a kind of polling with a custom component to get the page to refresh, or maybe something funky with threads, but I think it might go a little against how Streamlit is set up.
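
To give a rough idea of the thread option, here is a sketch of a background poller in plain Python. Everything in it (watch_file, on_change, interval) is a hypothetical name for illustration, and it only detects the change. It does not make a Streamlit page rerun by itself, which is the part that goes against the grain.

```python
import os
import threading


def watch_file(path, on_change, interval=1.0):
    """Poll path's modification time and call on_change(path) when it changes."""
    stop = threading.Event()
    last = os.stat(path).st_mtime_ns  # baseline taken before the thread starts

    def _poll():
        nonlocal last
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            try:
                current = os.stat(path).st_mtime_ns
            except OSError:
                continue  # file may be briefly missing while it is rewritten
            if current != last:
                last = current
                on_change(path)

    threading.Thread(target=_poll, daemon=True).start()
    return stop  # call stop.set() to end the watcher
```

The callback runs on the watcher thread, so anything it touches needs to be safe to use from there.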

Perhaps that’s a feature request: if there were a component that had its own HTTP endpoint, you could manage this entirely from outside and just post to it when the file has been updated.