Refresh cache when pandas data file changes

I’m reading and caching a dataframe from a .pkl file and refreshing it every hour using the ttl option:

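import pandas as pd
import streamlit as st

# ttl belongs on the decorator, not in the function signature
@st.cache(ttl=60*60)
def load_data():
    df = pd.read_pickle('./results/alldata.pkl')
    return df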

Instead of a TTL, is it possible to clear/reload the cache on a file change event (file size, date, or last-modified time)? I have a separate script collecting data and overwriting the .pkl file at relatively random times.

You can use a different pattern.

Have two functions; the outer one is not cached. (Written on a phone, so treat this as a sketch.)

def load_data():
    # Not cached: checks the file's change time on every call
    update_timestamp = get_file_change_time(fname)
    return cached_data_load(update_timestamp)

@st.cache(ttl=60*60)
def cached_data_load(timestamp):
    # timestamp is only there to act as part of the cache key
    df = ...
    return df

The input argument changes when the timestamp changes, so it will not use a previously cached version. I kept the TTL in the hope that it clears old cached data out. I'm not sure whether there’s an LRU cache setting; that would be the preferred approach.

This worked perfectly! Thanks!

Is there any way it could be improved to refresh automatically when the timestamp is different (which implies some kind of monitor watching the timestamp all the time…)?

Glad that worked for you :slight_smile: You might be able to do a kind of polling with a custom component to get the page to refresh, or maybe something funky with threads, but I think it might go a little against how Streamlit is set up.

Perhaps that’s a feature request: if there were a component with its own HTTP endpoint, you could manage this entirely from outside and just POST to it when the file had been updated.
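As a rough illustration of the polling idea, the sketch below checks a file's modification time in plain Python. It only detects the change; it does not by itself make a Streamlit page refresh. The function name `wait_for_change` and its `interval` and `timeout` parameters are hypothetical, chosen for this example.

```python
import os
import time


def wait_for_change(path, last_mtime_ns, interval=0.5, timeout=10.0):
    """Poll until the file's mtime differs from last_mtime_ns.

    Returns the new mtime in nanoseconds, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = os.stat(path).st_mtime_ns
        if current != last_mtime_ns:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A caller would remember the last seen mtime, call `wait_for_change` from a background thread or a separate process, and reload the data whenever it returns a new value.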

Hello Ian,

Thanks for your answer. I tried this code, but the dataframe wasn’t updated.

In my application, I designed an FPGA-based detector that sends sensor monitoring data through a serial port; a Python script processes and stores it. This script creates a CSV file that is updated each time the FPGA sends a monitoring frame.

I wrote a Streamlit script to show the monitoring data. It offers the option to load an old file or to pick up the newest file, and I use your code to detect a change (update) to that file and refresh the dataframe and charts.

Here is the code I use to select the source and load the data:
import glob
import os
import time

import altair as alt
import pandas as pd
import streamlit as st

st.sidebar.title("Settings")

if not st.sidebar.checkbox("Live Data", True):
    # Manual mode: pick any file from the working directory
    st.sidebar.markdown("Choose the data file csv")
    folder_path = path_script = os.path.abspath(os.getcwd())
    filenames = os.listdir(folder_path)
    selected_filename = st.selectbox('Select a file', filenames)
    DATA_URL = os.path.join(folder_path, selected_filename)

    st.write('You selected `%s`' % DATA_URL)

else:
    # Live mode: use the most recently created CSV file
    list_of_files = glob.glob('*.csv')
    #print(list_of_files)
    LASTEST_FILE = max(list_of_files, key=os.path.getctime)
    #print(latest_file)
    path_script = os.path.abspath(os.getcwd())
    DATA_URL = os.path.join(path_script, LASTEST_FILE)

def load_data():
    # Not cached: read the file's modification time on every script run
    update_timestamp = time.ctime(os.path.getmtime(DATA_URL))
    st.write(update_timestamp)
    return cached_data_load(update_timestamp)

@st.cache(ttl=60)
def cached_data_load(timestamp):
    # timestamp only serves as the cache key
    data = pd.read_csv(DATA_URL)
    return data

data = load_data()

#%%
st.markdown("### Rate vs Time")

f1 = alt.Chart(data).mark_circle().encode(
    x=alt.X('Time', axis=alt.Axis(title="Time")),
    y=alt.Y('Rate', axis=alt.Axis(title="Rate [s^-1]")),
    color='Rate'
)
st.altair_chart(f1, use_container_width=True)

The dataframe is updated only when I toggle a checkbox implemented in the script.

Could you please help me understand why app.py doesn’t update the dataframe and chart?

Regards
Juan Carlos