Refresh cache when Panda data file changes

gretzteam · August 11, 2020, 9:16am

I’m reading and caching a dataframe from a .pkl file and refreshing it every hour using the ttl option:

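import pandas as pd
import streamlit as st

# ttl belongs to the decorator, not to the function signature
@st.cache(ttl=60*60)
def load_data():
    df = pd.read_pickle('./results/alldata.pkl')
    return df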

Instead of a TTL, is it possible to clear or reload the cache upon a file change event (file size, date, or last-modified time)? I have a separate script collecting data and overwriting the .pkl file at relatively random times.

Ian_Calvert · August 11, 2020, 10:50am

You can use a different pattern.

Have two functions; the outer one is not cached.

def load_data():
    # get_file_change_time and fname are placeholders for your own lookup
    update_timestamp = get_file_change_time(fname)
    return cached_data_load(update_timestamp)

@st.cache(ttl=60*60)
def cached_data_load(timestamp):
    # timestamp is only there to be part of the cache key
    df = ...
    return df

The input argument changes when the timestamp changes, so a previously cached version will not be used. I kept the TTL in the hope that it clears old cached data out. I'm not sure whether there’s an LRU cache setting; that would be the preferred approach.

gretzteam · August 11, 2020, 11:08am

This worked perfectly! Thanks!

Is there any way it could be improved to refresh automatically when the timestamp is different (which implies some kind of monitor of the timestamp running all the time…)?

Ian_Calvert · August 11, 2020, 12:28pm

Glad that worked for you. You might be able to do a kind of polling with a custom component to get the page to refresh, or maybe something funky with threads, but I think it might go a little against how Streamlit is set up.

Perhaps that’s a feature request: if there were a component that had its own HTTP endpoint, you could manage this entirely outside and just post when the file had been updated.
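
As a rough illustration of the polling idea, here is a minimal standard-library sketch that blocks until a file's modification time changes or a timeout expires. The function name `wait_for_change` and its parameters are hypothetical, and how the result would trigger a Streamlit rerun is left open, since that part is not settled here.

```python
import os
import time


def wait_for_change(path, last_mtime_ns, timeout=10.0, interval=0.5):
    """Poll `path` until its modification time differs from `last_mtime_ns`.

    Returns the new modification time in nanoseconds, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = os.stat(path).st_mtime_ns
        if current != last_mtime_ns:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A short `interval` notices changes sooner at the cost of more `os.stat` calls; the `timeout` keeps the loop from blocking forever when the file never changes.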

jsbalanzar · October 26, 2020, 5:52am

Hello Ian,

Thanks for your answer. I tried to test this code, but the dataframe wasn’t updated.

In my application I designed an FPGA-based detector that sends sensor monitoring data through a serial port; a Python script processes and stores it. This script creates a CSV file that is updated each time the FPGA sends a monitoring frame.

I wrote a Streamlit script to show the monitoring data. I added an option to either load an old file or detect the newest file, and I use your code to detect when that file changes (is updated) and then update the dataframe and graphics.

Here is the code I use to select the source and load the data:
import glob
import os
import time

import altair as alt
import pandas as pd
import streamlit as st

st.sidebar.title("Settings")

if not st.sidebar.checkbox("Live Data", True):
    # Manual mode: pick any file from the working directory
    st.sidebar.markdown("Choose the data file csv")
    folder_path = path_script = os.path.abspath(os.getcwd())
    filenames = os.listdir(folder_path)
    selected_filename = st.selectbox('Select a file', filenames)
    DATA_URL = os.path.join(folder_path, selected_filename)

    st.write('You selected `%s`' % DATA_URL)

else:
    # Live mode: use the newest CSV file according to os.path.getctime
    list_of_files = glob.glob('*.csv')
    #print(list_of_files)
    LASTEST_FILE = max(list_of_files, key=os.path.getctime)
    #print(latest_file)
    path_script = os.path.abspath(os.getcwd())
    DATA_URL = os.path.join(path_script, LASTEST_FILE)

def load_data():
    # Not cached: the modification time is read on every script run
    update_timestamp = time.ctime(os.path.getmtime(DATA_URL))
    st.write(update_timestamp)
    return cached_data_load(update_timestamp)

@st.cache(ttl=60)
def cached_data_load(timestamp):
    # timestamp only acts as a cache key
    data = pd.read_csv(DATA_URL)
    return data

data = load_data()

#%%
st.markdown("### Rate vs Time")

f1 = alt.Chart(data).mark_circle().encode(
    x=alt.X('Time', axis=alt.Axis(title="Time")),
    y=alt.Y('Rate', axis=alt.Axis(title="Rate [s^-1]")),
    color='Rate'
)
st.altair_chart(f1, use_container_width=True)

The dataframe is updated only when I toggle a checkbox implemented in the script.

Could you please help me understand why app.py doesn’t update the dataframe and graph?

Regards
Juan Carlos
