Salut @lasticot , welcome to the community!
I’ve had a quick look, and my first idea is to get rid of hash_funcs altogether
and pass the last-update date as an argument to the cached function:
import pandas as pd
import requests
import streamlit as st

url = 'https://www.data.gouv.fr/fr/datasets/r/83cbbdb9-23cb-455e-8231-69fc25d58111'
r = requests.get(url)
latest_update = r.headers['Last-Modified']

@st.cache(suppress_st_warning=True)  # <-- suppress_st_warning can go once you remove the debug st.write below
def load_data(url, date):
    # Debug only: should not appear when the result is taken from cache instead of recomputed. Remove before deploying.
    st.write(f"NOTHING IN CACHE FOR {url}/{date}")
    return pd.read_csv(url)

st.dataframe(load_data(url, latest_update))
This way, once the DataFrame for a given (url, latest_update) pair has been computed,
it stays in the cache and is reused for that pair of inputs until a new latest_update
value is passed in.
NB: Careful, in your example it seems the Date
header doesn’t give the date of the latest update but rather the time the response was sent (i.e. your download time), so you’ll need to use Last-Modified
instead.
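If you want to compare the two headers yourself, a small sketch using the standard library's email.utils.parsedate_to_datetime could look like this. The header values below are invented for the example; in your app they would come from r.headers.

```python
from email.utils import parsedate_to_datetime

# Hypothetical header values, shaped like what requests exposes in r.headers
headers = {
    "Date": "Fri, 05 Jun 2020 10:15:00 GMT",           # when the server sent the response
    "Last-Modified": "Thu, 04 Jun 2020 17:00:00 GMT",  # when the file last changed
}

served_at = parsedate_to_datetime(headers["Date"])
modified_at = parsedate_to_datetime(headers["Last-Modified"])

# Date moves on every request; Last-Modified only moves when the file is updated
print(modified_at < served_at)
```

For the cache key itself the raw header string is enough; parsing is only useful if you want to display or compare the dates.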
If it’s a very long-running app I would also suggest st.cache(max_entries=10)
to keep at most 10 entries in the cache: when a new entry is added to a full cache, the oldest one is removed.
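As a rough mental model of what a max_entries bound does, here is a toy bounded cache in plain Python. BoundedCache is a hypothetical class written for this post, and dropping the oldest inserted entry is a simplification; Streamlit's own eviction logic may differ in detail.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy cache holding at most max_entries items (illustration only)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest inserted entry

    def keys(self):
        return list(self._data)


cache = BoundedCache(max_entries=2)
for date in ["day1", "day2", "day3"]:
    cache.put(("url", date), f"frame for {date}")

print(cache.keys())  # [('url', 'day2'), ('url', 'day3')]
```

With a new Last-Modified value arriving regularly, a bound like this keeps stale DataFrames from piling up in memory.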
Now, if you want to keep using hash_funcs, then:
- your hash_file_reference should take a FileReference instance as input
- a FileReference should appear somewhere in your load_data function, either as an input argument or in the body, so that when Streamlit encounters one it knows how to hash it.
So here’s an alternative:
import pandas as pd
import requests
import streamlit as st

url = 'https://www.data.gouv.fr/fr/datasets/r/83cbbdb9-23cb-455e-8231-69fc25d58111'

class FileReference:
    def __init__(self, url):
        self.url = url

def hash_file_reference(f: FileReference):
    # The hash of a FileReference is the Last-Modified header of its URL
    r = requests.get(f.url)
    return r.headers['Last-Modified']

@st.cache(hash_funcs={FileReference: hash_file_reference}, suppress_st_warning=True)
def load_data(file: FileReference):
    st.write(f"NOTHING IN CACHE FOR {file.url}")  # debug only
    return pd.read_csv(file.url)

st.dataframe(load_data(FileReference(url)))
That way the cached function should rerun only when the Last-Modified header
behind the FileReference’s URL changes.
I prefer the first solution though, because the inputs of the cached function clearly define the url/date pair you want to put into the Streamlit cache. I’d rather keep hash_funcs
for processing complex objects like Matplotlib figures.
Hope this helps,
Fanilo