How to refresh cache when a file loaded from a url is updated?

Hello,
The csv file is updated daily and I would like the cache to refresh when it detects a change.
I can access the last update date in the headers but I don’t know how to use it with the hash_funcs argument. I’ve tried some variations of the following:

import pandas as pd
import requests
import streamlit as st

url = 'https://www.data.gouv.fr/fr/datasets/r/83cbbdb9-23cb-455e-8231-69fc25d58111'

class FileReference:
    def __init__(self, url):
        self.url = url

def hash_file_reference(url):
    r = requests.get(url)
    return r.headers['Date']

@st.cache(hash_funcs={FileReference: hash_file_reference})
def load_data():
    global url
    df = pd.read_csv(url)
    ....

Is it possible to use the hasher on a global variable?
I've also tried passing the url as an argument to load_data(), but to be honest I don't really know what I'm doing.

Any help is appreciated.

Hi @lasticot, welcome to the community!

I’ve had a quick look, and my first idea is to drop hash_funcs entirely and instead pass the update date as an argument to the cached function:

import pandas as pd
import requests
import streamlit as st

url = 'https://www.data.gouv.fr/fr/datasets/r/83cbbdb9-23cb-455e-8231-69fc25d58111'

r = requests.get(url)
latest_update = r.headers['Last-Modified']

@st.cache(suppress_st_warning=True)  # <-- suppress_st_warning can go once you remove the st.write debug line below
def load_data(url, date):
    st.write(f"NOTHING IN CACHE FOR {url}/{date}")  # <-- debug only; should not appear when the result comes from cache. Remove before deploying.
    return pd.read_csv(url)

st.dataframe(load_data(url, latest_update))

This way, once a (url, latest_update) → DataFrame result is computed, the DataFrame stays in cache for that pair of inputs until a new latest_update value is passed in.
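If it helps to see that keying behaviour in isolation, here is a minimal sketch using functools.lru_cache from the standard library, which keys on the argument tuple in the same spirit (the URL and dates below are made-up placeholders, not real headers):

```python
from functools import lru_cache

calls = []  # records one entry per cache miss

@lru_cache(maxsize=None)
def load_data(url, date):
    calls.append((url, date))  # only runs when (url, date) is new
    return f"data for {url} @ {date}"

load_data('example.csv', 'Mon, 04 Jan 2021')  # miss -> computed
load_data('example.csv', 'Mon, 04 Jan 2021')  # hit  -> served from cache
load_data('example.csv', 'Tue, 05 Jan 2021')  # new date -> recomputed
print(len(calls))  # 2
```

Same idea with st.cache: a fresh Last-Modified value changes the argument tuple, so the function body reruns exactly once per update.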

NB: Careful in your example: the Date header doesn’t give the latest update date but rather the time the response was generated, so you’ll need to use Last-Modified instead.
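If you ever need to compare those header values rather than just use them as cache keys, the standard library can parse them into datetimes. A small sketch with made-up header strings (real ones come from requests.get(url).headers):

```python
from email.utils import parsedate_to_datetime

# Hypothetical header values for illustration only
headers = {
    'Date': 'Tue, 05 Jan 2021 10:00:00 GMT',           # when the response was generated
    'Last-Modified': 'Mon, 04 Jan 2021 06:30:00 GMT',  # when the file last changed
}

# Parse the HTTP date string into a timezone-aware datetime
last_update = parsedate_to_datetime(headers['Last-Modified'])
print(last_update.isoformat())  # 2021-01-04T06:30:00+00:00
```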

If it’s a very long-running app, I would also suggest st.cache(max_entries=10) to cap the cache at 10 entries, evicting the oldest ones first.


Now, if you want to keep using hash_funcs, then:

  • your hash_file_reference should take a FileReference instance as input, and
  • a FileReference should appear somewhere in your load_data function, either as an input argument or in the body, so that when Streamlit encounters one it knows how to hash it.

So here’s an alternative:

import pandas as pd
import requests
import streamlit as st

url = 'https://www.data.gouv.fr/fr/datasets/r/83cbbdb9-23cb-455e-8231-69fc25d58111'

class FileReference:
    def __init__(self, url):
        self.url = url

def hash_file_reference(f: FileReference):
    r = requests.get(f.url)
    return r.headers['Last-Modified']

@st.cache(hash_funcs={FileReference: hash_file_reference}, suppress_st_warning=True)
def load_data(file: FileReference):
    st.write(f"NOTHING IN CACHE FOR {file}")
    return pd.read_csv(file.url)

st.dataframe(load_data(FileReference(url)))

That way, the cached function should rerun only when the Last-Modified header of the FileReference's URL changes.


I prefer the first solution though, because the inputs of the cached function then explicitly define the (url, date) pair you want Streamlit to cache on :slight_smile: I’d rather keep hash_funcs for processing complex objects like Matplotlib figures.

Hope this helps,
Fanilo :balloon:

Looks fab @andfanilo! I’m bookmarking this for a try later! :heart_eyes: