Excel file only cached after 2nd rerun

nicomunting · June 11, 2024, 8:14am

I created an app that has a load screen where a user can upload multiple Excel files. These Excel files can be big, so I applied caching. I noticed that the caching does not start working until the 2nd rerun, which surprised me.

I created a minimal working example of the app that shows the behavior:

```python
import streamlit as st
import pandas as pd

st.set_page_config(
    page_title='Report',
    layout="wide"
)
st.title('Report')


@st.cache_data
def load_excel_files(files) -> list[pd.DataFrame]:
    # Read every uploaded workbook into its own DataFrame.
    dfs = []
    for file in files:
        df = pd.read_excel(file)
        dfs.append(df)

    return dfs


# Remember across reruns whether the user has already pressed "Go".
if "files_uploaded" not in st.session_state:
    st.session_state["files_uploaded"] = False

# Upload screen: show the uploader until the user confirms with "Go".
if not st.session_state["files_uploaded"]:
    st.session_state["files"] = st.file_uploader(
        "Excel files", type="xlsx", accept_multiple_files=True
    )

    submitted = st.button("Go", disabled=st.session_state["files"] == [])

    if submitted:
        st.session_state["files_uploaded"] = True
        st.rerun()
    else:
        st.stop()

# Report screen: the UploadedFile objects kept in session_state go to the cached loader.
dfs = load_excel_files(st.session_state["files"])

n_rows = st.slider("Number of rows", min_value=1, max_value=100, value=5)

for df in dfs:
    st.write(df.head(n_rows))
```

I am running the app locally with Python 3.9.6 and Streamlit 1.35.0. The same behavior also occurs on Streamlit versions as old as 1.11 and on newer Python versions (e.g., 3.12).

You can upload any big Excel file, and the app will show the first n rows as set by the slider. (To get a big file, I created an Excel file with 100 columns and 2500 rows of random numbers.)
Moving the slider the first time re-executes `load_excel_files()` instead of using the cache. Any subsequent slider changes use the cached data. Why are the files not cached successfully after the first rerun?
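A plausible explanation, stated as an assumption rather than confirmed behavior: `st.cache_data` hashes an in-memory stream such as `UploadedFile` (a subclass of `io.BytesIO`) by its contents together with its current seek position. `pd.read_excel` moves that position, so the key computed on the first rerun (position 0) differs from the key on the next one, and only from then on does the key stay the same. The stdlib-only simulation below reproduces that pattern; `hash_stream`, `load`, and `simulate_reruns` are hypothetical names, not Streamlit APIs.

```python
import hashlib
import io

# Hypothetical stand-in for a cache hasher that keys an in-memory stream on
# its full contents AND its current seek position (the assumed behavior).
def hash_stream(stream: io.BytesIO) -> str:
    h = hashlib.sha256()
    h.update(stream.getvalue())
    h.update(str(stream.tell()).encode())
    return h.hexdigest()


_cache: dict[str, bytes] = {}


def load(stream: io.BytesIO) -> tuple[bytes, bool]:
    """Hypothetical cached loader; returns (data, cache_hit)."""
    key = hash_stream(stream)  # the key is computed BEFORE the body runs
    if key in _cache:
        return _cache[key], True
    # Like a real parser: rewind, then read, which leaves the position at the end.
    stream.seek(0)
    data = stream.read()
    _cache[key] = data
    return data, False


def simulate_reruns(n: int) -> list[bool]:
    """Call load() once per simulated rerun with the SAME stream object."""
    _cache.clear()
    f = io.BytesIO(b"pretend these bytes are an .xlsx workbook")
    return [load(f)[1] for _ in range(n)]


print(simulate_reruns(3))  # [False, False, True]
```

Run 1 is keyed at position 0, run 2 at the end position (a second miss), and run 3 finds the end-position key already cached, which matches "cached only after the 2nd rerun".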

nicomunting · June 11, 2024, 10:46am

I found a workaround: replacing the decorator with `@st.cache_data(hash_funcs={st.runtime.uploaded_file_manager.UploadedFile: lambda x: x.file_id})`, which tells Streamlit to hash each uploaded file by its file ID, makes caching work from the first run.
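Why keying on the file ID helps can be sketched without Streamlit: if the cache key comes from a stable identifier instead of the stream's state, moving the read position no longer changes the key. `FakeUpload`, `load_by_id`, and `simulate_reruns` are hypothetical stand-ins; in the real app the key function is the `lambda x: x.file_id` passed through `hash_funcs`.

```python
import io


class FakeUpload(io.BytesIO):
    """Hypothetical stand-in for an uploaded file: a BytesIO carrying an ID."""

    def __init__(self, data: bytes, file_id: str) -> None:
        super().__init__(data)
        self.file_id = file_id


_cache: dict[str, bytes] = {}


def load_by_id(f: FakeUpload) -> tuple[bytes, bool]:
    """Hypothetical cached loader keyed only on the file ID; returns (data, cache_hit)."""
    key = f.file_id  # mirrors hash_funcs={UploadedFile: lambda x: x.file_id}
    if key in _cache:
        return _cache[key], True
    f.seek(0)
    data = f.read()  # moves the position, but the key ignores it
    _cache[key] = data
    return data, False


def simulate_reruns(n: int) -> list[bool]:
    _cache.clear()
    f = FakeUpload(b"pretend these bytes are an .xlsx workbook", file_id="hypothetical-id-1")
    return [load_by_id(f)[1] for _ in range(n)]


print(simulate_reruns(3))  # [False, True, True]
```

Keying on an ID is only safe as long as a new ID is issued whenever the content changes; otherwise the cache could return data for an older file.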

I still wonder why caching only kicks in from the 2nd rerun when I use the plain `@st.cache_data` decorator without parameters.

dataprofessor · July 2, 2024, 8:23pm

Glad to hear you’ve found a workaround.

PIerre2 · August 31, 2024, 11:34am

This happens with every function I cache with `st.cache`. I'd like to know why and, if possible, whether someone has a solution that works regardless of the function. I see the same behavior: the data is only served from the cache after the 2nd rerun. Even with a custom hash function, the cached data is only used from the 2nd rerun on. If anybody has run into the same issue, I would love a solution.
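One general pattern that may help, assuming the cause is an argument that the cached function itself mutates (such as a stream whose position changes): pass the cached function immutable values instead, e.g. the raw bytes, and build a fresh stream inside it. The stdlib-only sketch below uses hypothetical names (`count_bytes`, `simulate_reruns`) and a byte count as a stand-in for real parsing.

```python
import hashlib
import io

# Hypothetical cache keyed on a digest of the immutable bytes argument.
_cache: dict[str, int] = {}


def count_bytes(data: bytes) -> tuple[int, bool]:
    """Hypothetical cached loader that receives bytes, not a stream; returns (result, cache_hit)."""
    key = hashlib.sha256(data).hexdigest()
    if key in _cache:
        return _cache[key], True
    stream = io.BytesIO(data)  # fresh stream per call; the caller's object is untouched
    result = len(stream.read())  # stand-in for real parsing, e.g. a spreadsheet reader
    _cache[key] = result
    return result, False


def simulate_reruns(n: int) -> list[bool]:
    _cache.clear()
    upload = io.BytesIO(b"pretend these bytes are an .xlsx workbook")
    upload.read()  # even if something moved the position...
    # ...getvalue() still returns the whole buffer, so the key stays stable.
    return [count_bytes(upload.getvalue())[1] for _ in range(n)]


print(simulate_reruns(3))  # [False, True, True]
```

Applied to the original app, this would mean passing `tuple(f.getvalue() for f in st.session_state["files"])` to the cached function and calling `pd.read_excel(io.BytesIO(b))` on each item inside it; that adaptation is untested here.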

system · February 27, 2025, 11:34am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.

