Excel file only cached after 2nd rerun

I created an app that has a load screen where a user can upload multiple Excel files. These Excel files can be big, so I applied caching. I noticed that the caching does not start working until the 2nd rerun, which surprised me.

I created a minimal working example of the app that shows the behavior:

```python
import streamlit as st
import pandas as pd

st.set_page_config(
    page_title='Report',
    layout="wide"
)
st.title('Report')


@st.cache_data
def load_excel_files(files) -> list[pd.DataFrame]:
    # Read every uploaded workbook into its own DataFrame
    dfs = []
    for file in files:
        df = pd.read_excel(file)
        dfs.append(df)

    return dfs


# Track whether the user has already confirmed the upload
if "files_uploaded" not in st.session_state:
    st.session_state["files_uploaded"] = False

if not st.session_state["files_uploaded"]:
    # Upload screen: keep the UploadedFile objects in session state
    st.session_state["files"] = st.file_uploader(
        "Excel files", type="xlsx", accept_multiple_files=True
    )

    # "Go" stays disabled until at least one file has been uploaded
    submitted = st.button("Go", disabled=st.session_state["files"] == [])

    if submitted:
        st.session_state["files_uploaded"] = True
        st.rerun()
    else:
        st.stop()

# Report screen: runs on every rerun after "Go", e.g. each slider change
dfs = load_excel_files(st.session_state["files"])

n_rows = st.slider("Number of rows", min_value=1, max_value=100, value=5)

for df in dfs:
    st.write(df.head(n_rows))
```

I am running the app locally with Python 3.9.6 and Streamlit 1.35.0. The same behavior occurs on Streamlit versions as old as 1.11 and on newer Python versions (e.g., 3.12).

You can upload any big Excel file, and the app will show the first n rows as set by the slider. (I created an Excel file with 100 columns and 2500 rows of random numbers to get a big Excel file.)
Moving the slider for the first time causes load_excel_files() to execute again. Every later change to the slider uses the cached data. Why are the files not cached successfully after the first rerun?
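A plausible explanation, stated as an assumption rather than something confirmed by the Streamlit docs: `UploadedFile` is a subclass of `io.BytesIO`, and Streamlit's default hasher appears to hash in-memory streams by their contents plus their current seek position. `pd.read_excel` moves that position, so the first slider move computes a different cache key than the run right after "Go", while later reruns see the same end-of-read position and hit the cache. The toy below reproduces that sequence in plain Python; `position_sensitive_key`, `load` and the dict `cache` are hypothetical stand-ins for Streamlit's internals, not its API.

```python
import hashlib
import io


# Hypothetical stand-in for a hasher that (assumed) mixes an in-memory
# stream's full contents AND its current seek position into the key.
def position_sensitive_key(buf: io.BytesIO) -> str:
    h = hashlib.md5()
    h.update(buf.getvalue())
    h.update(str(buf.tell()).encode())
    return h.hexdigest()


cache = {}  # key -> result; plays the role of @st.cache_data's store
calls = 0   # how often the expensive body actually ran


def load(buf: io.BytesIO) -> bytes:
    global calls
    key = position_sensitive_key(buf)  # key is computed before the body runs
    if key in cache:
        return cache[key]
    calls += 1
    buf.seek(0)          # imitates a reader that seeks around the file...
    result = buf.read()  # ...and leaves the pointer at the end
    cache[key] = result
    return result


upload = io.BytesIO(b"pretend these bytes are an .xlsx workbook")
load(upload)  # run after "Go": pointer at 0 -> miss, pointer ends at EOF
load(upload)  # 1st slider move: pointer at EOF -> new key -> miss again
load(upload)  # 2nd slider move: pointer still at EOF -> same key -> hit
print(calls)  # 2
```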

I found a workaround for this problem. Replacing the decorator with @st.cache_data(hash_funcs={st.runtime.uploaded_file_manager.UploadedFile: lambda x: x.file_id}), which tells Streamlit to hash each uploaded file by its file ID, makes the cache work from the first rerun on.
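A plausible reading of why the `hash_funcs` workaround helps is that it takes the read position out of the cache key entirely. The toy below, which runs without Streamlit, keys a dict cache on an ID only; `FakeUploadedFile` and its `file_id` value are hypothetical stand-ins for Streamlit's `UploadedFile`, and `id_key` plays the role of the `lambda x: x.file_id` passed in `hash_funcs`.

```python
import io


class FakeUploadedFile(io.BytesIO):
    """Hypothetical stand-in for st.runtime.uploaded_file_manager.UploadedFile."""

    def __init__(self, data: bytes, file_id: str) -> None:
        super().__init__(data)
        self.file_id = file_id


# Plays the role of the hash_funcs lambda: only the ID feeds the key,
# so the seek position can no longer change it.
def id_key(f: FakeUploadedFile) -> str:
    return f.file_id


cache = {}
calls = 0


def load(f: FakeUploadedFile) -> bytes:
    global calls
    key = id_key(f)
    if key in cache:
        return cache[key]
    calls += 1
    f.seek(0)
    result = f.read()  # pointer ends at EOF, but the key ignores it
    cache[key] = result
    return result


upload = FakeUploadedFile(b"pretend these bytes are an .xlsx workbook", "abc-123")
for _ in range(3):  # run after "Go" plus two slider moves
    load(upload)
print(calls)  # 1
```

Keying on an ID alone is only correct as long as the same ID never refers to different bytes.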

I still wonder why it only works from the 2nd rerun when I add @st.cache_data without parameters as the decorator.
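If the default hasher does mix the seek position into the key (an assumption, based on `UploadedFile` being an `io.BytesIO` subclass), the parameterless `@st.cache_data` should also hit from the first rerun when the cached function rewinds each file before returning, for example by calling `file.seek(0)` after `pd.read_excel(file)` inside `load_excel_files`. A plain-Python toy of that idea, with `position_sensitive_key` as a hypothetical stand-in for the hasher:

```python
import hashlib
import io


def position_sensitive_key(buf: io.BytesIO) -> str:
    # Hypothetical stand-in for the hasher: contents plus current seek position
    h = hashlib.md5()
    h.update(buf.getvalue())
    h.update(str(buf.tell()).encode())
    return h.hexdigest()


cache = {}
calls = 0


def load(buf: io.BytesIO) -> bytes:
    global calls
    key = position_sensitive_key(buf)
    if key in cache:
        return cache[key]
    calls += 1
    buf.seek(0)
    result = buf.read()
    buf.seek(0)  # rewind so the next call hashes the same position as this one
    cache[key] = result
    return result


upload = io.BytesIO(b"pretend these bytes are an .xlsx workbook")
for _ in range(3):  # run after "Go" plus two slider moves
    load(upload)
print(calls)  # 1
```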

Glad to hear you’ve found a workaround.

This happens with every st.cache decorator I use, and I'd like to know why and, if possible, whether someone has a solution that works regardless of the function. I see the same behavior: the data only gets cached after the 2nd rerun, for some reason. Even with the custom hash function, the cached data is only used from the 2nd rerun on. If anybody has run into the same issue, I would love a solution :slight_smile:
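One pattern that does not depend on what the function does, offered as a suggestion and only relevant when the cached argument is a file-like object: pass immutable bytes, for example `tuple(f.getvalue() for f in files)`, instead of the stream objects. `getvalue()` returns the whole buffer regardless of the read position, so the key cannot drift between reruns; inside the function each blob would be read with `pd.read_excel(io.BytesIO(blob))`. The toy below keys a dict cache on the bytes; `load_from_bytes` and `bytes_key` are hypothetical names, and returning byte lengths stands in for building DataFrames.

```python
import hashlib
import io

cache = {}
calls = 0


def bytes_key(blobs: tuple[bytes, ...]) -> str:
    # Hypothetical key that depends only on the bytes; the length prefix
    # keeps (b"ab", b"c") and (b"a", b"bc") from producing the same key.
    h = hashlib.md5()
    for blob in blobs:
        h.update(len(blob).to_bytes(8, "big"))
        h.update(blob)
    return h.hexdigest()


def load_from_bytes(blobs: tuple[bytes, ...]) -> list[int]:
    global calls
    key = bytes_key(blobs)
    if key in cache:
        return cache[key]
    calls += 1
    result = [len(blob) for blob in blobs]  # stand-in for pd.read_excel(io.BytesIO(blob))
    cache[key] = result
    return result


files = [io.BytesIO(b"first workbook"), io.BytesIO(b"second workbook")]
for _ in range(3):
    load_from_bytes(tuple(f.getvalue() for f in files))
    for f in files:
        f.read()  # moving the read position does not change getvalue()
print(calls)  # 1
```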