I created an app with a load screen where a user can upload multiple Excel files. These files can be large, so I cache the loading step. To my surprise, the cache is only hit from the second rerun onward.
I created a minimal working example of the app that shows the behavior:
    import streamlit as st
    import pandas as pd

    st.set_page_config(
        page_title='Report',
        layout="wide"
    )

    st.title('Report')

    @st.cache_data
    def load_excel_files(files) -> list[pd.DataFrame]:
        dfs = []
        for file in files:
            df = pd.read_excel(file)
            dfs.append(df)
        return dfs

    if "files_uploaded" not in st.session_state:
        st.session_state["files_uploaded"] = False

    if not st.session_state["files_uploaded"]:
        st.session_state["files"] = st.file_uploader(
            "Excel files", type="xlsx", accept_multiple_files=True
        )
        submitted = st.button("Go", disabled=st.session_state["files"] == [])
        if submitted:
            st.session_state["files_uploaded"] = True
            st.rerun()
        else:
            st.stop()

    dfs = load_excel_files(st.session_state["files"])
    n_rows = st.slider("Number of rows", min_value=1, max_value=100, value=5)
    for df in dfs:
        st.write(df.head(n_rows))
I am running the app locally using Python 3.9.6 and Streamlit 1.35.0. I know that this occurs on Streamlit versions as old as 1.11 and also for newer Python versions (e.g., 3.12).
You can upload any large Excel file and the app will show the first n rows, as set by the slider. (I tested with an Excel file of 100 columns and 2500 rows of random numbers.)
Moving the slider the first time results in a rerun of load_excel_files(). Any subsequent changes to the slider use the cached data. Why are the files not successfully cached after the first rerun?
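One hypothesis that might be worth checking, sketched here without Streamlit: if the cache key for an uploaded file depended on the stream's current read position as well as its name and bytes, the key would change once after the first read and then stay stable, which would match the pattern above. In this sketch, `position_sensitive_key` is a hypothetical helper (not Streamlit's API), `io.BytesIO` stands in for an uploaded file, and `buffer.read()` stands in for `pd.read_excel` consuming the stream. Whether `pd.read_excel` really leaves the stream at a non-zero position is also an assumption.

```python
import hashlib
import io


def position_sensitive_key(name: str, buffer: io.BytesIO) -> str:
    """Hypothetical cache key built from name, stream position, and contents."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(str(buffer.tell()).encode())  # the current read position is part of the key
    h.update(buffer.getvalue())            # getvalue() returns all bytes regardless of position
    return h.hexdigest()


# Stand-in for the bytes of an uploaded .xlsx file (assumption: any bytes will do here).
buffer = io.BytesIO(b"stand-in for the bytes of an uploaded .xlsx file")

key_before_first_read = position_sensitive_key("report.xlsx", buffer)
buffer.read()  # stands in for pd.read_excel consuming the stream
key_after_first_read = position_sensitive_key("report.xlsx", buffer)
buffer.read()  # a second read leaves the position unchanged at the end
key_after_second_read = position_sensitive_key("report.xlsx", buffer)

print(key_before_first_read == key_after_first_read)   # False: position moved from 0
print(key_after_first_read == key_after_second_read)   # True: position no longer changes
```

If this turns out to be the cause, one thing to try would be calling `file.seek(0)` after each `pd.read_excel(file)` inside `load_excel_files`, so the stream position is the same on every rerun.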