I have the following structure (note that DatabaseRetrieval
is just a dataclass that wraps the result in a neat type):
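For context, a minimal sketch of what such a wrapper dataclass might look like (the field name `df` is an assumption; the post does not show the actual definition):

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class DatabaseRetrieval:
    """Wraps a DataFrame pulled from the database so the source
    type can be told apart from an UploadedFile via isinstance."""
    df: pd.DataFrame
```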
def put_file_loader_or_database_loader() -> Optional[
    Union[UploadedFile, DatabaseRetrieval]
]:
    source = None  # stay None if no loading method matches
    if loading_method == "CSV file":
        source = st.file_uploader(...)
    elif loading_method == "from stream":
        df = pd.read_sql(...)
        source = DatabaseRetrieval(df)
    return source
The source is then passed to a function that unpacks it and dispatches to the cached loaders:
def load_data(source: Union[UploadedFile, DatabaseRetrieval]):
    if isinstance(source, DatabaseRetrieval):
        df = _load_from_database_retrieval()
    elif isinstance(source, UploadedFile):
        df = _load_from_uploaded_file()
    ....
The caching happens in the following functions:
@st.cache(allow_output_mutation=False, show_spinner=False, suppress_st_warning=True)
def _load_from_database_retrieval() -> pd.DataFrame:
    ....

@st.cache(allow_output_mutation=False, show_spinner=False, suppress_st_warning=True)
def _load_from_uploaded_file() -> pd.DataFrame:
    ....
This works fine for CSV files: the cached CSV loader clocks in at 0.00 ms on every update to the controls. For the database path, however, the query re-runs every single time. Where am I placing my caches wrong?