Caching doesn't work with databases?

I have the following structure (note that DatabaseRetrieval is just a dataclass that wraps the result in a neat type):

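from typing import Optional, Union

import pandas as pd
import streamlit as st

def put_file_loader_or_database_loader() -> Optional[
    Union[UploadedFile, DatabaseRetrieval]
]:
    source = None  # stays None if neither loading method is selected
    if loading_method == "CSV file":
        source = st.file_uploader(...)

    elif loading_method == "from stream":
        df = pd.read_sql(...)

        source = DatabaseRetrieval(df)

    return source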

The source is then passed to a function that unpacks it and calls the cached loaders:

def load_data(source: Union[UploadedFile, DatabaseRetrieval]):
    # isinstance takes the object first and the class second
    if isinstance(source, DatabaseRetrieval):
        df = _load_from_database_retrival()
    elif isinstance(source, UploadedFile):
        df = _load_from_uploaded_file()

    ....
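`isinstance` takes the object first and the class second (`isinstance(source, DatabaseRetrieval)`); with the arguments swapped, Python raises a TypeError because `source` is an instance, not a class. A minimal stdlib sketch of this kind of dispatch, assuming DatabaseRetrieval is a plain dataclass wrapping fetched rows, and using a hypothetical `FakeUploadedFile` as a stand-in for Streamlit's UploadedFile (`describe_source` is also a made-up name):

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DatabaseRetrieval:
    """Wraps rows fetched from a database (stand-in for the question's dataclass)."""
    rows: List[tuple]


@dataclass
class FakeUploadedFile:
    """Hypothetical stand-in for Streamlit's UploadedFile, holding raw CSV text."""
    text: str


def describe_source(source: Union[FakeUploadedFile, DatabaseRetrieval]) -> str:
    # isinstance(obj, cls): the object comes first, the class second.
    if isinstance(source, DatabaseRetrieval):
        return f"database: {len(source.rows)} rows"
    elif isinstance(source, FakeUploadedFile):
        return f"csv: {len(source.text.splitlines())} lines"
    # Neither branch matched (e.g. source is None): fail loudly instead of
    # leaving the result variable unassigned.
    raise TypeError(f"unsupported source: {type(source).__name__}")


print(describe_source(DatabaseRetrieval([(1,), (2,)])))  # database: 2 rows
print(describe_source(FakeUploadedFile("a,b\n1,2\n")))  # csv: 2 lines
```

The explicit `raise` at the end matters when nothing has been selected yet: the original loader can return `None`, which matches neither branch.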

The caching happens in these two functions:

@st.cache(allow_output_mutation=False, show_spinner=False, suppress_st_warning=True)
def _load_from_database_retrival() -> pd.DataFrame:
    ....

@st.cache(allow_output_mutation=False, show_spinner=False, suppress_st_warning=True)
def _load_from_uploaded_file() -> pd.DataFrame:
    ....

It works fine with CSVs, and the first function clocks in at 0.00 ms on every update to the controls. However, for databases it re-queries every single time. Where am I placing my caches wrong?
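One thing worth checking in the snippet above: `pd.read_sql(...)` runs inside `put_file_loader_or_database_loader`, before any cached function is called. A cache on `_load_from_database_retrival` can only skip work done inside that function, not a query that already ran. This is an inference from the code shown, not something confirmed in the thread. The sketch below uses the standard library's `functools.lru_cache` and an in-memory `sqlite3` table as stand-ins for `st.cache` and the real database (both assumptions; all function names are hypothetical) and counts how often the query actually executes in each arrangement:

```python
import functools
import sqlite3
from typing import Tuple

# In-memory table standing in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

query_count = 0


def run_query(sql: str) -> Tuple[tuple, ...]:
    """Run the query and count how many times the database is actually hit."""
    global query_count
    query_count += 1
    return tuple(conn.execute(sql).fetchall())


@functools.lru_cache(maxsize=None)
def unpack_result(rows: Tuple[tuple, ...]) -> Tuple[tuple, ...]:
    # Cached, but it only receives rows that were already fetched.
    return rows


@functools.lru_cache(maxsize=None)
def load_from_database(sql: str) -> Tuple[tuple, ...]:
    # Cached on the query text, so a cache hit skips the query itself.
    return run_query(sql)


def simulate_reruns(n: int = 3) -> Tuple[int, int]:
    """Return (queries when caching after the query, queries when caching the query)."""
    global query_count
    sql = "SELECT x FROM t"

    unpack_result.cache_clear()
    query_count = 0
    for _ in range(n):
        unpack_result(run_query(sql))  # the query runs on every rerun
    after_query = query_count

    load_from_database.cache_clear()
    query_count = 0
    for _ in range(n):
        load_from_database(sql)  # the query runs only on the first rerun
    around_query = query_count

    return after_query, around_query


print(simulate_reruns(3))  # (3, 1)
```

In Streamlit terms this would mean moving the `pd.read_sql(...)` call inside the cached function and passing it hashable inputs such as the query string, since `st.cache` builds its cache key from the function's arguments (among other things).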

Hi @komodovaran 😎 you can find more information about @st.cache at the link below. I think this decorator only works on functions, e.g.:

import time

import streamlit as st

@st.cache  # 👈 Added this
def expensive_computation(a, b):
    time.sleep(2)  # This makes the function take 2s to run
    return a * b

a = 2
b = 21
res = expensive_computation(a, b)

st.write("Result:", res)

https://docs.streamlit.io/en/stable/caching.html#improve-app-performance

Hi @komodovaran -

st.cache definitely works with databases, but I suspect your Union typing is confusing the issue. Instead of that, I would use a simple if/else, so it's more obvious to the downstream code what your function is actually doing.
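A minimal sketch of that if/else structure: branch on the loading method directly and let each branch call its own cached loader with a plain, hashable input (the file's bytes or the query string) instead of returning a Union-typed wrapper first. `functools.lru_cache` stands in for `st.cache` so the sketch runs without Streamlit; `load_csv`, `load_database` and the placeholder query result are hypothetical.

```python
import csv
import functools
import io


@functools.lru_cache(maxsize=None)
def load_csv(file_bytes: bytes) -> tuple:
    # Parse the uploaded bytes; the cache is keyed on the bytes themselves.
    reader = csv.reader(io.StringIO(file_bytes.decode("utf-8")))
    return tuple(tuple(row) for row in reader)


@functools.lru_cache(maxsize=None)
def load_database(query: str) -> tuple:
    # Hypothetical placeholder for the real query (pd.read_sql in the question).
    return (("result of", query),)


def load_data(loading_method: str, file_bytes: bytes = b"", query: str = "") -> tuple:
    # Plain if/else: each branch calls its own cached loader directly.
    if loading_method == "CSV file":
        return load_csv(file_bytes)
    elif loading_method == "from stream":
        return load_database(query)
    raise ValueError(f"unknown loading method: {loading_method!r}")


print(load_data("CSV file", file_bytes=b"a,b\n1,2\n"))  # (('a', 'b'), ('1', '2'))
print(load_data("from stream", query="SELECT 1"))  # (('result of', 'SELECT 1'),)
```

Because the expensive step lives inside each cached loader, a rerun with the same query string or file bytes returns the cached result without repeating the work.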

Best,
Randy
