For some reason, from the moment I click on a select box, it takes at least 2 minutes until the app reacts.
As you can see in the log, all the “massive” calculations are cached.
I can’t figure out why it takes so long until the script hits the first cache key.
From the debug log in your first post, I see multiple calls to load_data, load_csv and load_csv_data, which appear to manipulate some DataFrame. To help you more, we’d need to know what kind of DataFrames you are manipulating, in which order you call your functions in your source code, how you are caching them, and which function runs the longest. You could check the last point by printing timestamps at the beginning and end of each function, for example.
It may be that some columns in the DataFrame take a long time to be hashed by the cache system, or are hashed multiple times for no reason, which would explain why hashing.py is the most time-consuming entry in your 2nd debug log. But it’s hard to prove without proper knowledge of the code you are running.
Would you be able to share a small reproducible example so we can give you better advice?
Your Streamlit version, OS, web browser and Python version may also prove useful later on, but for now, knowing what you are trying to achieve is more important for us!
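As a rough sketch of the timestamp idea, a small decorator can print when a function starts and ends and how long it took. The names log_timing and slow_step are made up for illustration; slow_step only stands in for something like load_data or load_csv.

```python
import functools
import time


def log_timing(func):
    """Print a start/end timestamp and the elapsed time for each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        print(f"[{time.strftime('%H:%M:%S')}] start {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned normally or raised.
            elapsed = time.perf_counter() - start
            print(f"[{time.strftime('%H:%M:%S')}] end   {func.__name__} ({elapsed:.2f}s)")
    return wrapper


@log_timing
def slow_step(n):
    # Hypothetical stand-in for a real loading function.
    time.sleep(0.05)
    return n * 2


result = slow_step(21)
```

Where the decorator sits relative to @st.cache should matter: placed underneath it, it would probably only time the function body on a cache miss, while placed above it, it would also include the time spent in hashing and cache lookup. I have not verified how the cache treats a wrapped function, so treat that as an assumption to check.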
It seems to take about 30 seconds to create the cache in memory:
Creating new mem_cache (key=d2f33ded8cdb4a109f84a184dc8127de, max_entries=inf, ttl=inf)
and another 30 seconds to insert into it once my query is finished.
Later on, when I want to use the same cache:
Cache key: 60b91bb6469e4ba8cac0dadbeecee70d-d2f33ded8cdb4a109f84a184dc8127de
Memory cache HIT: <class 'pandas.core.frame.DataFrame'>
Cache hit: <function load_data at 0x11af193b0>
But after the cache hit, Streamlit takes about 30 seconds to show the DataFrame on screen.
This is the function with the issue:
@st.cache
def load_data(sql: str) -> pd.DataFrame:
    """
    wrapper around pd.read_sql_query for caching and getting a session bind
    @param sql:
    @return: pd.DataFrame
    """
    with st.spinner('Loading Data...'):
        df = pd.read_sql_query(sql=sql, con=Session.get_bind())
    return df
Usually, the sql input is generated by SQLAlchemy rather than being a plain str:
Query([Model]).statement = select * from a table with 1 row
so…
The app is running a lot faster, but I’m not getting cache hits when using the above function.
I also tested @st.cache(hash_funcs={sqlalchemy.sql.selectable.Select: hash, Engine: lambda _: None})
Maybe it is something with the engine hash?
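One possible explanation for the missing hits, offered as a guess rather than a diagnosis: if the Select object does not define its own hash, the built-in hash falls back to object identity, and a script that rebuilds the query on every rerun would then produce a new cache key each time. The sketch below uses a made-up FakeSelect class instead of SQLAlchemy, so it only illustrates the identity-versus-text difference and says nothing certain about how Select itself behaves.

```python
class FakeSelect:
    """Hypothetical stand-in for a query object that does not define __hash__."""

    def __init__(self, sql):
        self.sql = sql

    def __str__(self):
        return self.sql


# Two separately built but textually identical queries,
# as would happen on two reruns of the script.
q1 = FakeSelect("SELECT * FROM a")
q2 = FakeSelect("SELECT * FROM a")

# The default hash is based on object identity, so the two keys differ.
identity_keys_match = hash(q1) == hash(q2)

# Hashing the rendered text gives the same key for the same statement.
text_keys_match = hash(str(q1)) == hash(str(q2))

print("identity-based keys match:", identity_keys_match)
print("text-based keys match:", text_keys_match)
```

If that is what is happening, a hash function along the lines of lambda s: str(s) for the Select type might give stable keys. One caveat to check first: the rendered text may show bound parameters as placeholders rather than values, in which case two queries that differ only in their parameters would share a key.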