For some reason, from the moment I click on a select box, it takes at least 2 minutes until the app reacts.
As you can see in the log, all the “massive” calculations are cached.
I can’t figure out why it takes so long before the script hits the first cache key.
From the debug log in your first post, I see multiple calls to load_data, load_csv and load_csv_data, which appear to manipulate some DataFrames. To help you further, we’d need to know what kind of DataFrames you are manipulating, in which order you call your functions in your source code, how you are caching them, and which function runs the longest. You could check the last point by printing timestamps at the beginning and end of each function, for example.
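A minimal sketch of that timestamp idea, using only the standard library. The decorator name `log_timing` and the `slow_square` demo function are hypothetical. Placed above `@st.cache`, it would time the whole call, including the cache’s hashing and lookup; placed directly on the function body, it only times actual executions.

```python
import functools
import time
from datetime import datetime


def log_timing(func):
    """Print a start timestamp, an end timestamp and the elapsed time of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        print(f"[{datetime.now().isoformat(timespec='milliseconds')}] start {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so slow failures are reported too.
            elapsed = time.perf_counter() - start
            print(
                f"[{datetime.now().isoformat(timespec='milliseconds')}] end   "
                f"{func.__name__} ({elapsed:.2f} s)"
            )

    return wrapper


@log_timing
def slow_square(x):
    time.sleep(0.1)  # stand-in for a slow query or DataFrame operation
    return x * x


if __name__ == "__main__":
    slow_square(3)
```

Decorating load_data, load_csv and load_csv_data this way should show which one dominates the 2 minutes.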
It may be that some columns in the DataFrame take a long time to hash in the cache system, or are hashed multiple times for no reason, which would explain why hashing.py is the most time-consuming module in your second debug log. But that’s hard to confirm without knowing the code you are running.
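To check whether hashing alone is expensive for your data, you could time a content hash of the DataFrame. This sketch uses pandas’ `pd.util.hash_pandas_object` as a rough proxy; it is not necessarily what Streamlit’s hashing.py does internally, and the example DataFrame below is made up.

```python
import time

import numpy as np
import pandas as pd


def time_dataframe_hash(df: pd.DataFrame) -> float:
    """Return the seconds needed to compute a row-wise content hash of df."""
    start = time.perf_counter()
    # One uint64 hash per row, then reduced to a single number.
    pd.util.hash_pandas_object(df, index=True).sum()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 100_000
    df = pd.DataFrame(
        {
            "id": np.arange(n),
            "value": np.random.default_rng(0).random(n),
            # object (string) columns are typically slower to hash than numeric ones
            "label": [f"row-{i}" for i in range(n)],
        }
    )
    print(f"hashing took {time_dataframe_hash(df):.3f} s")
```

If this turns out to be significant for your real DataFrames, note that the legacy `st.cache` also hashes the returned value on cache hits to detect mutations unless `allow_output_mutation=True` is set, so that option may be worth trying.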
Would you be able to share a small reproducible example so we can give you better advice?
Your Streamlit version, OS, web browser and Python version may also prove useful later on, but for now, knowing what you are trying to achieve is more important for us!
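Most of these details can be collected with a few lines of standard-library Python (3.8+ for `importlib.metadata`); the browser version still has to be read from the browser itself. The function name and the default package list are only suggestions.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("streamlit", "pandas", "sqlalchemy")) -> dict:
    """Collect the versions usually asked for in a bug report."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```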
It seems it takes about 30 seconds to create the in-memory cache:
Creating new mem_cache (key=d2f33ded8cdb4a109f84a184dc8127de, max_entries=inf, ttl=inf)
and another 30 seconds to insert into it once my query has finished.
Later on, when I want to use the same cache:
Cache key: 60b91bb6469e4ba8cac0dadbeecee70d-d2f33ded8cdb4a109f84a184dc8127de
Memory cache HIT: <class 'pandas.core.frame.DataFrame'>
Cache hit: <function load_data at 0x11af193b0>
But after the cache hit, it takes Streamlit about 30 seconds to show the DataFrame on screen.
This is the function with the issue:
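```python
import pandas as pd
import streamlit as st

# Session is the app's SQLAlchemy session object, defined elsewhere in the app.


@st.cache
def load_data(sql: str) -> pd.DataFrame:
    """
    Wrapper around pd.read_sql_query for caching and getting a session bind.
    @param sql:
    @return: pd.DataFrame
    """
    with st.spinner('Loading Data...'):
        df = pd.read_sql_query(sql=sql, con=Session.get_bind())
    return df
```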
Usually, the sql argument is generated by SQLAlchemy as Query([Model]).statement, i.e. a select * from a table with 1 row.
So… the app is running a lot faster, but I’m not getting any cache hits with the function above.
I also tried @st.cache(hash_funcs={sqlalchemy.sql.selectable.Select: hash, Engine: lambda _: None})
Maybe it’s something to do with the engine hash?
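A possible explanation for the misses, assuming the Select object is rebuilt on every rerun (for example by evaluating Query([Model]).statement again): Python’s builtin `hash` on an object that doesn’t define its own `__hash__` is based on object identity, so two Selects with identical SQL can produce different cache keys. As far as I know, SQLAlchemy’s Select keeps that default identity-based hash, but treat this as an assumption. The sketch uses a hypothetical stand-in class `FakeSelect` instead of SQLAlchemy to show the effect and a text-based alternative.

```python
class FakeSelect:
    """Hypothetical stand-in for a SQL statement object with no custom __hash__."""

    def __init__(self, sql: str):
        self.sql = sql

    def __str__(self) -> str:
        # SQLAlchemy's Select also renders its SQL text via str(), but bound
        # parameters appear as placeholders, not as their values.
        return self.sql


def identity_key(stmt) -> int:
    # Roughly what hash_funcs={Select: hash} does, assuming Select keeps
    # Python's default, identity-based __hash__.
    return hash(stmt)


def text_key(stmt) -> str:
    # Content-based alternative: identical SQL text gives an identical key.
    return str(stmt)


if __name__ == "__main__":
    first = FakeSelect("SELECT * FROM model")
    second = FakeSelect("SELECT * FROM model")  # same query, rebuilt on a rerun
    print(identity_key(first) == identity_key(second))  # False: different objects
    print(text_key(first) == text_key(second))  # True: same SQL text
```

In the app, this could look like hash_funcs={sqlalchemy.sql.selectable.Select: str, Engine: lambda _: None}, or calling load_data(str(Query([Model]).statement)). One caveat: str() renders bound parameters as placeholders such as :param_1, so queries that differ only in parameter values would share a key, and those values would need to be part of the key as well.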