@rcsmit We haven’t started deprecating it. You don’t need to import caching separately from Streamlit – it’s all in the same library
Thanks, then I have no idea where
cannot import name 'caching' from 'streamlit' (/home/appuser/venv/lib/python3.7/site-packages/streamlit/__init__.py)
came from… (and why it gave an error in some scripts and not in other scripts)
@rcsmit You wouldn’t import it – that’s why the error is happening. You would just use the method without a separate import (just do import streamlit as you usually would).
I have found the singleton and memo approach more intuitive to use. I also like it because our team has some general library/package code that helps users create and share common resources like database connections (which uses singleton), while users building dashboards only have to deal with the simpler caching via memo.
One thing the docs could make even clearer is at what level caching happens. Comments like the one in Expiring the experimental_singleton cache at regular intervals - #6 by ksdaftari could state more explicitly that singleton/memo caches are global (not user-specific like session_state), unless I am misunderstanding the docs here. I say this because I have had to train some of the people making dashboards to be careful about what they actually cache, especially if the output could differ between users, e.g. when hitting an API whose responses depend on user-specific authorization.
- Do you like st.experimental_memo and st.experimental_singleton?
yes, personally I only use the memoization part of st.cache, so having the memoization and the singleton-pattern parts in separate functions is very nice, it’s way faster. I also love that one can finally clear the cache in code.
- What do you dislike?
nothing so far
- Are there any killer features in st.cache that your apps can’t live without?
no
- Do you understand when to use memo and when to use singleton or is this confusing?
I understood it after reading the documentation once, to me it was clear, and I started using streamlit a month ago!
I hope that st.cache gets deprecated; I still don’t understand why st.cache is still accessible in the current version if there is no way to clear it in code.
I think this is a very good practice, but we need to decide carefully whether to discard st.cache, because frequent changes to mainstream functions erode users’ trust, and the cost of maintenance will increase accordingly. Thank you.
From the docs:
Additionally, you can use st.experimental_memo.clear() and st.experimental_singleton.clear() to clear all memo and singleton caches, respectively.
Use case: I have many data gathering functions, and when I clear caches I don’t need ALL of them refreshed.
I am using memo_decorated_function.clear(), which seems to work for that specific function’s cache alone. Is that correct?
If not, could we have a unique key for each cache decorator, which would then allow each cache to be cleared individually? The key name could default to the name of the decorated function, or be overridden with a key parameter.
Thanks,
Arvindra
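[Editor's note] The per-function clearing Arvindra describes can be sketched in plain Python (a conceptual illustration of the pattern, not Streamlit's actual internals): each decorated function gets its own cache dict, and a clear() method is attached to the wrapper so that clearing one function leaves the others untouched.

```python
import functools

def memo(func):
    """Minimal memoization decorator with a per-function cache,
    mimicking the `decorated_function.clear()` pattern."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    # Clearing this function's cache leaves other memoized
    # functions untouched, because each gets its own `cache` dict.
    wrapper.clear = cache.clear
    return wrapper

@memo
def square(x):
    return x * x

square(4)       # computed and cached
square.clear()  # clears only square's cache, not any other memoized function
```

Because the cache dict is created fresh in each call to the decorator, no shared key scheme is needed: the closure itself keys the caches apart, which is why a default key equal to the function name falls out naturally from this design.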
- yes, yes.
- Functionally, the current code solves the most important cases, but with some pains. A function init(str, str) is marked as singleton in my app, but Streamlit runs it 6 times before crashing due to OOM. Two tabs were open and running init(). It looked frozen, so I refreshed. init() finishes and loads the page. Opening a new tab calls init() again. You get the idea. I look forward to the next version. Thank you all.
- I’ve forgotten all about st.cache, however the name was far better and the organization is now confusing. I recommend asking users about the naming that will follow: st.cache, st.memo, st.singleton, st.session_state. The concern is that users will struggle to remember the nuances, and the names don’t make it distinctly clear what each one does. The challenge you will run into is that weak programmers will want to use Streamlit. I don’t mean the kids. Those who understand state machines, multi-processing, and multi-tenancy will breeze through the docs, but imagine the data analysts, business intelligence, marketing, and sales departments of a company. They have math skills more than architecture-design skills. You will have to educate them in the way of if-then to provide the basics. This suggestion is also me asking for help from you. Over the last 2 years, I presented many demos and prototypes using Streamlit. People always want to run it locally. For every 20 technical people, there will always be someone asking for help. The suggestion is to remove the interfaces for memo and singleton. Leave two interfaces to the same backend: a wrapper and a normal function or dict-like. In both cases, have variables to indicate rules (e.g. the hashable key = tuple(global/local, user UUID, func name, args), enable serializing the result, compress the serialized result, FIFO length, whether the key is global or local, max RAM, max disk, purge or raise error on OOM, expiration timestamp, …). With the rules explicitly set, validation should become easy. You could then give better error messages for each scenario. Then, no one needs to know how it works.
In my proposal, users simply communicate the expectation through variables, and your code would infer the best way to get there. Future versions would maintain the same interface while the backend gets upgraded. Raise an exception if the request doesn’t make sense. TL;DR: I suggest changing the API to

st.cache(..., serialize_result=True, compression='DEFLATE', queue='FIFO', history_length='inf',
         is_local=False, max_ram='inf', min_free_ram='1g', max_disk='1g', raise_on_oom=False,
         ts_expiration=time.now()+'1d', can_purge=True, verbose=2)

and st.session/st.globals as dict-like st.globals[key]. Delete all the others marked as experimental.
- What’s missing? Resource locking/control (because a cached init gets called twice sometimes, when it should lock and wait), security/privacy (an option to intercept calls to Pickle load/save with a callable to change the encoding), an option to use cookies as local key-value storage, storage config (enable replacing the backend KV storage mechanism for transport, backup, scale, or performance), and write protection (it keeps re-running a cached function that takes too long). As implied already, adding purge mechanisms would be great. Most important is clarity on what I want vs. what I’m doing. I understand how your library functions. What I need most is a simple way to control the hashable key under @st.cache to specify global/local and whether to serialize or not.
I really like this interface, but the functionality was the problem. You isolated the problems into modules. I look forward to all of these being unified.
st.cache(func=None, persist=False, allow_output_mutation=False, show_spinner=True,
suppress_st_warning=False, hash_funcs=None, max_entries=None, ttl=None)
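[Editor's note] The ttl and max_entries parameters in the signature above bound the cache in time and size. A rough plain-Python sketch of that kind of eviction logic (an illustration only, not Streamlit's implementation) could look like this:

```python
import time
from collections import OrderedDict

class BoundedTTLCache:
    """Drops entries older than `ttl` seconds on lookup and, once
    `max_entries` is reached, evicts the oldest entry to make room."""

    def __init__(self, max_entries=None, ttl=None):
        self.max_entries = max_entries
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (insert timestamp, value)

    def get(self, key):
        if key in self._data:
            ts, value = self._data[key]
            if self.ttl is None or time.monotonic() - ts < self.ttl:
                return value
            del self._data[key]  # entry expired; treat as a miss
        return None

    def set(self, key, value):
        if self.max_entries is not None and len(self._data) >= self.max_entries:
            self._data.popitem(last=False)  # evict the oldest entry
        self._data[key] = (time.monotonic(), value)

cache = BoundedTTLCache(max_entries=2, ttl=60)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)  # hits max_entries, so "a" is evicted
```

Checking expiry lazily on get() keeps the implementation simple; a production cache would also need thread safety, which this sketch omits.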
Does this mean that st.cache won’t work anymore? Or will it be hidden in the documentation?
The issue with experimental_memo not hashing (hashable) class-instance arguments seems not to have been fixed yet. It would be good to fix that before st.cache is deprecated.
Little update…
Thanks for the feedback everyone! Our main takeaway from here and other talks with users was:
- Splitting caching into two separate decorators is the right way to go!
- The names memo and singleton are too difficult to understand for a lot of users.
So the solution we’re now leaning towards is:
- Rename st.experimental_memo to st.cache_data. This command should be used to cache any data objects, e.g. pandas dataframes, numpy arrays, str/int/float, or lists and dicts containing such data objects. Example use cases are dataframe transformations, API queries, ML inference, etc. Behavior will stay the same as for st.experimental_memo, i.e. you always get a fresh copy of the return object at every rerun. This is also the default command you should use in 90% of all cases.
- Rename st.experimental_singleton to st.cache_resource. This command should be used to cache any global resources that will be shared across all reruns and sessions, e.g. database connections or ML models – for example, if you’re initializing a connection or loading an ML model from disk. We’re also working on a more specific st.connection command, which will allow you to connect to databases in a single line of code and should abstract away caching and similar details (see our roadmap blog post). We’re also thinking about whether we can do something similar for initializing ML models (e.g. an st.model – comment if you have ideas!). In the long run, we see st.cache_resource as an advanced command that most users won’t need to touch.
- Do a much better job in the docs of explaining these two commands, what their differences are, and in which situation you should use which.
We are now implementing the new commands and a few other adjustments. We want to release them in December/January and will start the deprecation of st.cache then. We’re doing the deprecation in a very, very careful way! Specifically, we won’t remove st.cache until at least 2.0 to prevent breakage, and we’ll give a lot of guidance (both in the app and in the docs) on how to move over to the new commands. In most situations, it should just be a small name change and you’re good to go.
Happy to hear any feedback that y’all have!
I especially love the new names. I think those are much more intuitive.
“will let you connect to external databases and APIs with a single line of code. st.database will launch a small database alongside every Streamlit app, so you can permanently store data without any setup.”
Please give some thought to using DuckDB or a Redis cache for Streamlit’s small database – it would be helpful in a lot of other scenarios too.
A friendly thought/suggestion…
Thanks
Sai
I agree, the new names make so much more sense. Excited!
I have not used st.cache in forever. I do regularly use st.experimental_singleton and st.experimental_memo. I think they are working fine and greatly improving the user experience in general.
If I could change anything, I’d just add an expiration parameter to st.experimental_singleton.
As far as understanding / documentation goes, I think it is pretty much fine but the scope of it could be stated more explicitly (whether the caching works across sessions and so on).
Regarding the proposed st.connection function, it is somewhat hard for me to imagine how it would interact with the vast number of specialized database drivers, ORMs, and so on that are out there.
Hi @ennui
The ttl and max_entries expiration parameters have recently been added to st.experimental_singleton with:
They should be available in the next 1.16.0 release.
Hi @jrieke -
I’ve seen a few users looking for a user-specific (across-sessions) cache and a session-specific cache. They’re doing hacks to accomplish this, and if they’re building multitenant apps, they risk leaking data – it’s quite unsafe.
Example threads:
plus @whitphx 's post above We want to deprecate st.cache! …and need your input ❤️ - #18 by whitphx.
The solutions out there right now:
- inject a hacked-up session_id into cached/memoized methods (there are gists out there for this; they’ve broken after some releases)
- inject a user_id, e.g. experimental_user – not great because this isn’t available for public apps or private third-party deployments
- use session_state – the ergonomics are poor/unsafe, and you need to rig your own TTL etc.:
some_df = get_data(...)
st.session_state['my_df'] = some_df
plot(st.session_state['my_df']) # BETTER NOT USE some_df!
other_df = transform(st.session_state['my_df'])...
vs a better alternative IMO:
some_df = get_data(...)
plot(some_df)
...
@st.memo(per_user=True)
def get_data(...)
...
Any plans for something that supports more ergonomic user and/or session-specific cache-ing?
All you need is for your cached functions to take a user_id or a session_id parameter. In order to have a user_id you need authentication; there are already several ways to do it, and everybody can implement their own.
For a session_id, here is a simple implementation that won’t break easily:
from uuid import uuid4
import streamlit as st

def get_session_id():
    SESSION_ID_KEY = "#SESSION-ID#"
    if SESSION_ID_KEY not in st.session_state:
        st.session_state[SESSION_ID_KEY] = uuid4()
    return st.session_state[SESSION_ID_KEY]
Do not forget to put a limit on the cache size so that it does not grow unbounded.
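[Editor's note] Combining a session id with a size-limited cache, as suggested above, a minimal session-scoped memo can be sketched in plain Python (the session_memo decorator here is hypothetical and not part of Streamlit's API; in a real app you would pass the get_session_id() helper from the post above as the id source):

```python
import functools
from collections import OrderedDict

def session_memo(get_session_id, max_entries=128):
    """Memoize per session: the session id is part of the cache key,
    so one session's results are never served to another session.
    `max_entries` bounds the cache so it does not grow unbounded."""
    def decorator(func):
        cache = OrderedDict()

        @functools.wraps(func)
        def wrapper(*args):
            key = (get_session_id(), args)
            if key not in cache:
                if len(cache) >= max_entries:
                    cache.popitem(last=False)  # evict the oldest entry
                cache[key] = func(*args)
            return cache[key]

        return wrapper
    return decorator

# Example with a fake session id source for demonstration:
current_session = "session-A"
calls = []  # records actual computations (cache misses)

@session_memo(lambda: current_session)
def expensive(x):
    calls.append(x)
    return x * 10
```

Because the session id is folded into the cache key, switching sessions automatically gets its own cache entries, and the eviction bound caps memory across all sessions combined.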
This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.