We want to deprecate st.cache! …and need your input ❤️

Thanks, then I have no idea where

cannot import name 'caching' from 'streamlit' (/home/appuser/venv/lib/python3.7/site-packages/streamlit/__init__.py)

came from… (and why it gave an error in some scripts and not in other scripts)

@rcsmit You wouldn’t import it – that’s why the error is happening. You would just use the method without importing it (just do import streamlit as you usually would).
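For example, a minimal sketch of the usual pattern (load_data is just an illustrative name):

import streamlit as st  # the usual import; no separate caching import needed

@st.cache
def load_data(path):
    ...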


I have found the singleton and memo split much more intuitive to use. I also like it because our team has some general library/package code that helps users create and use common resources like database connections (which makes use of singleton), and users only have to worry about the simpler caching via memo for their own dashboards.

I think one thing the docs could make even clearer is at what level caching happens. Comments like the one in Expiring the experimental_singleton cache at regular intervals - #6 by ksdaftari could spell out more explicitly that the singleton/memo caches are global (not specific to a user, unlike session_state) – unless I am misunderstanding the docs here. I say this because I have had to train some of the people building dashboards to be careful about what they actually cache, especially if the output could differ between users, e.g. when hitting an API whose authorization differs per user.
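To illustrate the concern, a hypothetical sketch (fetch_report and user_token are made-up names):

import streamlit as st

@st.experimental_memo  # global cache: shared across ALL users and sessions
def fetch_report(endpoint):
    ...

# If the output depends on who is asking, the user must be part of the cache key:
@st.experimental_memo
def fetch_report_for(endpoint, user_token):
    ...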

  • Do you like st.experimental_memo and st.experimental_singleton?
    yes, personally I only use the memoization part of st.cache, so having the memoization and the singleton-pattern parts in separate functions is very nice; it’s way faster. I also love that one can finally clear the cache in code.
  • What do you dislike?
    nothing so far
  • Are there any killer features in st.cache that your apps can’t live without?
    no
  • Do you understand when to use memo and when to use singleton or is this confusing?
    I understood it after reading the documentation once; it was clear to me, and I had only started using Streamlit a month earlier!

I hope st.cache gets deprecated; I still don’t understand why it is accessible in the current version when there is no way to clear it in code.


I think this is good practice, but the decision to discard st.cache needs to be made carefully: frequent changes to mainstream functions erode users’ trust, and the cost of maintenance increases accordingly. Thank you.

From the docs:

Additionally, you can use st.experimental_memo.clear() and st.experimental_singleton.clear() to clear all memo and singleton caches, respectively.

Use case: I have many data gathering functions, and when I clear caches I don’t need ALL of them refreshed.

I am using memo_decorated_function.clear() which seems to work for that specific function’s cache alone. Is that correct?
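For reference, a minimal sketch of the pattern I mean (the function names are illustrative):

import streamlit as st
import pandas as pd

@st.experimental_memo
def load_sales(path):
    return pd.read_csv(path)

@st.experimental_memo
def load_inventory(path):
    return pd.read_csv(path)

# Clears only load_sales's entries; load_inventory's cache is untouched.
load_sales.clear()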

If not, could we have a unique key for each cache decorator, which would then allow each cache to be cleared individually? The key name could default to the name of the decorated function, or be overridden with a key parameter.

Thanks,
Arvindra

  • yes, yes.
  • Functionally, the current code solves the most important cases, but with some pain points. A function init(str, str) is marked as a singleton in my app, but Streamlit ran it 6 times before crashing due to OOM: 2 tabs were open and running init(). It looked frozen, so I refreshed; init() finished and loaded the page; opening a new tab called init() again. You get the idea. I look forward to the next version. Thank you all.
  • I’ve forgotten all about st.cache; however, the name was far better, and the organization is now confusing. I recommend asking users about the naming that will follow: st.cache, st.memo, st.singleton, st.session_state. The concern is that users will struggle to remember the nuances, and the names don’t make it distinctly clear what they do. The challenge you will run into is that weak programmers will want to use Streamlit. I don’t mean the kids. Those who understand state machines, multi-processing, and multi-tenancy will breeze through the docs, but imagine the data analysts, business intelligence, marketing, and sales departments of a company. They have math skills more than architecture-design skills. You will have to educate them in the way of if-then to provide the basics. This suggestion is also me asking for help from you. Over the last 2 years, I presented many demos and prototypes using Streamlit. People always want to run it locally. For every 20 technical people, there will always be someone asking for help. The suggestion is to remove the interfaces for memo and singleton. Leave two interfaces to the same backend variables: a wrapper, and a normal function or dict-like. In both cases, have variables to indicate rules (e.g. the hashable key = tuple(global/local, user UUID, func name, args), enable serializing the result, compress the serialized result, FIFO length, whether the key is global or local, max RAM, max disk, purge or raise an error on OOM, expiration timestamp, …). With the rules explicitly set, validation should become easy. You could then give better error messages for each scenario. Then, no one needs to know how it works. In my proposal, users simply communicate the expectation through variables, and your code would infer the best way to get there. Future versions would maintain the same interface while the backend gets upgraded. Raise an exception if the request doesn’t make sense. TL;DR: I suggest changing the API to
st.cache(..., serialize_result=True, compression='DEFLATE', queue='FIFO', history_length='inf',
         is_local=False, max_ram='inf', min_free_ram='1g', max_disk='1g', raise_on_oom=False,
         ts_expiration=time.now() + '1d', can_purge=True, verbose=2)

and st.session / st.globals as dict-like: st.globals[key].

Delete all the others marked as experimental.

  • What’s missing? Resource locking/control (because a cached init sometimes gets called twice when it should lock and wait); security/privacy (an option to intercept calls to pickle load/save with a callable, to change the encoding); an option to use cookies as local key-value storage; storage config (enabling replacement of the backend key-value storage mechanism for transport, backup, scale, or performance); and write protection (it keeps re-running a cached function that takes too long to run). As implied already, adding purge mechanisms would be great. Most important is clarity on what I want vs. what I’m doing. I understand how your library functions; what I need most is a simple way to control the hashable key under @st.cache, to specify global/local and whether to serialize.

I really like this interface, but the functionality was the problem. You isolated the problems into modules. I look forward to seeing all of these unified:

st.cache(func=None, persist=False, allow_output_mutation=False, show_spinner=True, 
suppress_st_warning=False, hash_funcs=None, max_entries=None, ttl=None)
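For reference, a typical call under that interface might look like this (a sketch; expensive_lookup is an illustrative name):

import streamlit as st

# Entries expire after an hour, and at most 100 are kept.
@st.cache(ttl=3600, max_entries=100)
def expensive_lookup(key):
    ...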

Does this mean that st.cache won’t work anymore? Or will it be hidden in the documentation?


The issue with experimental_memo not hashing (hashable) class-instance arguments seems not to have been fixed yet. It would be good if that were fixed before cache is deprecated.

Little update…

Thanks for the feedback everyone! ❤️ Our main takeaway from here and other talks with users was:

  1. Splitting caching into two separate decorators is the right way to go!
  2. The names memo and singleton are too difficult to understand for a lot of users.

So the solution we’re now leaning towards is:

  • Rename st.experimental_memo to st.cache_data. This command should be used to cache any data objects, e.g. pandas dataframes, numpy arrays, str/int/float, or lists and dicts containing such data objects. Example use cases are dataframe transformations, API queries, ML inference, etc. Behavior will stay the same as for st.experimental_memo, i.e. you always get a fresh copy of the return object at every rerun. This is also the default command you should use in 90% of all cases.
  • Rename st.experimental_singleton to st.cache_resource. This command should be used to cache any global resources that are shared across all reruns and sessions, e.g. database connections or ML models – for example, if you’re initializing a connection or loading an ML model from disk. We’re also working on a more specific st.connection command, which will allow you to connect to databases in a single line of code and should abstract away caching and similar details (see our roadmap blog post). We’re also thinking about whether we can do something similar for initializing ML models (e.g. an st.model – comment if you have ideas!). In the long run, we see st.cache_resource as an advanced command that most users won’t need to touch.
  • Do a much better job in the docs to explain these two commands, what their differences are, and in which situation you should use what.

We are now implementing the new commands and a few other adjustments. We want to release them in December/January and will start the deprecation of st.cache then. We’re doing the deprecation in a very very careful way! Specifically, we won’t remove st.cache at least until 2.0 to prevent breakage and we’ll give a lot of guidance (both in the app and in the docs) on how to move over to the new commands. In most situations, it should just be a small name change and you’re good to go.
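To make the migration concrete, here is a minimal sketch under the plan above (the function names, the ttl value, and the sqlalchemy example are illustrative choices, not a confirmed API beyond the renames):

import streamlit as st
import pandas as pd
from sqlalchemy import create_engine

@st.cache_data(ttl=3600)  # was @st.experimental_memo: for data objects
def load_data(url):
    return pd.read_csv(url)

@st.cache_resource  # was @st.experimental_singleton: for shared global resources
def get_connection(db_url):
    return create_engine(db_url)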

Happy to hear any feedback that y’all have!


I especially love the new names. I think those are much more intuitive.

“will let you connect to external databases and APIs with a single line of code. st.database will launch a small database alongside every Streamlit app, so you can permanently store data without any setup.”

Please consider using DuckDB or a Redis cache for Streamlit’s small database – it would be helpful in many other scenarios as well.
A friendly thought/suggestion…
Thanks
Sai


I agree, the new names make so much more sense. Excited!


I have not used st.cache in forever. I do regularly use st.experimental_singleton and st.experimental_memo. I think they are working fine and greatly improving the user experience in general.

If I could change anything, I’d just add an expiration parameter to st.experimental_singleton.

As far as understanding/documentation goes, I think it is pretty much fine, but the scope could be stated more explicitly (whether the caching works across sessions and so on).

Regarding the proposed st.connection function, it is somewhat hard for me to imagine how it would interact with the vast number of specialized database drivers, ORMs, and so on that are out there.

Hi @ennui 👋

The ttl and max_entries expiration parameters have recently been added to st.experimental_singleton.

They should be available in the next 1.16.0 release.
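For example, a sketch of how that might look once 1.16.0 is out (get_client is an illustrative name):

import streamlit as st

# Entries expire after an hour, and at most 10 resources are kept.
@st.experimental_singleton(ttl=3600, max_entries=10)
def get_client(url):
    ...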


Hi @jrieke -

I’ve seen a few users looking for a user-specific cache (across sessions) and a session-specific cache. They’re doing hacks to accomplish this and, if they’re building multitenant apps, risking data leaks – it’s quite unsafe.

Example threads:

plus @whitphx’s post above, We want to deprecate st.cache! …and need your input ❤️ - #18 by whitphx.

The solutions out there right now:

  • inject a hacked up session_id into cached/memoized methods (there are gists out there for this; they’ve broken after some releases)
  • inject a user_id, e.g. experimental_user – not great because this isn’t available for public apps or private third-party deployments
  • use session_state – the ergonomics are poor/unsafe, and you need to rig your own TTL, etc.:
some_df = get_data(...)
st.session_state['my_df'] = some_df
plot(st.session_state['my_df'])  # BETTER NOT USE some_df!
other_df = transform(st.session_state['my_df'])...

vs a better alternative IMO:

some_df = get_data(...)
plot(some_df)
...
@st.memo(per_user=True)
def get_data(...):
    ...

Any plans for something that supports more ergonomic user and/or session-specific cache-ing?

All you need is for your cached functions to take a user_id or a session_id parameter. In order to have a user_id you need authentication; there are already several ways to do it, and everybody can implement their own.

For a session_id, here is a simple implementation that won’t break easily:

from uuid import uuid4
import streamlit as st

def get_session_id():
    # st.session_state is scoped to the browser session, so the UUID is
    # generated once per session and reused on every rerun.
    SESSION_ID_KEY = "#SESSION-ID#"
    
    if SESSION_ID_KEY not in st.session_state:
        st.session_state[SESSION_ID_KEY] = uuid4()
    return st.session_state[SESSION_ID_KEY]

Do not forget to put a limit on the cache size so that it does not grow unbounded.
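Putting it together, a sketch (get_data and run_query are illustrative names; max_entries bounds the cache as suggested):

import streamlit as st

@st.experimental_memo(max_entries=100)  # bound the cache so it can't grow unbounded
def get_data(query, session_id):
    # session_id is part of the cache key, so each session gets its own
    # entry for the same query.
    return run_query(query)  # run_query is a hypothetical data-fetching helper

df = get_data("SELECT * FROM t", str(get_session_id()))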
