We want to deprecate st.cache! …and need your input ❤️

I blindly use experimental_memo and it works perfectly for me. I never quite understood what the other one does but I do not do any machine learning so probably do not need it anyway. Great work.

Would be nice to have a way to shut a given cache off and measure whether it helps or not. I use it indiscriminately and the app is fast so I assume it generally helps.

3 Likes

Similar to We want to deprecate st.cache! …and need your input ❤️ - #6 by gagangoku,
my concern is the lack of unpicklable object support.

My original problem was posted in the previous thread about the new cache primitives:

though this has been solved by using st.session_state instead.
In my original case, my intention was to create a session-specific cache, and session state was exactly the right tool for it.
So currently I don’t have specific problems, but I’m a bit worried there may be edge cases where users want to memoize unpicklable objects.

And I recommend stating in the migration guide that st.session_state can also serve as an alternative to st.cache in some specific cases.

3 Likes

Have you already started deprecating it? from streamlit import caching raises errors in some scripts but not in others.

@rcsmit We haven’t started deprecating it. You don’t need to import caching separately from Streamlit – it’s all in the same library

Thanks, then I have no idea where

cannot import name 'caching' from 'streamlit' (/home/appuser/venv/lib/python3.7/site-packages/streamlit/__init__.py)

came from… (and why it gave an error in some scripts and not in other scripts)

@rcsmit You wouldn’t import it – that’s why the error is happening. You would just use the method without importing it (just do import streamlit as you usually would)

2 Likes

I have found singleton and memo way more intuitive to use. I also like the split because our team maintains some general library/package code that helps users create and reuse common resources like database connections (which uses singleton), while users building dashboards only need to worry about the simpler caching via memo.

One thing the docs could make even clearer is at what level caching happens. Comments like those in Expiring the experimental_singleton cache at regular intervals - #6 by ksdaftari could state more explicitly that the singleton/memo caches are global, not specific to a user like session_state (unless I am misunderstanding the docs here). I say this because I have to train some of the people building dashboards to be careful about what they cache, especially when the output can differ per user, e.g. when hitting an API whose authorization differs by user.

1 Like
  • Do you like st.experimental_memo and st.experimental_singleton?
    yes, personally I only use the memoization part of st.cache, so having the memoization and the singleton-pattern parts in separate functions is very nice, it’s way faster. I also love that one can finally clear the cache in code.
  • What do you dislike?
    nothing so far
  • Are there any killer features in st.cache that your apps can’t live without?
    no
  • Do you understand when to use memo and when to use singleton or is this confusing?
    I understood it after reading the documentation once, to me it was clear, and I started using streamlit a month ago!

I hope that st.cache gets deprecated; I still don’t understand why it remains accessible in the current version when there is no way to clear it from code.

1 Like

I think this is very good practice, but we need to decide carefully whether to discard st.cache, because frequent changes to mainstream functions erode users’ trust, and maintenance costs will increase accordingly. Thank you

From the docs:

Additionally, you can use st.experimental_memo.clear() and st.experimental_singleton.clear() to clear all memo and singleton caches, respectively.

Use case: I have many data gathering functions, and when I clear caches I don’t need ALL of them refreshed.

I am using memo_decorated_function.clear() which seems to work for that specific function’s cache alone. Is that correct?

If not, could we have a unique key for each cache decorator, which would then allow each cache to be cleared individually. The key name could default to the name of the decorated function, or overridden with a key parameter.

Thanks,
Arvindra

1 Like
  • yes, yes.
  • Functionally, the current code solves the most important cases, but with some pains. A function init(str, str) is marked as a singleton in my app, but Streamlit runs it 6 times before crashing due to OOM. Two tabs were open and running init(). It looked frozen, so I refreshed. init() finished and loaded the page. Opening a new tab called init() again. You get the idea. I look forward to the next version. Thank you all.
  • I’ve forgotten all about st.cache; however, the name was far better, and the organization is now confusing. I recommend asking users about the naming that will follow: st.cache, st.memo, st.singleton, st.session_state. The concern is that users will struggle to remember the nuances, and the names don’t make it distinctly clear what each one does.

    The challenge you will run into is that weak programmers will want to use Streamlit. I don’t mean the kids. Those who understand state machines, multiprocessing, and multi-tenancy will breeze through the docs, but imagine the data analysts, business intelligence, marketing, and sales departments of a company. They have math skills more than architecture-design skills; you will have to educate them in an if-then way to provide the basics. This suggestion is also me asking you for help: over the last 2 years, I presented many demos and prototypes using Streamlit, and people always want to run it locally. For every 20 technical people, there will always be someone asking for help.

    The suggestion is to remove the interfaces for memo and singleton and leave two interfaces to the same backend variables: a wrapper and a normal function or dict-like store. In both cases, have variables to indicate the rules (e.g. the hashable key = tuple(global/local, user UUID, func name, args), enable serializing the result, compress the serialized result, FIFO length, whether the key is global or local, max RAM, max disk, purge or raise an error on OOM, expiration timestamp, …). With the rules explicitly set, validation should become easy, and you could then give better error messages for each scenario. Then no one needs to know how it works: users simply communicate the expectation through variables, and your code infers the best way to get there. Future versions would maintain the same interface while the backend gets upgraded. Raise an exception if the request doesn’t make sense.

    TL;DR: suggest changing the API to

    st.cache(..., serialize_result=True, compression='DEFLATE', queue='FIFO', history_length='inf',
             is_local=False, max_ram='inf', min_free_ram='1g', max_disk='1g', raise_on_oom=False,
             ts_expiration=time.now()+'1d', can_purge=True, verbose=2)

    plus st.session / st.globals as dict-like stores (st.globals[key]), and delete all the others marked as experimental.

  • What’s missing? Resource locking/control (because a cached init sometimes gets called twice, when it should lock and wait), security/privacy (an option to intercept calls to pickle load/save with a callable to change the encoding), an option to use cookies as local key-value storage, storage config (enable replacing the backend KV storage mechanism for transport, backup, scale, or performance), and write protection (it keeps re-running a cached function that takes too long to run). As implied already, adding purge mechanisms would be great. Most important is clarity on what I want vs. what I’m doing. I understand how your library functions; what I need most is a simple way to control the hashable key under @st.cache to specify global/local and whether to serialize.

I really like this interface, but the functionality was the problem. You isolated the problems into modules. I look forward to all of these unified.

st.cache(func=None, persist=False, allow_output_mutation=False, show_spinner=True, 
suppress_st_warning=False, hash_funcs=None, max_entries=None, ttl=None)

Does this mean that st.cache won’t work anymore? Or will it be hidden in the documentation?

2 Likes

The issue with experimental_memo not hashing (hashable) class instance arguments seems to not have been fixed yet. It would be good if that was fixed before cache was deprecated.

Little update…

Thanks for the feedback everyone! :heart: Our main takeaway from here and other talks with users was:

  1. Splitting caching into two separate decorators is the right way to go!
  2. The names memo and singleton are too difficult to understand for a lot of users.

So the solution we’re now leaning towards is:

  • Rename st.experimental_memo to st.cache_data. This command should be used to cache any data objects, e.g. pandas dataframes, numpy arrays, str/int/float, or lists and dicts containing such data objects. Example use cases are dataframe transformations, API queries, ML inference, etc. Behavior will stay the same as for st.experimental_memo, i.e. you always get a fresh copy of the return object at every rerun. This is also the default command you should use in 90% of all cases.
  • Rename st.experimental_singleton to st.cache_resource. This command should be used to cache any global resources that will be shared across all reruns and sessions, e.g. database connections or ML models. For example if you’re initializing a connection or loading an ML model from disk. We’re also working on a more specific st.connection command, which will allow you to connect to databases in a single line of code and should abstract away caching and similar details (see our roadmap blog post). We’re also thinking if we can do something similar for initializing ML models (e.g. an st.model – comment if you have ideas!). In the long run, we see st.cache_resource as an advanced command that most users won’t need to touch.
  • Do a much better job in the docs to explain these two commands, what their differences are, and in which situation you should use what.

We are now implementing the new commands and a few other adjustments. We want to release them in December/January and will start the deprecation of st.cache then. We’re doing the deprecation in a very very careful way! Specifically, we won’t remove st.cache at least until 2.0 to prevent breakage and we’ll give a lot of guidance (both in the app and in the docs) on how to move over to the new commands. In most situations, it should just be a small name change and you’re good to go.

Happy to hear any feedback that y’all have!

4 Likes

I especially love the new names. I think those are much more intuitive.

“will let you connect to external databases and APIs with a single line of code. st.database will launch a small database alongside every Streamlit app, so you can permanently store data without any setup.”

Please give a thought to using DuckDB or a Redis cache for Streamlit’s small database… it would be helpful in a lot of other scenarios too…
A friendly thought/suggestion…
Thanks
Sai

1 Like

I agree, the new names make so much more sense. Excited!

1 Like

I have not used st.cache in forever. I do regularly use st.experimental_singleton and st.experimental_memo. I think they are working fine and greatly improving the user experience in general.

If I could change anything, I’d just add an expiration parameter to st.experimental_singleton.

As far as understanding / documentation goes, I think it is pretty much fine but the scope of it could be stated more explicitly (whether the caching works across sessions and so on).

Regarding the proposed st.connection function, it is somewhat hard for me to imagine how it would interact with the vast number of specialized database drivers, ORMs, and so on that are out there.

Hi @ennui :wave:

The ttl and max_entries expiration parameters have recently been added to st.experimental_singleton with:

They should be available in the next 1.16.0 release.

3 Likes