Programmable State for Streamlit

Hey Community :wave:,

In a lot of ways getting a Streamlit app to store internal state, like information a user entered in a form, is simply too tricky. We’ve created some workarounds for session state, but we want to give you a baked-in, elegant version of programmable state so you can build apps with intricate sequential logic, such as multi-page apps.

We’d like to use this topic as a central location for your ideas:

  • What use cases would you like supported?
  • How would you expect that use case to work?
  • Any examples of things you’ve already created as workarounds

We will compile all of these ideas and later release a design doc for commenting.


I built an app that connects to a Snowflake database. I built a workaround for authentication that connects to the database to fetch user credentials. The issue, however, was that the app reloaded after each modification to an input field, and this only happened once the user had logged in to the app. I believe this can be fixed with proper session state management.


Here are some ideas that come to mind where session/state management could come in very handy:

  • Complex chained conditions across widgets/inputs, instead of triggering on every single change
  • User session management, so data can be filtered based on user permissions (any time you want to build a decent BI or organization-wide data insights/dashboard system you need this, or you risk having to duplicate apps for each subgroup)
  • Maintaining selections across apps/pages (in a multi-page app rollout), so you can offer a smoother experience (similar to many BI tools or Google Analytics, for example, where a selected date range persists across reports)
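
One way to picture the last two points is a small per-session store: each session gets its own dictionary, so a date range picked on one page is still there on the next, and one user's values never show up for another. The sketch below is plain Python, not a Streamlit API; `SessionStore`, the session ids, and the `date_range` key are hypothetical names used only for illustration.

```python
from collections import defaultdict


class SessionStore:
    """Hypothetical per-session key/value store (not part of Streamlit)."""

    def __init__(self):
        # One independent dict per session id.
        self._data = defaultdict(dict)

    def get(self, session_id, key, default=None):
        return self._data[session_id].get(key, default)

    def set(self, session_id, key, value):
        self._data[session_id][key] = value


store = SessionStore()

# "Page 1" of user A picks a date range.
store.set("session-a", "date_range", ("2020-01-01", "2020-03-31"))

# "Page 2" of user A sees the same selection; user B only gets the default.
range_a = store.get("session-a", "date_range")
range_b = store.get("session-b", "date_range", default=None)
```

Keying everything by session id is what keeps selections (or permissions) from crossing between users; how a real app would obtain that id is left open here.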

One major use case for me actually ties in with what @harshjp93 is doing: connecting to Snowflake (or other external sources) using OAuth2. I want to remove any user credentials from my app, and instead have users authenticate themselves. This, however, requires storing and retrieving session tokens per user/session and never crossing them. Possibly solvable with query parameters though.

Another one as highlighted in a different thread is around having a “load” button for data. Set parameters, hit load, then do interactive work from that point on. Currently there’s no way of saying “load if not already loaded”. This specific case may be solvable by being able to ask if something is cached or not. Current workaround is to avoid buttons if at all possible - buttons are a bit unusual as they’re “true if they were the last thing to be clicked” with no history of whether they were ever clicked. Dash changed their events around buttons to just having buttons with a state of “number of times clicked”.
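
To make the "load if not already loaded" idea concrete, here is a minimal sketch in plain Python. It is not Streamlit code: `ButtonState`, `load_if_needed`, and the `state` dict (standing in for some per-session state) are hypothetical names, and the click counter simply mirrors the "number of times clicked" idea described above.

```python
class ButtonState:
    """Hypothetical button that remembers how often it was clicked."""

    def __init__(self):
        self.n_clicks = 0

    def click(self):
        self.n_clicks += 1


def load_if_needed(state, button, loader):
    """Call loader() only once, after the button has been clicked at least once."""
    if button.n_clicks > 0 and "data" not in state:
        state["data"] = loader()
    return state.get("data")


calls = []


def expensive_load():
    calls.append(1)  # record every real load
    return [1, 2, 3]


state = {}  # stands in for per-session state
button = ButtonState()

before = load_if_needed(state, button, expensive_load)  # not clicked yet: nothing loaded
button.click()
first = load_if_needed(state, button, expensive_load)   # first run after the click: loads
second = load_if_needed(state, button, expensive_load)  # later rerun: reuses the data
```

Because the button keeps a history, later reruns triggered by other widgets can still tell that "load" was pressed at some point, which is exactly what a "true only if last clicked" button cannot express.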

One thing I really like about Streamlit, though, is that there aren’t many things like this to deal with, so I would personally prefer simplicity and ease of reasoning at the expense of feature support. Streamlit shouldn’t be solving “all apps for all use cases” (imo).

I think many of my multi-page type apps would actually be solved by simply supporting hiding elements / groups of elements. Currently they are either non-existent or visible.


A few things I foresee us doing that are currently hard:

  • Load separate serialized models for different users (it’s likely to be a pytorch model in most cases, but it could be a TF model or something else reasonably common).

  • Load separate raw data for different users where that data is large enough that we don’t want to pull it at each page load. Instead, we’ll calculate some results and cache those user-specific results. The raw data won’t always be in a DB, because it has different schemas for different users. In those cases, it will be raw (typically csv) files in an S3 or GCS bucket. Other types of data will be in Mongo. With the data being large, I imagine it’s impractical to cache the raw data for each user, and we’ll instead calculate a large set of results/transformations up front and cache those.
    Most of the processing we do is converting a pandas DF to either a scalar or another pandas DF. I think we’re likely to do work with shapefiles and geopandas, but I haven’t thought that through much.

  • At some point we tried using Matplotlib in Streamlit. Graph creation took ~0.2 seconds per graph. I was unable to cache the Matplotlib graphs (maybe that’s easier now with the more flexible hashing). It caused us to switch to Altair, which has been fast and generally nice to develop with. So I don’t know if this is still a need for us, but we once wanted to cache Matplotlib graphs.

It seems to me like the key for conventional JS apps to be so fast with limited server RAM is that a lot is saved on the client, and the server can send just the required update. I assume that’s impractical for Streamlit to do?

Given the flexibility, we’ll frequently cache too much and run out of memory. We’ll need something that lets us recover gracefully when we shoot ourselves in the foot like that (though I don’t know how I’d want this foot bandage to work)
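
One possible shape for that "foot bandage" is a cache with a hard upper bound that evicts the least recently used entry instead of growing until memory runs out. The sketch below is plain Python and only an illustration: `BoundedCache` and `max_entries` are hypothetical names, and it counts entries rather than bytes, so a real limit on memory would need some size estimate per entry.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache that evicts old entries instead of growing forever."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            return self._items[key]
        value = compute()
        self._items[key] = value
        if len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value

    def keys(self):
        return list(self._items)


cache = BoundedCache(max_entries=2)
cache.get_or_compute("user-a", lambda: "results for a")
cache.get_or_compute("user-b", lambda: "results for b")
cache.get_or_compute("user-a", lambda: "recomputed")     # cache hit, refreshes user-a
cache.get_or_compute("user-c", lambda: "results for c")  # over the limit: evicts user-b
```

An evicted user's results are simply recomputed on their next visit, so the failure mode becomes "slower for a moment" rather than "out of memory".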


Hi, thanks for the effort. Here are a few things I’d appreciate that have not been mentioned yet.

  • Show/hide a section without losing the information entered in it while it is hidden
  • The possibility to apply an element’s function multiple times without it being overwritten (e.g. a ‘+’ button that adds a new field after each click)
  • It would also be nice if cached objects did not need to be checked each time anything is activated, but only when the cached field changes

The most common thing that I want is form data that I don’t want users to have to re-fill out. I’ve made some workarounds by creating a “cache” file and reading/writing from it, but it’s clunky and extremely slow for anything more complex.


Thanks for your efforts and the opportunity to share my thoughts and ideas on this topic, @Thiago. Regarding your question, a few things come to mind:

In PHP there is something called sessions, which you already mention. Maybe it is possible to cache whether a user is logged in, or to use some sort of cookies instead. Admittedly, I am not too deep into that topic, but maybe something like:

@st.cache["Name"] = st.text_input("Name")
@st.cache["Password"] = st.text_input("Password", type="password")
if @st.cache["Name"] && @st.cache["Password"] != Null => @st.cache.Session_start()

From that point onwards, users have access to the analytics page until st.cache.Session_end() is called.

Another option I thought about would be a basic database integration with Streamlit to store the credentials. Yet this is just an idea in progress and I did not spend much time thinking about it.

name = st.text_input("Name")
pw = st.text_input("Password", type="password")
st.login(name, pw)
=> automatic DB connection, and the session lasts until it times out or cookies are deleted.

I am looking forward to seeing how you solve it.


Hi Thiago,
I was using one of your “possible designs” for session state in my application. Unfortunately, the latest update seems to have changed something essential, and it no longer works. It fails with the error: 'Server' object has no attribute '_session_infos'
Could you please point me to what was changed and, if you already know, how to fix it?
Thank you,
Matthias

@MatthiasPilz You should use sessionState from this link: https://gist.github.com/tvst/0899a5cdc9f0467f7622750896e6bd7f

At lines 152-156 you can see that _session_infos has been replaced by _session_info_by_id in version 0.56 and above.
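
For anyone who has to support both older and newer versions, the rename can be handled with a small lookup that tries the new attribute name first and falls back to the old one. This is only a sketch: `get_session_infos`, `OldServer`, and `NewServer` are hypothetical stand-ins written for illustration, not Streamlit classes, and since both attributes are private they may well change again.

```python
def get_session_infos(server):
    """Return the server's session mapping under either attribute name."""
    for name in ("_session_info_by_id", "_session_infos"):
        if hasattr(server, name):
            return getattr(server, name)
    raise AttributeError("no known session attribute found on server object")


# Stand-in objects for illustration only; these are not Streamlit classes.
class OldServer:
    _session_infos = {"abc": "old-style session"}


class NewServer:
    _session_info_by_id = {"abc": "new-style session"}


old = get_session_infos(OldServer())
new = get_session_infos(NewServer())
```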
