Programmable State for Streamlit

UPDATE: Streamlit now has official support for Session State! More info here

Hey Community :wave:,

In a lot of ways getting a Streamlit app to store internal state, like information a user entered in a form, is simply too tricky. We’ve created some workarounds for session state, but we want to give you a baked-in, elegant version of programmable state so you can build apps with intricate sequential logic, such as multi-page apps.

We’d like to use this topic as a central location for your ideas:

  • What use cases would you like supported?
  • How would you expect that use case to work?
  • Any examples of things you’ve already created as workarounds

We will compile all of these ideas and later release a design doc for commenting.

11 Likes

I built an app that connects to a Snowflake DB. I built a workaround for authentication, which connects to the DB to get user credentials. The issue, however, was that after each modification to an input field the app reloaded, and this only happened once the user had logged in to the app. I believe this issue can be fixed with proper session state management.

5 Likes

Here are some ideas that come to mind where session/state management could come in very handy:

  • Complex chained conditions from widget/input instead of triggering at each single change
  • User session management, so that data can be filtered based on user permissions (any time you want to build a decent BI or organization-wide data insights/dashboard system you need this, or you risk having to duplicate apps for each subgroup)
  • Maintain selections across apps/pages (in a multi-page app rollout), where you can offer smoother experience (similar to many BI or Google Analytics for example, when you select date range it is persisted across reports)
3 Likes
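
The last bullet can be pictured with a minimal sketch. This is not Streamlit's API: `session` is a plain dict standing in (hypothetically) for per-user session state, and the two page functions are invented for illustration. The point is only that a date range picked on one page is still there when another page runs.

```python
from datetime import date

# Hypothetical stand-in for per-user session state; a real app would need
# one such dict per browser session.
session = {}


def overview_page(session, picked_range=None):
    # Store the selection only when the user actually picked something.
    if picked_range is not None:
        session["date_range"] = picked_range
    return f"Overview for {session.get('date_range')}"


def detail_page(session):
    # Reuse whatever range was chosen on any other page.
    start, end = session.get("date_range", (None, None))
    return f"Details from {start} to {end}"


overview_page(session, (date(2020, 1, 1), date(2020, 1, 31)))
report = detail_page(session)
```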

One major use case for me actually ties with what @harshjp93 is doing - connecting to snowflake (or other external sources) using oauth2. I want to remove any user credentials from my app, and instead have users authenticate themselves. This however requires storing and retrieving session tokens per user/session and never crossing them. Possibly solvable with query parameters though.
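
To make "per user/session and never crossing them" concrete, here is a rough sketch under stated assumptions: `TokenStore` and its methods are hypothetical names, not part of Streamlit or any OAuth library, and the token strings are placeholders. Each session gets its own unguessable id, and a token can only be read back with that id.

```python
import secrets


class TokenStore:
    """Keeps one OAuth-style token per session id (hypothetical helper)."""

    def __init__(self):
        self._tokens = {}

    def new_session(self):
        # An unguessable id so one session cannot address another's token.
        session_id = secrets.token_urlsafe(16)
        self._tokens[session_id] = None
        return session_id

    def set_token(self, session_id, token):
        if session_id not in self._tokens:
            raise KeyError("unknown session")
        self._tokens[session_id] = token

    def get_token(self, session_id):
        return self._tokens[session_id]


store = TokenStore()
alice = store.new_session()
bob = store.new_session()
store.set_token(alice, "token-for-alice")
```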

Another one as highlighted in a different thread is around having a “load” button for data. Set parameters, hit load, then do interactive work from that point on. Currently there’s no way of saying “load if not already loaded”. This specific case may be solvable by being able to ask if something is cached or not. Current workaround is to avoid buttons if at all possible - buttons are a bit unusual as they’re “true if they were the last thing to be clicked” with no history of whether they were ever clicked. Dash changed their events around buttons to just having buttons with a state of “number of times clicked”.
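
A small sketch of the "load if not already loaded" idea, assuming some dict-like `state` that survives reruns (that persistence is exactly the missing piece being asked for, so it is simulated here with a plain dict). `handle_run` and `fake_loader` are made-up names. The click count mirrors the Dash behaviour mentioned above.

```python
def handle_run(state, load_clicked, params, loader):
    """Simulate one script rerun; `state` persists between calls."""
    if load_clicked:
        # Count clicks instead of exposing a transient True/False.
        state["load_clicks"] = state.get("load_clicks", 0) + 1
        state["data"] = loader(params)
        state["loaded_params"] = params
    # Data stays available on later reruns where the button is not pressed.
    return state.get("data")


calls = []


def fake_loader(params):
    calls.append(params)
    return [params["n"]] * 3


state = {}
first = handle_run(state, True, {"n": 2}, fake_loader)    # user hits "load"
second = handle_run(state, False, {"n": 2}, fake_loader)  # user moves a slider
```

The loader runs once, and the second rerun still sees the data even though the button now reads as "not clicked".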

One thing I really like about streamlit though is that there aren’t many things like this to deal with, so would personally prefer simplicity and ease of reasoning at the expense of feature support. Streamlit shouldn’t be solving “all apps for all use cases” (imo).

I think many of my multi-page type apps would actually be solved by simply supporting hiding elements / groups of elements. Currently they are either non-existent or visible.

6 Likes

A few things I foresee us doing that are currently hard:

  • Load separate serialized models for different users (it’s likely to be a pytorch model in most cases, but it could be a TF model or something else reasonably common).

  • Load separate raw data for different users where that data is large enough that we don’t want to pull it at each page load. Instead, we’ll calculate some results and cache those user-specific results. The raw data won’t always be in a DB, because it has different schemas for different users. In those cases, it will be raw (typically csv) files in an S3 or GCS bucket. Other types of data will be in Mongo. With the data being large, I imagine it’s impractical to cache the raw data for each user, and we’ll instead calculate a large set of results/transformations up front and cache those.
    Most of the processing we do is converting a pandas DF to either a scalar or another pandas DF. I think we’re likely to do work with shapefiles and geopandas, but I haven’t thought that through much.

  • At some point we tried using matplotlib in streamlit. Graph creation took ~ .2 seconds per graph. I was unable to cache the matplotlib graphs (maybe that’s easier now with the more flexible hashing). It caused us to switch to Altair, which has been fast and generally nice to develop with. So I don’t know if this is still a need for us, but we once wanted to cache matplotlib graphs.

It seems to me like the key for conventional JS apps to be so fast with limited server RAM is that a lot is saved on the client, and the server can send just the required update. I assume that’s impractical for Streamlit to do?

Given the flexibility, we’ll frequently cache too much and run out of memory. We’ll need something that lets us recover gracefully when we shoot ourselves in the foot like that (though I don’t know how I’d want this foot bandage to work)
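
One possible shape for such a "foot bandage", sketched under assumptions: a least-recently-used cache of per-user results that evicts old entries instead of growing without bound. `BoundedResultCache` is a hypothetical name, and it limits the number of entries rather than bytes, which is a simplification.

```python
from collections import OrderedDict


class BoundedResultCache:
    """Least-recently-used cache keyed by (user, result name)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get_or_compute(self, user, name, compute):
        key = (user, name)
        if key in self._items:
            self._items.move_to_end(key)  # mark as recently used
            return self._items[key]
        value = compute()
        self._items[key] = value
        # Evict the oldest entries instead of growing without bound.
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)
        return value

    def __contains__(self, key):
        return key in self._items


cache = BoundedResultCache(max_entries=2)
cache.get_or_compute("ann", "mean", lambda: 1.5)
cache.get_or_compute("bob", "mean", lambda: 2.5)
ann_again = cache.get_or_compute("ann", "mean", lambda: -1.0)  # cache hit
cache.get_or_compute("cy", "mean", lambda: 3.5)  # evicts bob's entry
```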

2 Likes

Hi, thanks for the effort. These are few things I’d appreciate, that have not been mentioned.

  • Show / hide a section without losing the information entered in it while it is hidden
  • Possibility to apply a function of an element multiple times without being overwritten (e.g. a ‘+’ button that would add a new field after each hit)
  • Also it would be nice if the cached objects would not need to be checked each time anything is activated, but only when the cached field is being changed
2 Likes
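
The ‘+’ button request can be sketched as follows, assuming a `state` dict that survives reruns (simulated here, since that is the feature under discussion). `rerun` is a hypothetical function standing in for one execution of the script.

```python
def rerun(state, add_clicked, typed_values):
    """One simulated rerun: `state` survives, everything else is rebuilt."""
    fields = state.setdefault("fields", [""])
    if add_clicked:
        fields.append("")  # the '+' button appends a new empty field
    # Copy what the user typed into the remembered fields.
    for index, text in typed_values.items():
        fields[index] = text
    return list(fields)


state = {}
rerun(state, add_clicked=False, typed_values={0: "alpha"})
rerun(state, add_clicked=True, typed_values={1: "beta"})
visible = rerun(state, add_clicked=True, typed_values={})
```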

The most common thing that I want is form data that I don’t want users to have to re-fill out. I’ve made some workarounds by creating a “cache” file and reading/writing from it, but it’s clunky and extremely slow for anything more complex.
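
For reference, a file-based workaround of the kind described might look roughly like this. It is a guess at the general shape, not the poster's code; the file name and form fields are invented, and every rerun pays for a disk read, which is consistent with the slowness mentioned.

```python
import json
import tempfile
from pathlib import Path


def save_form(path, form_data):
    path.write_text(json.dumps(form_data))


def load_form(path, defaults):
    # Fall back to defaults on the first visit, when no file exists yet.
    if not path.exists():
        return dict(defaults)
    return {**defaults, **json.loads(path.read_text())}


with tempfile.TemporaryDirectory() as folder:
    cache_file = Path(folder) / "form_cache.json"  # hypothetical file name
    defaults = {"name": "", "age": 0}
    before = load_form(cache_file, defaults)
    save_form(cache_file, {"name": "Ada", "age": 36})
    after = load_form(cache_file, defaults)
```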

1 Like

Thanks for your efforts and the opportunity to share my thoughts and ideas on this topic @Thiago. Regarding your question what jumps into my mind are a few things:

In PHP there is something called Sessions, which you already mention. Maybe it is possible to cache whether a user is logged in, or to use some sort of cookies instead. Admittedly, I am not too deep into that topic, but maybe something like:

# Hypothetical API sketch: st.cache does not support item assignment today.
st.cache["Name"] = st.text_input("Name")
st.cache["Password"] = st.text_input("Password", type="password")
if st.cache["Name"] and st.cache["Password"]:
    st.cache.Session_start()

from that point onwards users have access to the analytics page until st.cache.Session_end() is called.

Another option I thought about would be a basic database integration with Streamlit to store the credentials. Yet this is just an idea in progress and I did not spend much time thinking about it.

# Hypothetical API sketch: st.login is a proposed call.
name = st.text_input("Name")
pw = st.text_input("Password", type="password")
st.login(name, pw)
# => automatic DB connection; the session lasts until it times out or
# cookies are deleted.

I am looking forward to seeing how you are going to solve it.

5 Likes

Hej thiago,
I was using one of your “possible designs” for session state in my application. Unfortunately, the latest update seems to have changed something essential, and it no longer works. It fails with the error: 'Server' object has no attribute '_session_infos'
Could you please point me to what was changed and, if you already know, how to solve this?
Thank you,
Matthias

1 Like

@MatthiasPilz You should use sessionState from this link https://gist.github.com/tvst/0899a5cdc9f0467f7622750896e6bd7f

At lines 152-156 you can see that _session_infos has been replaced by _session_info_by_id in version 0.56 and above.
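
A version-tolerant lookup for that rename could be sketched like this. The two attribute names come from this thread; `get_session_infos` and the two stand-in classes are hypothetical, and since these are private attributes they may well change again.

```python
def get_session_infos(server):
    """Return the server's session table under either attribute name."""
    for attr in ("_session_info_by_id", "_session_infos"):
        if hasattr(server, attr):
            return getattr(server, attr)
    raise AttributeError("no known session-info attribute on server")


class OldServer:  # stand-in for a server object before the rename
    _session_infos = {"a": "old"}


class NewServer:  # stand-in for a server object after the rename
    _session_info_by_id = {"a": "new"}


old_infos = get_session_infos(OldServer())
new_infos = get_session_infos(NewServer())
```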

3 Likes

I have corporate data and I display various metrics in a monolithic app using a pretty good set of .net tools. I really want to get out of the business of coding up every dashboard item my users come up with.

I too need some form of session management and probably need to be able to display reasonably on desktop and phone.

The biggest ask is that I want possibly dozens of SPAs arranged in a hierarchy. Alternatively, I want to be able to cycle through the visualizations based on some timing and/or alarm conditions.


I want to configure it as simply as possible, something like dashboard markdown that might build a clickable list of SPA or a list of SPAs that cycle automatically with simple logic.

I would want it to be easy for a vis to decide it didn’t need to be shown (rare alarm).

For anyone who is interested, I developed another solution to implementing programmable state by storing state variables for user inputs and even dataframes into a Postgres database. The added benefit with this method is that you can also store objects and binary files and can keep a track of user inputs with timestamps if that is relevant to your application.

Please refer to this tutorial to find more details:

4 Likes
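
The general idea of database-backed state with timestamps can be sketched as below. This is not the code from the tutorial: it swaps in SQLite from the Python standard library in place of Postgres so that it runs anywhere, and the table layout and function names are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state (session_id TEXT, key TEXT, value TEXT, saved_at TEXT)"
)


def save_state(conn, session_id, key, value):
    # Append a row per change so the history of inputs is kept.
    conn.execute(
        "INSERT INTO state VALUES (?, ?, ?, ?)",
        (session_id, key, json.dumps(value),
         datetime.now(timezone.utc).isoformat()),
    )


def load_state(conn, session_id, key, default=None):
    # The most recently inserted row holds the current value.
    row = conn.execute(
        "SELECT value FROM state WHERE session_id = ? AND key = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (session_id, key),
    ).fetchone()
    return default if row is None else json.loads(row[0])


save_state(conn, "s1", "threshold", 10)
save_state(conn, "s1", "threshold", 25)
latest = load_state(conn, "s1", "threshold")
history_length = conn.execute("SELECT COUNT(*) FROM state").fetchone()[0]
```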

It seems to me that one of the top uses of a state manager is to enable multi-page apps as per this discussion: Multi-page app with session state

However, I believe the solution for multi-page apps needs to be higher level than just a state manager, and potentially different under the hood. The ideal solution wouldn’t re-compute all pages, ideally everything would be cached/stateful, i.e. not just widgets (plots, dataframe computations, etc). It’d be the equivalent of doing a simple CSS hide/show. Not sure if you guys are already exploring this, but thought I’d chime in as I think the state manager is too low-level and doesn’t cover all the needs in the multi-page use case.

Thank you for the superb work!!

1 Like

My use case is that I’m using Streamlit to build a GUI for the open source network observability tool that I’m building called Suzieq.

I need support for multi-page apps so that users can switch back and forth between pages without losing their place. Today there seem to be two kinds of solution: the SessionState code provided by @thiago (with the improved version by @FranzDiebold), and the SessionState code by @okld, which is the one I know of that actually works well for multi-page apps. The latter works well for my needs, although it is slower than the others because it reruns everything once more after Streamlit does its usual rerun. Caching helps, but it shouldn’t have to be this way. The former commonly fails to maintain state when switching between pages. I echo what @arturadib wrote in this thread.

Hope this helps,

Dinesh

I was mocking up a demo programmable state API and came up with the following. Figured here would be a good place to post it to get/give ideas.

Cheers,
Simon


import streamlit as st

absolute_zero_shift = 273.15
fahrenheit_gradient = 9.0 / 5.0


def calc_fahrenheit(celsius):
    return fahrenheit_gradient * celsius + 32


def calc_celsius_from_fahrenheit(fahrenheit):
    return (fahrenheit - 32) / fahrenheit_gradient


def calc_celsius_from_kelvin(kelvin):
    return kelvin - absolute_zero_shift


def calc_kelvin(celsius):
    return celsius + absolute_zero_shift


min_value = 0
max_value = 100
step = 1

state = st.session_state.get(
    celsius=min_value,
    fahrenheit=calc_fahrenheit(min_value),
    kelvin=calc_kelvin(min_value),
)

st.slider(
    "Celsius",
    # Build the widget objects to be "state aware", as in, if a state
    # object is passed as a value, have the widget respond automatically
    # to state updates
    value=state.celsius,
    min_value=min_value,
    max_value=max_value,
    step=step,
)

st.slider(
    "Fahrenheit",
    value=state.fahrenheit,
    min_value=calc_fahrenheit(min_value),
    max_value=calc_fahrenheit(max_value),
    step=fahrenheit_gradient * step,
)

# Have the "link_to" function create a directional 'master' graph. On
# each UI trigger, traverse the links/edges of that master graph while
# recording the traversed links in a directed acyclic graph. If
# traversing an edge of the 'master' graph would cause a cycle, skip
# that edge for that particular UI interaction.
state.celsius.link_to(state.fahrenheit, calc_fahrenheit)
state.fahrenheit.link_to(state.celsius, calc_celsius_from_fahrenheit)

# Example flow:

# * User triggers the fahrenheit slider
# * state.fahrenheit updated
# * state.celsius is updated via the state.fahrenheit -> state.celsius link
#   * importantly, the state.celsius -> state.fahrenheit link is not
#     triggered, as this is the first step in the chain that produces a
#     cycle in the link graph. Links are propagated and carried out until
#     no more links remain, or until following a link would produce a
#     cycle for that iteration.
# * this triggers celsius slider to update

st.slider(
    "Kelvin",
    value=state.kelvin,
    min_value=calc_kelvin(min_value),
    max_value=calc_kelvin(max_value),
    step=step,
)

state.kelvin.link_to(state.celsius, calc_celsius_from_kelvin)

# Importantly, the directional 'master' graph is defined each run. The
# master graph can change between reruns:
if st.checkbox("Auto update Kelvin?"):
    # Why someone would want to unlink the kelvin slider... I don't
    # know. But... for the example :).
    state.celsius.link_to(state.kelvin, calc_kelvin)


# The state objects can be updated within the Streamlit script by
# calling the "update" method.
if st.button("Make it BOIL!"):
    state.celsius.update(100)


# To access the value within a state object, read its "value"
# attribute.
st.write(
    f"""
        Overview:

        * Celsius: `{state.celsius.value}`
        * Fahrenheit: `{state.fahrenheit.value}`
        * Kelvin: `{state.kelvin.value}`

    """
)
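
The cycle-avoiding propagation described in the comments above can be tried out without Streamlit. The following is a simplified sketch, not Simon's design: the `Linked` class is hypothetical, and it tracks already-updated nodes per interaction rather than building the graph of traversed edges.

```python
class Linked:
    """A value that can push updates to other values through links."""

    def __init__(self, value):
        self.value = value
        self._links = []  # (target, transform) pairs

    def link_to(self, target, transform):
        self._links.append((target, transform))

    def update(self, value, _visited=None):
        # `_visited` records nodes already updated in this interaction.
        visited = {id(self)} if _visited is None else _visited
        self.value = value
        for target, transform in self._links:
            if id(target) in visited:
                continue  # following this edge would close a cycle
            visited.add(id(target))
            target.update(transform(value), visited)


celsius, fahrenheit, kelvin = Linked(0.0), Linked(32.0), Linked(273.15)
celsius.link_to(fahrenheit, lambda c: c * 9.0 / 5.0 + 32)
fahrenheit.link_to(celsius, lambda f: (f - 32) * 5.0 / 9.0)
celsius.link_to(kelvin, lambda c: c + 273.15)

# Simulate the user dragging the Fahrenheit slider to boiling point.
fahrenheit.update(212.0)
```

The Fahrenheit change flows to Celsius and on to Kelvin, while the Celsius -> Fahrenheit link is skipped for this interaction.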

Give us “callbacks” like how they are implemented in dash.

Hello @kart2k15, welcome to the community.

We are open to requests on programmable state, but feedback is most useful in the form Thiago asked for at the top of this topic.

Can you rather tell us what your current problem is and how “callbacks” would solve it?

Have a nice day,
Fanilo

1 Like

My “usecase”—
The user uploads a file using file_uploader and selects a separator/delimiter from a dropdown menu.

The app displays the first 5 rows, a pandas profile report, and a dropdown asking the user to select a categorical/discrete column for dimensionality reduction (in my case, ad campaign names).

The user selects a column, and I clean it using regex, create a similarity matrix using Levenshtein/cosine, and then display a plotly dendrogram, a number_input element, and another dropdown to select the clustering method (ward, average, etc.).

The user looks at the dendrogram and decides on the number of clusters and the method they want. I take those inputs, run agglomerative clustering on that similarity matrix, create cluster_df using the cluster labels, and then display cluster_df.

All this needs to be done on a single page! And so far it’s been a pain to create it in streamlit!

@kart2k15, maybe the st.cache function can help you to not recalculate everything upon changes.
But feel free to use plotly-dash / Jupyter-voila / bokeh-panel if Streamlit is such a pain.
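
To illustrate the caching idea without depending on Streamlit, here is a minimal sketch that uses `functools.lru_cache` as a stand-in for `st.cache`. The function names and the toy similarity and clustering steps are placeholders, not the real Levenshtein or agglomerative code. The expensive step is keyed on its input, so changing only the number of clusters does not recompute it.

```python
from functools import lru_cache

calls = {"similarity": 0}


@lru_cache(maxsize=8)
def similarity_matrix(names):
    # `names` must be hashable (a tuple) for lru_cache to key on it.
    calls["similarity"] += 1
    return tuple(
        tuple(1.0 if a == b else 0.0 for b in names) for a in names
    )


def cluster(names, n_clusters):
    matrix = similarity_matrix(names)  # reused while `names` is unchanged
    # Placeholder for the real clustering step: round-robin labels.
    return [i % n_clusters for i in range(len(matrix))]


campaigns = ("spring_sale", "spring-sale", "winter")
labels_two = cluster(campaigns, 2)
labels_three = cluster(campaigns, 3)  # only the cheap step reruns
```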

Hi @kart2k15, thanks a lot for your feedback :slight_smile: very useful to know about your workflow, I can relate to how frustrating it can be to manage this kind of app in Streamlit with lots of cache and unofficial Session state all over the place.

Just to let you know as an “insider”, we are testing callbacks and trying to judge how well they fit into Streamlit’s model. We have to make sure we are not breaking any native widget or any other Streamlit feature like layout or components, so I’m not guaranteeing you’ll have callbacks, nor that you’ll get them soon. Maybe there will be another, more magical solution. But do know that your concerns are heard, and I bet there will be announcements soonish, so please bear with us :wink:

If you are able to share part of the code for your project, feel free to do so! We would love to look into how we can simplify it with cache or the future version of programmable state.

Thanks for your time,
Fanilo

3 Likes