How to evolve complex state (e.g., annotate data)?

Streamlit is very convenient for quickly developing apps that have a small, fixed state space (i.e., all the widgets).

How, if at all, is it possible to interactively evolve some more complex state, like a dictionary of annotations? Streamlit is good at interactively showing data instances. But how can I record, say, a binary label per instance that the user indicates with a button click or keystroke?


I would like to second this question.

For me, an important part of what I do is reviewing and annotating data, often observation by observation, and reviewing and evaluating models, also observation by observation (and then retraining or showing results to stakeholders).

So imagine a deduplication algorithm for database records. I would want to show pairs of records I think are duplicates, paging through them, marking them as dupes or not dupes. I might train a logistic regression on features or improve features. I then want to go through the pairs, see the two records side by side, and the prediction. If they are in the training set, I can see if we have a false positive or negative (and why). If they are not in the training set, I can add them.
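The review step described above can be sketched without any UI. This is only a minimal illustration, assuming hypothetical names (`review_status`, `predictions`, `labels`) and made-up example data that are not from the original post: each candidate pair is bucketed by comparing the model’s prediction with the human label, and pairs without a label are flagged as candidates for annotation.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]  # hypothetical: IDs of the two records being compared


def review_status(predicted_dupe: bool, label: Optional[bool]) -> str:
    """Classify one pair by comparing the prediction with the human label."""
    if label is None:
        return "unlabeled"  # not in the training set yet: candidate for annotation
    if predicted_dupe and label:
        return "true positive"
    if predicted_dupe and not label:
        return "false positive"
    if not predicted_dupe and label:
        return "false negative"
    return "true negative"


# Hypothetical example data, for illustration only.
predictions: Dict[Pair, bool] = {(1, 2): True, (3, 4): True, (5, 6): False, (7, 8): False}
labels: Dict[Pair, bool] = {(1, 2): True, (3, 4): False, (5, 6): True}

statuses = {pair: review_status(pred, labels.get(pair)) for pair, pred in predictions.items()}
```

An annotation UI would then page through the pairs, show the two records side by side together with this status, and write any new label back into `labels`.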

Finally, I can use a similar UI to show to stakeholders what the algorithm is going to do.

I find myself spending an inordinate amount of time working on these kinds of data science UIs. Streamlit may already be able to do this; it’s just not immediately apparent how. I think this is at least part of what Lutz is also asking.


Does this help?

This feature is still under development and we’d love to hear whether it would work for your use case!

I should mention that a slightly more orthodox (but perhaps less intuitive) approach would be to use a mutable cache object by passing ignore_hash=True to st.cache.

Hi Adrien,
thanks for the responses. I already checked out the SessionState code and implemented a proof-of-concept of an annotation script, but it still feels very hacky and is far from readable.
Can you elaborate on the ignore_hash idea? I don’t see how hashing (or not hashing) the output of a function call changes anything.
Maybe my other question (Memoize/cache partial function) goes in a similar direction? If you allowed a function to be executed again when the last call with the same arguments yielded None, I could see how to implement an annotation script.

To elaborate, ignore_hash=True lets you create mutable state. For example:

import streamlit as st

@st.cache(ignore_hash=True)
def get_state():
  return []

state = get_state()
state.append(len(state))
st.write(state)
st.button('Rerun')

Every time you run this script, it appends an element to the state, so the displayed list grows from [0] to [0, 1] to [0, 1, 2], and so on.
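As a rough analogy in plain Python (not Streamlit’s actual implementation), a memoized function that returns a mutable object hands back the same object on every call, so mutations survive from one run to the next. The sketch below uses `functools.lru_cache` as a stand-in for the cache, and `run_script` is a hypothetical function that plays the role of one top-to-bottom run of the script.

```python
import functools


@functools.lru_cache(maxsize=None)
def get_state():
    # Executed only once; later calls return the same cached list object.
    return []


def run_script():
    """Stand-in for one top-to-bottom run of the script."""
    state = get_state()
    state.append(len(state))
    return list(state)  # copy, so each snapshot can be inspected independently


# Three simulated reruns share one list, which grows by one element each time.
snapshots = [run_script() for _ in range(3)]
```

The key point is that the cached function body runs only once; what changes between runs is the content of the object it returned, not the cache entry itself.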

I also responded to your partial cache function in the other thread.

Thanks for all the great questions and happy app creating!! :slight_smile:

Hi Adrien,
thanks for the example!
Based on your code, I coded this small prototype:

import streamlit as st

data = ["eins", "zwei", "drei", "vier", "fünf"]
categories = ["good", "bad"]

@st.cache(ignore_hash=True)
def get_annotation():
  return {}

instance = st.empty()
buttons = {}
for cat in categories:
    buttons[cat] = st.empty()

annotation = get_annotation()

if len(annotation)<len(data):
    for cat in categories:
        buttons[cat] = st.button(cat)
    instance.markdown("# "+data[len(annotation)])
    for cat in categories:
        if buttons[cat]:
            index = len(annotation)
            annotation[data[index]] = cat
            if len(annotation) < len(data):
                instance.markdown("# " + data[len(annotation)])

st.write(annotation)

It is relatively readable and does what I want. Just one small issue: why is the text rendered one time too many? That is, with 5 data instances, I have to click 6 times (the last click happens while the last data instance is shown a second time, and it is inconsequential).

The extra question is being asked because of a tricky quirk of button semantics: you’re updating the state after the button is clicked (in the if buttons[cat]: block) but before the script is rerun.
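To make the quirk concrete, here is a plain-Python simulation of the prototype’s control flow. It is a simplified model under stated assumptions, not Streamlit code: `run_script` is a hypothetical function standing in for one top-to-bottom run, `clicked` stands in for the button that was pressed, and the returned dictionary describes what is on screen after the run.

```python
def run_script(annotation, data, clicked=None):
    """Simulate one top-to-bottom run; return what is on screen afterwards."""
    screen = {"buttons": False, "instance": None}
    if len(annotation) < len(data):
        # The buttons are drawn here, before the click is processed.
        screen["buttons"] = True
        screen["instance"] = data[len(annotation)]
        if clicked is not None:
            annotation[data[len(annotation)]] = clicked
            # Re-render the instance only if there is a next one to show.
            if len(annotation) < len(data):
                screen["instance"] = data[len(annotation)]
    return screen


data = ["eins", "zwei", "drei", "vier", "fünf"]
annotation = {}
screen = run_script(annotation, data)  # initial run, no click yet
for _ in data:
    screen = run_script(annotation, data, clicked="good")
after_fifth_click = screen
after_sixth_click = run_script(annotation, data, clicked="good")
```

In this model, the run triggered by the fifth click completes the annotation but leaves the buttons and the last instance on screen, because both were drawn before the state changed. Only the next run, triggered by the inconsequential sixth click, sees the finished state and draws nothing.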

To be honest, the conversation is making me rethink the button API a tiny bit.

In an ideal world, this is how I think your code should be written:

import streamlit as st

data = ["eins", "zwei", "drei", "vier", "fünf"]
categories = ["good", "bad"]

@st.cache(ignore_hash=True)
def get_annotation():
  return {}

annotation = get_annotation()
index = len(annotation)
if index < len(data):
    st.markdown("# " + data[index])
    for cat in categories:
        if st.button(cat):
            annotation[data[index]] = cat
            st.rerun()

st.write(annotation)

But unfortunately, st.rerun() does not exist. :confused:

I think it could be hacked together using st.ScriptRunner.RerunException, but this requires knowledge of the internal workings of Streamlit which I do not possess. I’m asking the eng team on our internal Slack channel if they can help.

Please sit tight and I’ll get back to you.

p.s. You’re helping us understand and improve Streamlit’s design. Thank you for these great questions!

We actually already have a feature request about this, and I’ve updated it with new information from this conversation.

Thank you, @Lutz! :pray:


You are welcome! I am blown away by your prompt responses. I am looking forward to seeing what the limits of this paradigm might be.

@Adrien_Treuille I’m glad to see you’re rethinking the button API; I’ve found that it never behaves as I would expect. The ability to trigger a rerun would be a nice addition. The other awkward part about a button is that it seems to be set to True if the button was previously clicked. This makes it awkward when I want to use the button to trigger some action and update the state in the app (it gets stuck in an infinite loop since the button stays True). What I would expect is something where you:

  • Click button to trigger an update
  • Execute code to modify data objects
  • Re-run the top-down execution with the modified data objects

What I’ve observed in the past is something like

import streamlit as st
import requests

external_api = 'http://localhost/foo'  # requests needs an explicit URL scheme

color = st.multiselect(
        'What are your favorite colors',
        ('Green', 'Yellow', 'Red', 'Blue'))
submit = st.button('send to server')
if submit:
    requests.post(external_api, json={'color': color})

This will just keep sending the default color to the server indefinitely, since submit stays True. I can try to create a self-contained example later if it would be helpful.

Hey @jeremyjordan! Responses inline:

  • Click button to trigger an update
  • Execute code to modify data objects
  • Re-run the top-down execution with the modified data objects

I agree that your three-part flow for how a button should work is probably right. We’re thinking about how to do that. One API would be something like

@st.button('A button')
def callback():
   do_something()
   do_something_else()

What do you think of that?

Will just infinitely send the default color to the server since submit stays True.

I find this very surprising. The way buttons work now is that, after a click, the app is run from top to bottom with the button returning True; the next time the app is run, the button should be set back to False.
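The intended behavior can be modelled in plain Python. This is a minimal sketch, not Streamlit’s implementation; `FakeButton`, `click`, `read`, and `run_script` are hypothetical names. The click is consumed by the first run that reads it, so any later run sees False again and the side effect happens only once.

```python
class FakeButton:
    """Toy model of a button whose True value lasts for exactly one run."""

    def __init__(self):
        self._pending_click = False

    def click(self):
        # Called by the "frontend" when the user presses the button.
        self._pending_click = True

    def read(self):
        # Called once per script run; consumes the pending click.
        value = self._pending_click
        self._pending_click = False
        return value


def run_script(button, sent):
    """One simulated run: perform the side effect only if the button fired."""
    if button.read():
        sent.append("POST")


button = FakeButton()
sent = []
run_script(button, sent)  # no click yet: nothing is sent
button.click()
run_script(button, sent)  # run triggered by the click: sent once
run_script(button, sent)  # any later rerun: the button is False again
```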

I can try to create a self contained example later if it would be helpful.

That would be great. If we can reproduce this behavior and it differs from what I just described, then this is definitely a bug we should fix! Thank you!! :pray:

This is all very cool and interesting. I was able to take @Lutz’s example and convert it to load a DataFrame, add annotations to the DataFrame, and finally save it for a current project. There are, of course, many possible embellishments (saving work so far, seeking up to elements not yet annotated, quitting early, etc).

I also ran into the (same) problem where it shows the last item twice. Additionally, the necessity of writing the same code twice to get it to “run” was weird but I just wrote a display() function. All of the global state is making the functional programmer in me twitch.

There are minor things (being able to put the buttons in a row) that I’d like to see, otherwise. I foresee some NLP applications where you might want to return the index of the selection (I’m trying to think about things I have done in the past).

I have no idea if it is “good”, though.

import streamlit as st
import pandas as pd

categories = {"good": 3, "ambiguous": 2, "skip": 1, "bad": 0}

@st.cache(ignore_hash=True)
def get_data():
    data = pd.read_csv("test.csv")
    data["annotation"] = None
    return data

@st.cache(ignore_hash=True)
def get_annotation():
    return {"row": 0}

row = st.empty()
match = st.empty()
buttons = {}

data = get_data()
annotation = get_annotation()

def detail():
    current_obs = data.loc[annotation["row"]]
    row.markdown(f"# {annotation['row'] + 1}")
    match.markdown(f"**{current_obs['location']}** matched **{current_obs['area']}**")

if annotation["row"] < len(data.index):
    for cat in categories.keys():
        buttons[cat] = st.button(cat)
    detail()
    for cat in categories.keys():
        if buttons[cat]:
            data.loc[annotation["row"], "annotation"] = categories[cat]
            annotation["row"] += 1
            if annotation["row"] < len(data.index):
                detail()
else:
    data.to_csv("test_annotated.csv")
    st.write("finished")

I tried to reproduce the odd behavior yesterday and was unable to - I’ve been trying to remember the exact conditions but until I’m able to reproduce it, let’s assume that it was user error :slight_smile:

Decorating a function seems like a natural way to encapsulate the action that a button should take, though I’m a little unclear on how you would place the button on the screen. Would it be something like:

import streamlit as st

@st.button('A button')
def callback():
   do_something()
   do_something_else()

st.title('Example')
st.write('Lorem ipsum dolor sit amet, consectetur adipiscing elit')
callback() 

which would render a button below the text?

Since we’re brainstorming cool APIs, a solution that would avoid that problem is something like:

st.button("Click me!", callback=my_callback)

…but it’s unclear how that would work given Streamlit’s execution model.

So a more “Streamlity” solution would be to limit what can be done in the callback function by transforming it into a pure “state transition function”, like this:

state = SessionState(count=0)

@state.update
def increment_count(state):
  state.count += 1

st.button("Click me!", update=increment_count)
st.write("The count is", state.count)

…where:

  1. SessionState is one of these objects we’ve proposed in the past. It holds information that persists across reruns of the same script, on a per-user basis.
  2. We’d make SessionState objects have an .update decorator that is used to mark a function as a “state transition function”. That is, a function whose sole purpose is to take a SessionState object and update it, and which is not allowed to do things like refer to outer-scope objects. This is much less general than just a “callback”, but I think it’s (potentially!) a really nice and clean architecture. It also maps to Streamlit’s execution model really well.
  3. The update argument in st.button only accepts state transition functions.

So when the button is clicked, Streamlit would first call the update function and then rerun the script from top to bottom.
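The proposed flow can be mocked up in plain Python. Everything here is hypothetical: this `SessionState` class is a toy stand-in for the proposed object, and `script` and `click` are made-up helpers that model the script body and the click handling. The sketch only illustrates the ordering: the update runs first, then the script reruns from the top.

```python
class SessionState:
    """Hypothetical stand-in for the proposed per-user state object."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def update(self, fn):
        # Mark fn as a state transition function bound to this state object.
        def transition():
            fn(self)
        return transition


def script(state):
    """Stand-in for the script body; returns what would be written."""
    return f"The count is {state.count}"


def click(update, state):
    """Proposed click semantics: run the update first, then rerun the script."""
    update()
    return script(state)


state = SessionState(count=0)


@state.update
def increment_count(state):
    state.count += 1


first = script(state)                   # initial run: "The count is 0"
second = click(increment_count, state)  # update, then rerun
third = click(increment_count, state)
```

Because the transition function receives the state as its only argument, it never needs to reach into the surrounding script, which is what makes it safe to run before the rerun.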

Syntax thoughts

(This is independent of the state question.)

@jeremyjordan: I was thinking that this would instantiate the button on the spot:

@st.button('A button')
def callback():
   do_something()
   do_something_else()

So you’d just call it in the middle of defining your UI and it would create the button right there. For example:

@st.sidebar.button('A button')
def callback():
   do_something()
   do_something_else()

would equivalently add a button to the sidebar.

This approach would allow you to define the callback separately. In fact, the following code would be equivalent by definition:

def callback():
   do_something()
   do_something_else()

st.sidebar.button('A button')(callback)

But I agree with @thiago that it would also be nice to specify this by kwarg:

def callback():
   do_something()
   do_something_else()

st.sidebar.button('A button', callback=callback)

In fact, most Python decorators allow this dual decorator / kwarg formulation.
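Here is a minimal sketch of how one function can support both forms. It is purely illustrative: this `button` is a hypothetical plain-Python function that records callbacks in a dictionary, not the real st.button, and the labels and callbacks are made up.

```python
callbacks = {}  # hypothetical registry: button label -> callback


def button(label, callback=None):
    """Register a callback for label, either via kwarg or as a decorator."""
    def register(fn):
        callbacks[label] = fn
        return fn

    if callback is not None:
        return register(callback)  # kwarg form
    return register                # decorator form


def on_save():
    return "saved"


button("Save", callback=on_save)


@button("Load")
def on_load():
    return "loaded"


def on_reset():
    return "reset"


button("Reset")(on_reset)  # equivalent to the decorator form, by definition
```

All three spellings end up registering the callback under its label; the only difference is whether the function is passed as an argument or handed to the returned decorator.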

Semantics thoughts

The semantics which I think would make sense would be to run the callback immediately after the click and before the subsequent rerun of the Streamlit script.

The semantics which I think would make sense would be to run the callback immediately after the click and before the subsequent rerun of the Streamlit script.

Agreed. One of the main Streamlit apps that I’ve been working on talks to other APIs and serves mainly as a frontend interface. So for example, I might make a GET request to the backend and populate a list ['a', 'b', 'c'] displayed on the Streamlit app. I might also have options ['d', 'e', 'f'] displayed with a checkbox next to each item. Then below I would have a button to submit the selected items to make a POST request to the backend API. After clicking the button, I would want the action to be triggered and then restart the execution from the top of the script.

I created a little Gist to demonstrate this.

This has some odd behavior, such as state not updating when I would expect and updating when I would not expect it to. This might be a user error but the source of the problem is not clear.

This example is also slightly different than @thiago’s suggestion since state is being managed outside of the Streamlit app (although in the real app I’m managing some state such as the page number using the SessionState object).

This is very helpful @jeremyjordan. FYI: I think the main next step for us is improvements on the caching, then we will get to state / callbacks, hopefully all in 2019. :slight_smile:

I discovered a data-annotation tool called Label Studio about a day ago.

It’s promising. I particularly like the idea of keyboard bindings to option selection.
