This is all very cool and interesting. For a current project, I was able to take @Lutz’s example and convert it to load a DataFrame, add annotations to it, and finally save the result. There are, of course, many possible embellishments (saving work so far, seeking ahead to elements not yet annotated, quitting early, etc.).
I also ran into the same problem where it shows the last item twice. Additionally, having to write the same code twice to get it to “run” was weird, so I just wrapped it in a detail() function. All of the global state is making the functional programmer in me twitch.
Otherwise, there are minor things I’d like to see, such as being able to put the buttons in a row. I can also foresee some NLP applications where you might want to return the index of the selection (I’m trying to think about things I have done in the past).
Here is what I ended up with. I have no idea if it is “good”, though.
import streamlit as st
import pandas as pd

categories = {"good": 3, "ambiguous": 2, "skip": 1, "bad": 0}

@st.cache(ignore_hash=True)
def get_data():
    data = pd.read_csv("test.csv")
    data["annotation"] = None
    return data

@st.cache(ignore_hash=True)
def get_annotation():
    return {"row": 0}

row = st.empty()
match = st.empty()
buttons = {}
data = get_data()
annotation = get_annotation()

def detail():
    current_obs = data.loc[annotation["row"]]
    row.markdown(f"# {annotation['row'] + 1}")
    match.markdown(f"**{current_obs['location']}** matched **{current_obs['area']}**")

if annotation["row"] < len(data.index):
    for cat in categories.keys():
        buttons[cat] = st.button(cat)
    detail()
    for cat in categories.keys():
        if buttons[cat]:
            data.loc[annotation["row"], "annotation"] = categories[cat]
            annotation["row"] += 1
            if annotation["row"] < len(data.index):
                detail()
            else:
                data.to_csv("test_annotated.csv")
                st.write("finished")
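For the “seeking up to elements not yet annotated” embellishment, and to calm the twitching about global state, something like the following might work. It is only a rough sketch: the helper names `first_unannotated` and `annotate` are made up for illustration, it assumes the same `annotation` column and category values as the script above, and it leaves out all of the Streamlit wiring.

```python
import pandas as pd

# Same label-to-score mapping as in the script above.
CATEGORIES = {"good": 3, "ambiguous": 2, "skip": 1, "bad": 0}


def first_unannotated(data: pd.DataFrame) -> int:
    """Return the position of the first row without an annotation.

    Returns len(data) when every row has already been annotated.
    """
    missing = data["annotation"].isna().to_numpy()
    return int(missing.argmax()) if missing.any() else len(data)


def annotate(data: pd.DataFrame, position: int, category: str) -> pd.DataFrame:
    """Return a copy of `data` with the row at `position` labelled.

    The input frame is not modified, so no shared state is mutated.
    """
    updated = data.copy()
    column = updated.columns.get_loc("annotation")
    updated.iloc[position, column] = CATEGORIES[category]
    return updated


if __name__ == "__main__":
    frame = pd.DataFrame(
        {
            "location": ["a", "b", "c"],
            "area": ["x", "y", "z"],
            "annotation": [None, None, None],
        }
    )
    frame = annotate(frame, first_unannotated(frame), "good")
    print(first_unannotated(frame))  # prints 1
```

With this, a partially annotated file could be reloaded and the row counter started at `first_unannotated(data)` instead of 0, which would also cover “saving work so far” if the frame is written out after each click.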