So I trained some ML models and deployed them on Streamlit. The user can input new data (text) and the classification models return their predictions (Yes/No). Since I have multiple models, I simply iterate over them in a for loop and print the decisions one after another.
Now I was thinking of using this app to make my colleagues' annotation life easier and to put noisy labels on new documents. The user uploads a document or inputs text, the models return their predictions, and the user then indicates whether each prediction is correct or not.
The app then stores the document/text as a datapoint and all user feedback as labels. This way someone could generate new gold-standard training data and I can re-train my models, etc. You get it.
Any hints on how to build this in Streamlit? Hints about other tools are welcome too!
I suppose you can store it in the following structure:
id, input_data, model_id, prediction, feedback
1, hello world, 1, True, False
2, hello, 2, False, True
- If you want a quick first prototype, you could create a SQLite database locally per user and insert predictions into it through SQL queries. These are stored in a local file like database.db, which is easier to manipulate than overwriting CSV files multiple times. You could even add an app to browse the feedback and edit it from Streamlit, issuing the corresponding SQL queries to edit the local database. Later on you could ask your colleagues to send you back their SQLite files so you can merge their feedback.
- If you feel more adventurous, you could host a database somewhere (MySQL, MongoDB, Firestore, whichever you like to use) and have the Streamlit app store the data there for everyone, so there is no hassle merging databases. This solution is a bit more demanding though …
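For the SQLite route, here is a minimal sketch using Python's built-in sqlite3 module. The table name `feedback` and the helper names `init_db`, `save_feedback` and `load_feedback` are just illustrative choices on my side; the columns follow the structure listed above.

```python
import sqlite3


def init_db(path="database.db"):
    """Open (or create) the local database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            input_data TEXT NOT NULL,
            model_id INTEGER NOT NULL,
            prediction INTEGER NOT NULL,  -- SQLite has no boolean type, so 0/1
            feedback INTEGER              -- 0/1 as well
        )
        """
    )
    conn.commit()
    return conn


def save_feedback(conn, input_data, model_id, prediction, feedback):
    """Insert one row and return its id. Parameters are bound, not string-formatted."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO feedback (input_data, model_id, prediction, feedback) "
            "VALUES (?, ?, ?, ?)",
            (input_data, model_id, int(prediction), int(feedback)),
        )
    return cur.lastrowid


def load_feedback(conn):
    """Return all rows, converting the 0/1 columns back to booleans."""
    rows = conn.execute(
        "SELECT id, input_data, model_id, prediction, feedback "
        "FROM feedback ORDER BY id"
    ).fetchall()
    return [(i, text, m, bool(p), bool(f)) for i, text, m, p, f in rows]
```

In the app you would call `init_db()` once and then `save_feedback(...)` for every model prediction the user has reviewed.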
What do you think?
I’d be curious to know how you solved this as I’m trying to manage persistent data that can be updated by the app. I followed the Streamlit Firestore blog post, but they never wrote parts 3 and 4
Hi @andfanilo that sounds great! Thanks for the reply and apologies for my late response. I think I feel more adventurous and this may be a good opportunity to increase my full-stack dev skills.
@Peej1226 I haven’t worked on this again unfortunately, but for me it seemed that I could also opt for some kind of labeling tool solution (Prodigy or https://labelstud.io/). Maybe that is helpful for your use case as well.
My question was more directed at your solution for storing data.
My current workflow is this:
- Locally run a Beautiful Soup based web scraper
- Update a locally stored CSV
- Push CSV to GitHub
- Streamlit pulls data from GitHub and provides visualizations
My future state is
- Daily script runs (I haven't found a solution for this yet; the interim solution is to run it whenever the Streamlit app is accessed)
a. Performs web scraping
b. Updates db (I’m targeting Firestore but haven’t figured that out)
- Streamlit pulls data from db and provides visualizations
So there are two things I need solutions for:
- how to schedule scripts to run
- how to persist data in a non-local data store
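For the interim approach (running the scrape whenever the app is accessed), a small staleness check can keep it down to roughly one run per day. This is only a sketch: `run_if_stale` and the timestamp file are hypothetical names, and it assumes the app can write a file that survives between visits. On a hosted platform that may not hold, in which case the timestamp would have to live in the database instead.

```python
from datetime import datetime, timedelta
from pathlib import Path


def run_if_stale(job, stamp_file, max_age=timedelta(days=1), now=None):
    """Run job() only if the last recorded run is older than max_age.

    Returns True if the job ran, False if the previous run is still fresh.
    `now` can be passed in explicitly, which makes the function easy to test.
    """
    now = now or datetime.now()
    stamp_file = Path(stamp_file)

    if stamp_file.exists():
        last_run = datetime.fromisoformat(stamp_file.read_text().strip())
        if now - last_run < max_age:
            return False  # still fresh, skip the scrape

    job()
    # Record the run only after the job finished without raising.
    stamp_file.write_text(now.isoformat())
    return True
```

At the top of the Streamlit script this would be something like `run_if_stale(scrape_and_update_db, "last_run.txt")`, where `scrape_and_update_db` stands for whatever function does the scraping and the db update.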
Hope you’re doing fine! So this week I finally want to prototype it.
I am noticing that while storing the data is non-trivial, it is definitely solvable. However, the feedback loop seems more difficult, especially when I have a large number of items that require interaction.
Please find attached an example screenshot of an ML application. You see it takes text as input and outputs the prediction results in a simple table. Now, imagine a non-technical user uses this application and is asked to “correct” the predictions. I think the best solution would be an additional column containing an interactive widget per row that takes a “correct/incorrect” decision from the user.
Then there is a single “Submit” button below the table, and the corrected predictions are stored together with the input text.
So do you think this is possible with streamlit or am I just dreaming?
I haven’t thought too much about it yet, but my instinct says you can use streamlit-aggrid from @PablocFonseca to build an interactive data grid with a “feedback” column that has a checkbox per row for correct/incorrect, then a Submit button to get the table back from Ag-Grid and work on the feedback column, which should contain boolean values.
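If you would rather stay with built-in widgets, the same idea can be sketched with st.form instead of Ag-Grid: one checkbox per prediction and a single Submit button. Note this is a different technique from the Ag-Grid suggestion, and `to_records` and `render_feedback_form` are hypothetical helper names. I'm assuming the predictions arrive as a dict mapping a model id to a boolean.

```python
from typing import Dict, List, Optional


def to_records(text: str, predictions: Dict[str, bool],
               is_correct: Dict[str, bool]) -> List[dict]:
    """Build one storable record per model from the user's feedback."""
    return [
        {
            "input_data": text,
            "model_id": model_id,
            "prediction": pred,
            "feedback": is_correct[model_id],
        }
        for model_id, pred in predictions.items()
    ]


def render_feedback_form(text: str,
                         predictions: Dict[str, bool]) -> Optional[List[dict]]:
    """Show one checkbox per prediction; return records once Submit is clicked."""
    # Imported here so to_records() can be used and tested without Streamlit.
    import streamlit as st

    with st.form("feedback_form"):
        is_correct = {}
        for model_id, pred in predictions.items():
            left, right = st.columns(2)
            left.write(f"{model_id}: {'Yes' if pred else 'No'}")
            is_correct[model_id] = right.checkbox(
                "Prediction is correct", value=True, key=f"fb_{model_id}"
            )
        submitted = st.form_submit_button("Submit")

    # Widgets inside a form only send their values when the button is clicked.
    return to_records(text, predictions, is_correct) if submitted else None
```

Call `render_feedback_form(text, predictions)` from the app script after the models have run. It returns None until the user clicks Submit, and after that a list of records you can write to whatever store you picked.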
Hope this works!