Seeking Advice for Streamlit App State Management and Best Practices

Hi Streamlit Community,

I’ve built an app with the following functionality:

  1. CSV Upload: Users upload a CSV with vehicle data.
  2. Field Mapping: Users map two CSV columns to fields: VIN (Vehicle Identification Number) and Address.
  3. VIN Validation: The app checks the validity of the VINs.
  4. Vehicle Classification: Users classify unique vehicle types in a new “Category” column and add notes in a “Notes” column. These updates apply back to the fleet dataset.
  5. Address Categorization: Users categorize unique addresses, with changes reflecting in the dataset.
  6. Final Dataset Display: The updated dataset is displayed, combining original CSV data with new or modified columns. Users can go back to steps 4 or 5 to make further updates.
  7. Summary Generation: A summary is created, including a table of vehicles per address, highlighting missing addresses for user action.
  8. XLSX Download: Users can download the final dataset and summary as an Excel file with two tabs.

Currently, all these features are on a single page, which makes managing states and actions challenging. Initially, I created multiple DataFrames (df1, df_with_vehicle_changes, etc.) and used conditional logic (e.g., if "df_with_vehicle_changes") to determine the next steps. This approach has proven difficult to maintain, especially as new features are added.

Seeking Best Practices

I’m exploring two potential solutions:

Solution 1: Single DataFrame Approach

  • Use a single df_final stored in st.session_state to capture all updates, reducing the number of intermediate DataFrames.
  • Replace condition checks (if df) with state checks (if 'vin_validation_action' in st.session_state).
  • Modularize business logic into separate functions to simplify the main app file.
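A minimal sketch of the state-check idea in Solution 1, assuming the workflow can be written as an ordered list of steps. The step names and the helpers `init_state`, `complete_step`, and `current_step` are hypothetical, and a plain dict stands in for `st.session_state` so the logic can be exercised outside Streamlit.

```python
from typing import Any, MutableMapping

# Hypothetical step names, in the order the user goes through them.
STEPS = [
    "upload",
    "map_fields",
    "validate_vins",
    "classify_vehicles",
    "categorize_addresses",
    "review",
]


def init_state(state: MutableMapping[str, Any]) -> None:
    """Create the keys the app relies on, without overwriting existing ones."""
    if "df_final" not in state:
        state["df_final"] = None  # the single dataset every step reads and writes
    if "completed_steps" not in state:
        state["completed_steps"] = set()


def complete_step(state: MutableMapping[str, Any], step: str) -> None:
    """Record that a step has been finished."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    state["completed_steps"].add(step)


def current_step(state: MutableMapping[str, Any]) -> str:
    """Return the first step that is not done yet, or the last step if all are done."""
    for step in STEPS:
        if step not in state["completed_steps"]:
            return step
    return STEPS[-1]
```

In the app, `st.session_state` would be passed as `state`, and the main script could branch on `current_step(st.session_state)` instead of checking which DataFrames exist.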

While this approach has cleaned up some code, I still face challenges with state management and debugging.

Solution 2: Multi-Page Approach

  • Split functionality across multiple pages, each handling a specific task.
  • Retain a single df_final DataFrame and use session_state to track user actions and unlock new pages.

This might simplify debugging and code maintenance, but I’m unsure if it’s the best path forward. It would certainly be a worse user experience than having everything on one page.

Request for Input

What would you recommend as best practices for this use case to maintain clean, easily debuggable, and bug-free code? Any thoughts or suggestions would be greatly appreciated!

Hi Lucas,
Interesting! I’m dealing with a similar problem at work. My advice:

  • A single, simple data upload + verification page: just make sure the input file is OK. No functionality is shown until the data is uploaded and verified.
  • If everything is OK, dynamically add pages, each with a specific functionality, for the user. This is easier to maintain, update and grow.
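The "unlock pages after verification" idea can be sketched as a small pure function. This assumes a single boolean flag in session state; the page paths, titles, the `upload_verified` key, and the helper `unlocked_pages` are all hypothetical, and a plain dict stands in for `st.session_state`.

```python
from typing import Any, Mapping

# Hypothetical page files and the state flag that unlocks each one.
# None means the page is always available.
PAGES = [
    ("pages/upload.py", "Upload", None),
    ("pages/vehicles.py", "Vehicle classification", "upload_verified"),
    ("pages/addresses.py", "Address categorization", "upload_verified"),
    ("pages/summary.py", "Summary and download", "upload_verified"),
]


def unlocked_pages(state: Mapping[str, Any]) -> list[tuple[str, str]]:
    """Return (path, title) for every page the current session may see."""
    return [
        (path, title)
        for path, title, required_flag in PAGES
        if required_flag is None or bool(state.get(required_flag))
    ]
```

In recent Streamlit versions, each pair could probably be wrapped as `st.Page(path, title=title)` and the list handed to `st.navigation(...)`, whose return value is then run with `.run()`; check the docs for your installed version before relying on that.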

I hate working on large files and scrolling up and down trying to find the precise function. Things work out better if you modularize the components.

In addition to what @sebastiandres has mentioned, remember to use st.fragment on individual, independent actions to prevent the entire script from rerunning.

That is great advice @sebastiandres !!

  1. I do have some validation of the input file that I didn’t mention. It hasn’t been a big deal so far.
  2. Makes sense: separate, isolated pages would be easier to maintain.

Question for you: how do you use session_state both for tracking multiple states and for holding the most up-to-date version of the df? Do you keep them in memory, or do you read and write the user’s changes to a SQL DB?

FYI I saw your other answer in this post: Anyone creating business-facing apps with Streamlit?

and it seems you tackle problems very similar to mine, and solve them with Streamlit.

Would you be up for a chat to show each other how we tackle similar issues, and hopefully learn something new? If you’re up for it, I can DM you to figure something out.

Truly appreciate your time!

I feel like my problem is a bit of the opposite: when something updates in the widget the user is working on (say st.data_editor), I need that update to propagate to all the other widgets.

But maybe st.fragment is worth trying for issues like “Using st.data_editor with session state, input data ‘disappears’: Requires double input to register changes” (#7749)?

I will create a fragment with the st.data_editor in it and call st.rerun(scope="fragment").
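One way to make an edit show up everywhere is to write it into the single dataset in session state and have every widget read from that dataset on its next run. A minimal sketch, assuming rows are plain dicts and edits arrive as a mapping from vehicle type to category. The column names `vehicle_type` and `Category` and the helper `apply_category_edits` are hypothetical, and a dict stands in for `st.session_state`; with a real DataFrame the loop would become a column assignment.

```python
from typing import Any, MutableMapping


def apply_category_edits(
    state: MutableMapping[str, Any], edits: dict[str, str]
) -> int:
    """Write per-type categories into every matching row of the shared dataset.

    Returns the number of rows whose category actually changed.
    """
    changed = 0
    for row in state["df_final"]:  # here: a list of dicts, not a DataFrame
        new_category = edits.get(row["vehicle_type"])
        if new_category is not None and row.get("Category") != new_category:
            row["Category"] = new_category
            changed += 1
    return changed
```

Inside a fragment, a nonzero return value could be the signal that other widgets are now stale and a full-app rerun is needed, while zero means the fragment can keep rerunning on its own.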

Thank you @SiddhantSadangi !

So far I haven’t needed to pass dataframes to memory, as the way I’m splitting the functionalities allows for “clean cuts”.
I think that whether to keep them in memory or write them to a file/database depends more on the computing cost. If it takes < 1 minute, keeping them in memory is good enough IMHO.
And sure, send me a DM. Would love to chat and brainstorm.
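On the memory-versus-database question, if persistence does become necessary, a write-through store can stay quite small. A minimal sketch using the standard-library `sqlite3` module, assuming the dataset can be serialized to JSON; the table name `datasets`, the `session_id` key, and the helpers `open_store`, `save_rows`, and `load_rows` are hypothetical.

```python
import json
import sqlite3
from typing import Any, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; ':memory:' keeps it in RAM for quick trials."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "session_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def save_rows(conn: sqlite3.Connection, session_id: str, rows: list[dict[str, Any]]) -> None:
    """Replace the stored dataset for this session with the latest version."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO datasets (session_id, payload) VALUES (?, ?)",
            (session_id, json.dumps(rows)),
        )


def load_rows(conn: sqlite3.Connection, session_id: str) -> Optional[list[dict[str, Any]]]:
    """Return the stored dataset, or None if this session has nothing saved."""
    row = conn.execute(
        "SELECT payload FROM datasets WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping the working copy in session state and calling `save_rows` only after each completed step would keep reruns cheap while still surviving a lost session.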