Hi Streamlit Community,
I’ve built an app with the following functionality:
- CSV Upload: Users upload a CSV with vehicle data.
- Field Mapping: Users map two CSV columns to fields: VIN (Vehicle Identification Number) and Address.
- VIN Validation: The app checks the validity of the VINs.
- Vehicle Classification: Users classify unique vehicle types in a new “Category” column and add notes in a “Notes” column. These updates apply back to the fleet dataset.
- Address Categorization: Users categorize unique addresses, with changes reflecting in the dataset.
- Final Dataset Display: The updated dataset is displayed, combining original CSV data with new or modified columns. Users can go back to steps 4 or 5 (Vehicle Classification or Address Categorization) to make further updates.
- Summary Generation: A summary is created, including a table of vehicles per address, highlighting missing addresses for user action.
- XLSX Download: Users can download the final dataset and summary as an Excel file with two tabs.
Currently, all these features are on a single page, which makes managing states and actions challenging. Initially, I created multiple DataFrames (df1, df_with_vehicle_changes, etc.) and used conditional logic (e.g., if "df_with_vehicle_changes") to determine the next steps. This approach has proven difficult to maintain, especially as new features are added.
Seeking Best Practices
I’m exploring two potential solutions:
Solution 1: Single DataFrame Approach
- Use a single df_final stored in st.session_state to capture all updates, reducing the number of intermediate DataFrames.
- Replace condition checks (if df) with state checks (if 'vin_validation_action' in st.session_state).
- Modularize business logic into separate functions to simplify the main app file.
While this approach has cleaned up some code, I still face challenges with state management and debugging.
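To make Solution 1 concrete, here is a simplified sketch of the pattern, not the app's real code. The column name vehicle_type, the flag vehicle_classification_action, and the helper names init_state, apply_vehicle_categories, and next_step are placeholders made up for illustration; only df_final and vin_validation_action come from the description above. The functions take the state as a plain mapping so they can be exercised without running Streamlit.

```python
from typing import Any, Mapping, MutableMapping

import pandas as pd


def init_state(state: MutableMapping[str, Any], uploaded: pd.DataFrame) -> None:
    """Store one working copy of the uploaded data, only if none exists yet."""
    if "df_final" not in state:
        state["df_final"] = uploaded.copy()


def apply_vehicle_categories(
    state: MutableMapping[str, Any], categories: Mapping[str, str]
) -> None:
    """Write the user's category choices into df_final and record the action."""
    df = state["df_final"]
    # 'vehicle_type' is a placeholder column name; unmapped types become NaN.
    df["Category"] = df["vehicle_type"].map(categories)
    state["vehicle_classification_action"] = True


def next_step(state: Mapping[str, Any]) -> str:
    """Decide what to render from state keys instead of checking DataFrames."""
    if "df_final" not in state:
        return "upload"
    if "vin_validation_action" not in state:
        return "vin_validation"
    if "vehicle_classification_action" not in state:
        return "vehicle_classification"
    return "final_dataset"


# In the Streamlit app, `state` would be st.session_state; a plain dict is
# used here so the sketch runs on its own.
state: dict = {}
uploaded = pd.DataFrame(
    {"VIN": ["VIN-A", "VIN-B"], "vehicle_type": ["van", "truck"]}
)
init_state(state, uploaded)
state["vin_validation_action"] = True
apply_vehicle_categories(state, {"van": "Light", "truck": "Heavy"})
print(next_step(state))
print(state["df_final"])
```

Because every function only reads and writes keys on the mapping it receives, each step can be unit-tested with a dict, which is where I would hope most of the debugging pain goes away.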
Solution 2: Multi-Page Approach
- Split functionality across multiple pages, each handling a specific task.
- Retain a single df_final DataFrame and use session_state to track user actions and unlock new pages.
This might simplify debugging and code maintenance, but I’m unsure if it’s the best path forward. It would certainly be a worse user experience than having everything on one page.
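For Solution 2, the "unlock new pages" logic could look roughly like the sketch below. The page titles, the vehicle_classification_action flag, and the unlocked_pages helper are hypothetical; the idea is only that each page declares which session_state key must exist before it becomes available.

```python
from typing import Any, List, Mapping, Optional, Tuple

# (page title, session_state key that must exist before the page unlocks).
# None means the page is always available. Titles and keys are placeholders.
PAGES: List[Tuple[str, Optional[str]]] = [
    ("Upload and Mapping", None),
    ("VIN Validation", "df_final"),
    ("Vehicle Classification", "vin_validation_action"),
    ("Address Categorization", "vin_validation_action"),
    ("Summary and Download", "vehicle_classification_action"),
]


def unlocked_pages(state: Mapping[str, Any]) -> List[str]:
    """Return the titles of the pages the user may open, in display order."""
    return [
        title
        for title, required_key in PAGES
        if required_key is None or required_key in state
    ]


# A plain dict stands in for st.session_state so the sketch runs on its own.
print(unlocked_pages({}))
print(unlocked_pages({"df_final": object(), "vin_validation_action": True}))
```

In the app itself, I assume the returned titles could be used to build the page list passed to st.navigation in recent Streamlit versions, or each page script could check its required key at the top and call st.stop() when it is missing.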
Request for Input
What would you recommend as best practices for this use case to maintain clean, easily debuggable, and bug-free code? Any thoughts or suggestions would be greatly appreciated!