How to filter data frame via a column and have a check box column to filter rows

Mr.E · July 27, 2023, 8:38pm

Summary

I have a data frame which contains 3 columns. I’d set this up to filter using st.selectbox (x3) so the user can:

identify the row identifier column
identify the postcode column
optionally, select a column to filter on
conditional on 3 having a selection, a multiselect appears to allow the user to select the criteria / features to filter the dataframe on.

This is all working fine, though I wondered if there is a feature I’m missing of the data frame that would avoid having several selectboxes (?). My main issue is I’ve added a check box colum using st.data_editor as I’d also like the user to be able to de-select rows in addition to the above filtering.

This is working, but, is rendering a dataframe version with the checkbox column, in addition to the second dataframe which has the st.selectbox filters. I’d like a way to combine this functionality, so I display just one df, with filters as well as the check box. is this possible? TIA(!).

bruestle · July 27, 2023, 9:47pm

Hi @Mr.E - This isn’t a helpful response to your inquiry but instead a question about how you’re doing this. I’m fairly new to Streamlit and have just started using the data_editor but I’m not sure I saw a feature to provide a filter to the user. Would you mind sharing a snippet of your code? Would this be using the column_config? I think I need to find a few examples to figure this out more.

Mr.E · July 28, 2023, 4:41am

Hi @bruestle - thanks for coming back to me on this thread. I’m new to streamlit too (1 week in ) and first experimented with the data_editor yesterday so you’re in good company.
Happy to share the code snippet - please see below, hope this is useful though any questions fire away and I’ll try to clarify. This code is my work-around to provide a user with a means of filtering their dataset. The approach below requires a certain format of dataset currently (mainly to simplify the problem while getting started with this project), but the intention would be to adjust this so that it scales to more or less any df (with the requirement that this contains at least a column for unique row ID, and the another containing UK postcode).

#dummy data
current_site_names = ['site_1', 'site_2', 'site_3', 'site_4', 'site_5'] 
current_site_postcodes = ['S40 3HW', 'S45 9LX', 'DE56 1WH', 'S40 1QN', 'DE55 7BD'] 
current_site_some_classifier = [1, 2, 3, 1, 3]

#dict as intermediary step to creating dataframe to display to the user (hardcoded approach to create a df to build around, would need replacing with an alternative method to ingest actual site information) 
data_dict = {'Site': current_site_names, 'Post_code': current_site_postcodes, 'Some_Classifier': current_site_some_classifier}
df_sites = pd.DataFrame(data_dict)

#add bool column to exclude specific sites from the model 
#purpose: to enable scenario modelling of a change to the configuration of sites (e.g. removing *and* adding, not just removing, or adding, or as-is)
list_bool_include = [True for x in range(len(current_site_names))]
df_sites['Include?'] = list_bool_include

#With the below I wanted to overwrite the original variable df_sites with the dynamic df but this doesnt seem to replace it as expected, and instead, renders a version with the data_editor (check box) feature in - this is what prompted my question in this thread
df_sites = st.data_editor(df_sites, use_container_width=True, num_rows="dynamic") #dynamic = allows new rows to be added in-browser

#list to subsequently filter the df on - when replaced with the uploaded df, this could be redirected to the df.columns, which will dynamically show the user the column labels in the selectboxes that are present in whatever df they upload, meaning filtering is possible for any df being used.
col1, col2, col3, col4 = st.columns(4)
with col1:
    site_name_col = st.selectbox(label='(Core) select site name/identifier field:', options=list(df_sites.columns))
with col2:
    postcode_col = st.selectbox(label=f'(Core) select site postcode field:', options=list(set(df_sites.columns)))
with col3:
    filter_column = selectbox(label='(Optional) select filter column', options=list(df_sites.columns), )
with col4:
    #try block which handles KeyError that otherwise occurs until the user selects a col to filter on 
    try:
        data_choices = st.multiselect(label=f'(Optional) select filter criteria for: {filter_column}', options=list(set(df_sites[filter_column])))
    except KeyError:
        pass

#try block which handles NameError that otherwise occurs at first load of the app 
try:    
    st.session_state['data_choices'] = data_choices
    if len(st.session_state['data_choices']) > 0:
        df_sites = df_sites[df_sites[filter_column].isin(data_choices)]
        
    else:
        pass
except NameError:
    pass

hope this is helpful :slight_smile

Mr.E · July 28, 2023, 1:44pm

I’ve reordered the code, and found I can use the approach with select boxes etc. on the data_editor which has solved this problem now

system · July 30, 2023, 1:45pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Filter dataframe by selections made in select box Using Streamlit pandas	15	53708	December 26, 2023
Filter dataframe by selected features Using Streamlit	3	3377	January 12, 2022
Multi Select is Not Filtering DataFrame Using Streamlit pandas	3	5129	September 8, 2021
Multiple select boxes don't filter the dataframe Using Streamlit	3	640	January 22, 2024
Filter column_config's selectbox based on multiple other selectbox values Using Streamlit	2	1293	April 21, 2024

How to filter data frame via a column *and* have a check box column to filter rows

Summary

Related topics

Hello there 👋🏻

Cookie settings

Strictly necessary cookies

Performance cookies

Functional cookies

Targeting cookies

How to filter data frame via a column and have a check box column to filter rows