How to filter data frame via a column *and* have a check box column to filter rows

Summary

I have a data frame which contains 3 columns. I’d set this up to filter using st.selectbox (x3) so the user can:

  1. identify the row identifier column
  2. identify the postcode column
  3. optionally, select a column to filter on
  4. conditional on 3 having a selection, a multiselect appears to allow the user to select the criteria / features to filter the dataframe on.

This is all working fine, though I wondered if there is a feature I’m missing of the data frame that would avoid having several selectboxes (?). My main issue is I’ve added a check box colum using st.data_editor as I’d also like the user to be able to de-select rows in addition to the above filtering.

This is working, but, is rendering a dataframe version with the checkbox column, in addition to the second dataframe which has the st.selectbox filters. I’d like a way to combine this functionality, so I display just one df, with filters as well as the check box. is this possible? TIA(!).

Hi @Mr.E - This isn’t a helpful response to your inquiry but instead a question about how you’re doing this. I’m fairly new to Streamlit and have just started using the data_editor but I’m not sure I saw a feature to provide a filter to the user. Would you mind sharing a snippet of your code? Would this be using the column_config? I think I need to find a few examples to figure this out more.

Hi @bruestle - thanks for coming back to me on this thread. I’m new to streamlit too (1 week in :smiley: ) and first experimented with the data_editor yesterday so you’re in good company.
Happy to share the code snippet - please see below, hope this is useful though any questions fire away and I’ll try to clarify. This code is my work-around to provide a user with a means of filtering their dataset. The approach below requires a certain format of dataset currently (mainly to simplify the problem while getting started with this project), but the intention would be to adjust this so that it scales to more or less any df (with the requirement that this contains at least a column for unique row ID, and the another containing UK postcode).

#dummy data
current_site_names = ['site_1', 'site_2', 'site_3', 'site_4', 'site_5'] 
current_site_postcodes = ['S40 3HW', 'S45 9LX', 'DE56 1WH', 'S40 1QN', 'DE55 7BD'] 
current_site_some_classifier = [1, 2, 3, 1, 3]

#dict as intermediary step to creating dataframe to display to the user (hardcoded approach to create a df to build around, would need replacing with an alternative method to ingest actual site information) 
data_dict = {'Site': current_site_names, 'Post_code': current_site_postcodes, 'Some_Classifier': current_site_some_classifier}
df_sites = pd.DataFrame(data_dict)

#add bool column to exclude specific sites from the model 
#purpose: to enable scenario modelling of a change to the configuration of sites (e.g. removing *and* adding, not just removing, or adding, or as-is)
list_bool_include = [True for x in range(len(current_site_names))]
df_sites['Include?'] = list_bool_include

#With the below I wanted to overwrite the original variable df_sites with the dynamic df but this doesnt seem to replace it as expected, and instead, renders a version with the data_editor (check box) feature in - this is what prompted my question in this thread
df_sites = st.data_editor(df_sites, use_container_width=True, num_rows="dynamic") #dynamic = allows new rows to be added in-browser

#list to subsequently filter the df on - when replaced with the uploaded df, this could be redirected to the df.columns, which will dynamically show the user the column labels in the selectboxes that are present in whatever df they upload, meaning filtering is possible for any df being used.
col1, col2, col3, col4 = st.columns(4)
with col1:
    site_name_col = st.selectbox(label='(Core) select site name/identifier field:', options=list(df_sites.columns))
with col2:
    postcode_col = st.selectbox(label=f'(Core) select site postcode field:', options=list(set(df_sites.columns)))
with col3:
    filter_column = selectbox(label='(Optional) select filter column', options=list(df_sites.columns), )
with col4:
    #try block which handles KeyError that otherwise occurs until the user selects a col to filter on 
    try:
        data_choices = st.multiselect(label=f'(Optional) select filter criteria for: {filter_column}', options=list(set(df_sites[filter_column])))
    except KeyError:
        pass

#try block which handles NameError that otherwise occurs at first load of the app 
try:    
    st.session_state['data_choices'] = data_choices
    if len(st.session_state['data_choices']) > 0:
        df_sites = df_sites[df_sites[filter_column].isin(data_choices)]
        
    else:
        pass
except NameError:
    pass

hope this is helpful :slight_smile

I’ve reordered the code, and found I can use the :point_up: approach with select boxes etc. on the data_editor which has solved this problem now :slight_smile:

1 Like

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.