Filter dataframe by selections made in select box

srog · October 26, 2020, 12:42am

I have a large dataframe (about 48,000 rows) that I want to filter through via user inputs to select boxes. I have my sidebar configured as shown below.

Is it possible in Streamlit to use those user inputs as filters through the dataframe and display the contents of a final dataframe column? I have 5 columns of data in my dataframe (4 of the columns are the inputs shown above). I want to show the last column in my dataframe on the main part of the page after filtering from the chosen inputs.

I always want to show all of the possible entries from the “makes” columns because that’s the initial start of the filtering. But after a selection is made in that column, I would like for the available selections to only be the filtered down “years” of that “make” and so on through each select box.

Also, is there a way to best optimize the loading of this data? Would it be best to load all of the data in the beginning and keep it in the cache or is would it be better to step-by-step load the data as the user makes selections?

Here’s my current, very simple code:

import streamlit as st
import pandas as pd

@st.cache
def get_data():
path = r'cars.csv'
return pd.read_csv(path)
df = get_data()

makes = df['make'].drop_duplicates()
years = df['year']
models = df['model']
engines = df['engine']
components = df['components']
make_choice = st.sidebar.selectbox('Select your vehicle:', makes)
year_choice = st.sidebar.selectbox('', years)
model_choice = st.sidebar.selectbox('', models)
engine_choice = st.sidebar.selectbox('', engines)

st.write('Results:', components)

Marisa_Smith · October 26, 2020, 4:43pm

Hi @srog,

Welcome to the Streamlit Community! You can definitely filter user inputs based on the selections in the select boxes you are using. This is not actually Streamlit functionality but Pandas functionality. For example, you could use:

df.loc[(df['make']=make_choice) & (df['year']=year_choice) & (df['model']= model_choice) & (df[engine]=engine_choice)]

This will display the pandas data frame with all of the user conditions met. However, it will not filter your further choices by the years of that make etc… You can do this by changing the order in how you’re creating your options for your dropdown selections.

import streamlit as st
import pandas as pd

@st.cache
def get_data():
path = r'cars.csv'
return pd.read_csv(path)
df = get_data()

makes = df['make'].drop_duplicates()
make_choice = st.sidebar.selectbox('Select your vehicle:', makes)
years = df["year"].loc[df["make"] = make_choice]
year_choice = st.sidebar.selectbox('', years) 
etc ....

You can continue this pattern with the other two or filter them based on what the make of the car, like in this example.

In terms of the optimizing the loading of this data, I think it is fine the way you have it. This way a person who changes their choice in a “make” of a car will not have to wait for a whole new data set to load. They just wait for pandas to filter their results (this is very fast).

Hope this helps!
Happy Streamlit-ing!
Marisa

srog · November 6, 2020, 2:30am

Thanks so much! With a little modification, this worked great. I thought there might be some built in Streamlit functionality that might help me filter the data, but using the built in pandas functionality worked great.

wandabwa2004 · January 19, 2021, 3:02am

Not having any luck on my side. The filter just worked for the first row. I cannot seem to get other corresponding rows as expected. Any chance of sharing your modification?

wandabwa2004 · January 20, 2021, 12:08am

Hello @Marisa_Smith . I have the same issue but my code doesn’t seem to work well. I replicated my code in a similar manner to the below code.

makes = df['make'].drop_duplicates()
make_choice = st.sidebar.selectbox('Select your vehicle:', makes)
years = df["year"].loc[df["make"] == make_choice]
year_choice = st.sidebar.selectbox('', years) 
models = df["model"].loc[df["make"] == make_choice]
model_choice = st.sidebar.selectbox('', models)
engines = df['engine'].loc[df["make"] == make_choice]
engine_choice = st.sidebar.selectbox('', engines)

The problem is that I can only filter details in the first row of the dataframe as opposed to filtering the entire dataframe. The moment I change the make to any other value apart from the Make ID in the first row , I get an error as below.

KeyError: 0

Traceback:

File "c:\users\xxxx\anaconda3\envs\py37\lib\site-packages\streamlit\script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)File "C:\Users\xxxx\xxxx\xxxx\New\app4.py", line 79, in <module>
    year_choice = st.sidebar.selectbox('Select the Year:', years)File "c:\users\xxxx\anaconda3\envs\py37\lib\site-packages\streamlit\elements\selectbox.py", line 82, in selectbox
    if len(options) > 0 and options[current_value] is not NoneFile "c:\users\xxxx\anaconda3\envs\py37\lib\site-packages\pandas\core\series.py", line 824, in __getitem__
    return self._get_value(key)File "c:\users\xxxx\anaconda3\envs\py37\lib\site-packages\pandas\core\series.py", line 932, in _get_value
    loc = self.index.get_loc(label)File "c:\users\xxxx\anaconda3\envs\py37\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err

Any idea on what the problem would be? Not sure whether is something Pandas related or the integration of the two?

frances · February 2, 2021, 4:53pm

Yeah, I have the same issue. wonder if this being resolved?

frances · February 2, 2021, 6:15pm

It works if use an array instead.

makes = df[‘make’].unique()

wandabwa2004 · February 2, 2021, 7:36pm

Yes, it works. Thank you so much.

Rushmore · March 13, 2021, 8:39pm

How did you display the final data? As in after the filtering code, how did you structure the code to account for the filtering to display the final filtered data?

Rushmore · March 14, 2021, 2:01pm

DW, sorted this

Anupam_Singh · May 20, 2021, 11:39am

have you got the display of final data?

Anupam_Singh · May 20, 2021, 11:40am

i need to know how?

Anupam_Singh · May 20, 2021, 1:25pm

thanks for the solution, however this line - ```
years = df[“year”].loc[df[“make”] = make_choice]

Rushmore · September 2, 2021, 11:32pm

Really sorry for the late response, been occupied with other things.

Pre-requisite:

I have shaped my data like this to be able to filter through it.
I will then reshape the dataframe into a new dataframe using the pd.pivot_table function to create columns from the row data from a column, called ‘Data’ in my data and subsequent values to populate them based on the filtered choices chosen in the select boxes.

So this is how my code works:

    countries = data.Geography.unique() # filter a column called Geography
    # first data filtering choice by country
    Country_choice = Data_filtering[0].selectbox("Geography", countries) # populate the result in a selectbox to choose data from
    # CATEGORY - get all row values in the category column that are in the country column
    category = data['Category'].loc[data['Geography'] == Country_choice].unique()  
    Category_choice = Data_filtering[1].selectbox("Category", category)
    # SERIES - get all series row values that are in the category column
    series = data.Series.loc[data['Category']==Category_choice].unique()
    Series_choice = Data_filtering[2].radio('Sequential Data type', series)


    # filter another column called 'Data' which I will use as columns for the reshaped dataframe to display the dataframe and then chart the selected data. Here the 'Data' column will be displayed based on the choices made in the above. 
    data_col = data['Data'][(data['Geography']==Country_choice) & (data['Category']==Category_choice) & (data['Series']==Series_choice) & (data['Data Type']==Data_type_choice)].unique() 

    # Create a pivot table to reshape the dataframe into a 2d dataframe which I can use the data_col variable choices to select the filtered data from. 
    Trans_data=data.pivot_table(index='Date', columns='Data', values='Values').rename_axis(None, axis=1)

You can also reshape your data if its displayed as my previous post here.

asehmi providers a wonderful solution for data structured as was so in the picture I displayed in that post. But the code should work in the same way.

arnaud · December 1, 2022, 12:25pm

Hey! We came up with a nearly generic solution to this. Discover DataFrame Explorer, and read more about it in our blog

HarryMelamed · December 26, 2023, 8:48pm

I tried this and there’s no effect

Is this line correct?

makes = df[‘make’].drop_duplicates()
make_choice = st.sidebar.selectbox(‘Select your vehicle:’, makes)
years = df[“year”].loc[df[“make”] = make_choice]
year_choice = st.sidebar.selectbox(‘’, years)

df = pd.read_sql(query, mydb)
nameList = df[‘LastName’].drop_duplicates()
yearList = df[‘Distribution_Year’].drop_duplicates()
quarterList = df[‘Distribution_quarter’].drop_duplicates()
investmentList = df[‘Investment’].drop_duplicates()

st.dataframe(df)

FilterInvestment = st.sidebar.selectbox(“Investment”, investmentList)
FilterLastName = st.sidebar.selectbox(“LastName”, nameList)
FilterYear = st.sidebar.multiselect(“Distribution_Year”,yearList)
FilterQuarter = st.sidebar.multiselect(“Distribution_Quarter”, quarterList)

InvestmentFilter = df[“Investment”].loc[df[“Investment”] == FilterInvestment]

Topic		Replies	Views
How to show filtered dataframe after user select value from selectbox in python and streamlit? Using Streamlit	1	1108	August 15, 2022
Streamlit Dataframe Filtering Using Streamlit	5	3018	February 2, 2023
Filter dataframe by selected features Using Streamlit	3	3368	January 12, 2022
Filter selectbox values of table based on another column selectbox values Using Streamlit discussion	2	159	January 19, 2025
Filter without rerun the entire script Using Streamlit pandas	4	1613	September 21, 2021

Filter dataframe by selections made in select box

Related topics

Hello there 👋🏻

Cookie settings

Strictly necessary cookies

Performance cookies

Functional cookies

Targeting cookies