Filter dataframe by selections made in select box

I have a large dataframe (about 48,000 rows) that I want to filter through via user inputs to select boxes. I have my sidebar configured as shown below.

Is it possible in Streamlit to use those user inputs as filters through the dataframe and display the contents of a final dataframe column? I have 5 columns of data in my dataframe (4 of the columns are the inputs shown above). I want to show the last column in my dataframe on the main part of the page after filtering from the chosen inputs.

I always want to show all of the possible entries from the “makes” columns because that’s the initial start of the filtering. But after a selection is made in that column, I would like for the available selections to only be the filtered down “years” of that “make” and so on through each select box.

Also, is there a way to best optimize the loading of this data? Would it be best to load all of the data in the beginning and keep it in the cache or is would it be better to step-by-step load the data as the user makes selections?

Here’s my current, very simple code:

import streamlit as st
import pandas as pd

@st.cache
def get_data():
path = r'cars.csv'
return pd.read_csv(path)
df = get_data()

makes = df['make'].drop_duplicates()
years = df['year']
models = df['model']
engines = df['engine']
components = df['components']
make_choice = st.sidebar.selectbox('Select your vehicle:', makes)
year_choice = st.sidebar.selectbox('', years)
model_choice = st.sidebar.selectbox('', models)
engine_choice = st.sidebar.selectbox('', engines)

st.write('Results:', components)

Hi @srog,

Welcome to the Streamlit Community! You can definitely filter user inputs based on the selections in the select boxes you are using. This is not actually Streamlit functionality but Pandas functionality. For example, you could use:

df.loc[(df['make']=make_choice) & (df['year']=year_choice) & (df['model']= model_choice) & (df[engine]=engine_choice)]

This will display the pandas data frame with all of the user conditions met. However, it will not filter your further choices by the years of that make etc… You can do this by changing the order in how you’re creating your options for your dropdown selections.

import streamlit as st
import pandas as pd

@st.cache
def get_data():
path = r'cars.csv'
return pd.read_csv(path)
df = get_data()

makes = df['make'].drop_duplicates()
make_choice = st.sidebar.selectbox('Select your vehicle:', makes)
years = df["year"].loc[df["make"] = make_choice]
year_choice = st.sidebar.selectbox('', years) 
etc .... 

You can continue this pattern with the other two or filter them based on what the make of the car, like in this example.

In terms of the optimizing the loading of this data, I think it is fine the way you have it. This way a person who changes their choice in a “make” of a car will not have to wait for a whole new data set to load. They just wait for pandas to filter their results (this is very fast).

Hope this helps!
Happy Streamlit-ing!
Marisa

1 Like

Thanks so much! With a little modification, this worked great. I thought there might be some built in Streamlit functionality that might help me filter the data, but using the built in pandas functionality worked great.

1 Like