How to filter datafarme with python and streamlit

i have the below dataframe that have one field tech includes string value that are separated with - i want to make this field as tags where once the user select one of these values it display the datafarme grouped by the project_title.

code:

    import streamlit as st
    import pandas as pd
    import numpy as np
    
    data = {
        'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
        'tech': ['python-RegExp-PyQt5', 'python-RegExp', 'python-RegExp-BeautifulSoup-pandas', 'python-pandas-MS_SQL-CSS/HTML-Javascript'],
        'Role': ['Junior developer ', 'Python developer', 'Python developer', 'Tech lead']
    }
    
    #split the tech column into multiple columns
    df[['tech1','tech2','tech3','tech4','tech5']]=df['tech'].str.split('-', expand=True)
    
    #create seperated list of each tech column
    tech1 = df["tech1"].unique().tolist()
    tech2 = df["tech2"].unique().tolist()
    tech3 = df["tech3"].unique().tolist()
    tech4 = df["tech4"].unique().tolist()
    tech5 = df["tech5"].unique().tolist()
    
    #concatinate all the lists into one list.
    tech_all = tech1+tech2+tech3+tech4+tech5
    tech_all = list(filter(None, tech_all))
    
    #create multiselect widget that includes the created list of tech
    regular_search_term =tech_all
    choices = st.multiselect(" ",regular_search_term)
    
    #return the dataframe based on the selected values from the multiselect widget.
    df_result_search=df[df.loc[:,"tech1":"tech5"].isin(choices)]
    
    st.write(df_result_search)

The above code did not return what i want as result

Hi @leb_dev , I tried just running this script and it didn’t even quite run. It failed at

df[['tech1','tech2','tech3','tech4','tech5']]=df['tech'].str.split('-', expand=True)

It sounds like you want to put
['python-RegExp-PyQt5', 'python-RegExp', 'python-RegExp-BeautifulSoup-pandas', 'python-pandas-MS_SQL-CSS/HTML-Javascript']
into a multiselect. I’m exactly sure about what

“display the datafarme grouped by the project_title”

means but maybe you can clarify what that means.

Thanks for posting and hope to hear from you soon.

i am trying to make the tech column as a tag for each project_title where the user will select value/s from the multiselect widget and based on the selected value the app display the related records where it includes all the fields.

the line below i am trying to expand the one column tech into multiple columns in order to create a list of it.
df[['tech1','tech2','tech3','tech4','tech5']]=df['tech'].str.split('-', expand=True)

you can use explode instead of expand it seems better way.

df.tech = df.tech.str.split('-')
df = df.explode('tech', ignore_index=True)

the expected result must be like so :
if user select :pandas and RegExp
the dataframe must display


|project_title      |    tech                                   |         Role              |
=================== ==============================================================
| Job-detection     |python-RegExp-BeautifulSoup-pandas         | python developer          |

if user select :python and RegExp
the dataframe must display


|project_title      |    tech                                          |         Role              |
=================== ============================================================================
| Job-detection     |   python-RegExp-BeautifulSoup-pandas             | python developer          |
| LSE               |   python-RegExp-PyQt5                            | Junior developer          |
| DCP               |   python-RegExp                                  | python developer          |

HOPE that i clarify my problem, and thank you.

Hi @leb_dev ,

seems like you want to have
python,RegExp, BeautifulSoup,Pandas,PyQt5, MS_SQL, CSS/HTML, and javascript into a multiselect and then when they select any of those options in the multiselect, you return the df based on options that are selected?

If that’s true, I suggest looping through all of the techs and then splitting by “-” and then adding them to a set (guarantees unique).

However, I am confused how you get the returned dataframes. are those combinations of all of the tech, project titlem and role?

Until now i was able to do this :

st.write(df)
regular_search_term =df.tech.unique().tolist()
choices = st.multiselect(" ",regular_search_term)
st.write(df[df.tech.isin(choices)]

result:

pic1

what i want if its possible to separate the tech in the multiselect widget and return the same result.

Hi @leb_dev ,

what i want if its possible to separate the tech in the multiselect widget and return the same result.
Does this mean you want two multiselect widgets? What do you mean separate the tech in the multiselect widget?

I think I understand your question, I do it a lot for all of my apps
So I created this handy function:
You provide:

  1. A dataframe
  2. A list of columns to create widgets from (if not provided it uses all columns)
  3. Set “allow_single_value_widgets” to True if you want to have widgets for columns that only have one unique value in the table
def filters_widgets(df, columns=None, allow_single_value_widgets=False):
    # Parse the df and get filter widgets based for provided columns
    if not columns: #if columns not provided, use all columns to create widgets
        columns=df.columns.tolist()
    if allow_single_value_widgets:
        threshold=0
    else:
        threshold=1
    widget_dict = {}
    filter_widgets = st.container()
    filter_widgets.warning(
        "After selecting filters press the 'Apply Filters' button at the bottom.")
    if not allow_single_value_widgets:
        filter_widgets.markdown("Only showing columns that contain more than 1 unique value.")
    with filter_widgets.form(key="data_filters"):
        not_showing = [] 
        for y in df[columns]:
            if str(y) in st.session_state: #update value from session state if exists
                selected_opts = st.session_state[str(y)]
            else: #if doesnt exist use all values as defaults
                selected_opts = df[y].unique().tolist()
            if len(df[y].unique().tolist()) > threshold: #checks if above threshold
                widget_dict[y] = st.multiselect(
                    label=str(y),
                    options=df[y].unique().tolist(),
                    default=selected_opts,
                    key=str(y),
                )
            else:#if doesnt pass threshold
                not_showing.append(y)
        if not_showing:#if the list is not empty, show this warning
            st.warning(
                f"Not showing filters for {' '.join(not_showing)} since they only contain one unique value."
            )
        submit_button = st.form_submit_button("Apply Filters")
    #reset button to return all unselected values back
    reset_button = filter_widgets.button(
        "Reset All Filters",
        key="reset_buttons",
        on_click=reset_filter_widgets_to_default,
        args=(df, columns),
    )
    filter_widgets.warning(
        "Dont forget to apply filters by pressing 'Apply Filters' at the bottom."
    )

And also:

def reset_filter_widgets_to_default(df, columns):
    for y in df[columns]:
        if str(y) in st.session_state:
            del st.session_state[y]

with your example:

data = {
        'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
        'tech': ['python-RegExp-PyQt5', 'python-RegExp', 'python-RegExp-BeautifulSoup-pandas', 'python-pandas-MS_SQL-CSS/HTML-Javascript'],
        'Role': ['Junior developer ', 'Python developer', 'Python developer', 'Tech lead']
    }
df=pd.DataFrame(data)

filters_widgets(df)

this is how it looks:

you can do the same after splitting the tech column as you did in your code and use this as an input to the function.
Let me know if this helped

2 Likes

Have you seen this? Auto-generate a dataframe filtering UI in Streamlit with filter_dataframe!

1 Like

Thanks for the help @alonsh @asehmi . Really appreciate it. Hopefully this solves leb_dev’s issue.

1 Like

Amazing! I love it!

@alonsh @asehmi @willhuang thanks for all your help.

  1. About the function that @alonsh provided it is very useful but i didn’t know how to display the filtered data.

  2. the Auto-generate a dataframe filtering UI in Streamlit with filter_dataframe!
    can also be used in my case and it is very helpful.

But i did found my own solution using the dependent multiselect widgets .
But still having the duplication problem as it shows below.

import streamlit as st
import pandas as pd
import numpy as np


data = {
        'project_title': ['LSE', 'DCP', 'Job-detection', 'Task management & Organizer'],
        'tech': ['python-RegExp-PyQt5', 'python-RegExp', 'python-RegExp-BeautifulSoup-pandas', 'python-pandas-MS_SQL-CSS/HTML-Javascript'],
        'Role': ['Junior developer ', 'Python developer', 'Python developer', 'Tech lead']
    }

df=pd.DataFrame(data)
df.tech = df.tech.str.split('-')
df = df.explode('tech', ignore_index=True)



if tech := st.multiselect("tech", df.tech.unique().tolist(), key=1):
    df = df[df["tech"].isin(tech)]
project = df.query("tech ==@tech")

if project_title := st.multiselect("project_title", project.project_title.unique().tolist(), key=2):
    df = df[df["project_title"].isin(project_title)]


st.write(df)

so is there a way to fix this issue ?

what do you mean by duplication issue? I dont see any problems…

I meant duplication with the project title
Where if the project has multiple tech it will be displayed multiple times.

How is it possible to display the project_title with all the tech in one record?

For example
Task management will be displayed 1 time for all 3 tech used.

The final result i want the tech values to be considerate as tags.

If this display as dataframe doesn’t work
Can i use bootstrap card in order to display the results ?? And how ??