Trouble with multiselect when filtering dataframe

eschares · May 5, 2021, 8:32pm

My app has sliders in the left sidebar that let the user narrow the range of various columns. It then defines a boolean filter, filt, from a long chain of these slider conditions &'d together.
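For readers following along, here is a minimal sketch of how such a filter might be built. The column names, the data, and the slider_ranges dict are all hypothetical stand-ins (the original app's columns are not shown in this thread); in the real app each range would come from an st.slider call.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical data and column names, for illustration only.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "downloads": [10, 200, 50, 400],
    "citations": [1, 30, 5, 2],
})

# Stand-ins for the (low, high) tuples that range sliders would return.
slider_ranges = {"downloads": (20, 500), "citations": (3, 40)}

# One boolean mask per slider (bounds inclusive), then AND them all together.
masks = [df[col].between(low, high) for col, (low, high) in slider_ranges.items()]
filt = reduce(operator.and_, masks)

# Rows that pass every slider; they keep their original index labels.
filtered = df.loc[filt]
```

Note that the filtered rows keep the index labels they had in df, which matters for the rest of this thread.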

My problem comes when I show a multiselect box listing the Journal Names that meet that filter. If the user selects a Journal Name that comes after a row removed by the filter, I get a KeyError.

In this example, the whole dataframe has 7 rows, of which indexes 3, 4, and 6 were filtered out. The multiselect box then shows the four remaining titles as options to choose. If I pick one of the first three, everything is fine. If I pick the fourth option (“Theoretical Computer Science”, corresponding to index 5), I get this error.

I have investigated the .loc operator in pandas and was able to make a working example in a Jupyter Notebook, so I think the problem traces back to the multiselect widget.

selected_titles = st.multiselect('Journal Name:', pd.Series(df.loc[filt, 'title']), help='Displayed in order provided by the underlying datafile')

[screenshot: error]

asehmi · May 5, 2021, 9:21pm

Try resetting the index of the dataframe so that, in your case, it runs from 0 to 3.

eschares · May 5, 2021, 9:26pm

But what happens when the user changes the filter, or adds more requirements? I’d have to reset the index again. And I wouldn’t want to modify the actual, clean dataframe so I’d have to make a copy of the df and process that. Then overwrite the copy again when the filters change.

I guess I don’t understand why I need to worry about the index at all. Doesn’t multiselect return a list, which I can then use to loop over and do other things in the next part of my code?

asehmi · May 5, 2021, 9:48pm

What’s the statement that uses selected_titles?

eschares · May 5, 2021, 9:53pm

if st.button('Commit change!'):
    for title in selected_titles:
        title_filter = (df['title'] == title)
        df.loc[title_filter, 'subscribed'] = radiovalue

If the user wants to change the subscribed status of the Journal Title (and thus the color coding in the charts), they select the Journal Name from the multiselect, choose an option from a radio button, and hit the Commit change button. If they want to change the status of multiple titles at a time, the loop lets them do that.
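As an aside, the same update can probably be done without the loop by using isin. This is only a sketch: the data and the 'TRUE'/'MAYBE' status values are made up, and selected_titles and radiovalue are hard-coded stand-ins for the widget return values.

```python
import pandas as pd

# Hypothetical data; the status values are invented for illustration.
df = pd.DataFrame({
    "title": ["Journal A", "Journal B", "Theoretical Computer Science"],
    "subscribed": ["TRUE", "TRUE", "TRUE"],
})

selected_titles = ["Journal B", "Theoretical Computer Science"]  # stand-in for st.multiselect
radiovalue = "MAYBE"                                             # stand-in for st.radio

# Update every selected title in one vectorized assignment.
df.loc[df["title"].isin(selected_titles), "subscribed"] = radiovalue
```

Both versions match rows by title value, so neither depends on the dataframe's index.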

asehmi · May 5, 2021, 10:20pm

If you’re using VS Code for development, I’d recommend running the ptvsd remote debugger to step through the code to see how the filters are working. Or do it the hard way and print out values.

Debugging in VS Code

See this article for details: How to use Streamlit with VS Code

Essentially follow these steps:

pip install ptvsd
Add the following snippet in your <your-app_name>.py file.

import ptvsd
ptvsd.enable_attach(address=('localhost', 5678))
ptvsd.wait_for_attach() # Only include this line if you always want to manually attach the debugger

Then start your Streamlit app

streamlit run <your-app_name>.py

From the Debug sidebar menu configure Remote Attach: Attach to a remote ptvsd debug server and update your launch.json file with the details below.

{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "port": 5678,
    "host": "localhost",
    "justMyCode": true,
    "redirectOutput": true,
    "pathMappings": [
        {
            "localRoot": "${workspaceFolder}",
            "remoteRoot": "."
        }
    ]
}

Make sure you manually insert the redirectOutput setting.
By default you will be debugging your own code only. If you want to debug into streamlit code, then change justMyCode setting from true to false.
Finally, attach the debugger by clicking the debugger play button.
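Note for later readers: ptvsd has since been superseded by debugpy. Below is a rough equivalent of the snippet above, assuming debugpy is installed (pip install debugpy). The attach_debugger helper and its enabled flag are hypothetical names introduced here so the debugger is only started on request; the same host and port as above are assumed.

```python
def attach_debugger(enabled: bool, host: str = "localhost", port: int = 5678) -> bool:
    """Start a debugpy server and block until VS Code attaches.

    Hypothetical helper: returns False and does nothing when disabled.
    """
    if not enabled:
        return False
    import debugpy  # imported lazily so the app still runs without debugpy installed

    debugpy.listen((host, port))   # open the debug server on host:port
    debugpy.wait_for_client()      # block until the debugger attaches
    return True


# Flip to True when you want to attach from VS Code.
attach_debugger(enabled=False)
```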

eschares · May 7, 2021, 2:19am

Thanks. I’m using Spyder but I did explore how to use their debugger on your suggestion.

I still think this is an issue with how the multiselect interprets the user’s selection. It seems the widget passes the position of the selected option and then converts that to a name, rather than passing the name itself.
[screenshot: wrong_journal2]
If the dataframe was filtered, the indexes don’t line up anymore, and it causes problems like this.
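Here is a pandas-only sketch of that mismatch. It does not show Streamlit's internals; it only assumes the widget looks up the chosen option with something like options[i], where i is the option's position in the list. Apart from “Theoretical Computer Science”, the titles are placeholders.

```python
import pandas as pd

# 7 rows; indexes 3, 4 and 6 are filtered out, as in the example above.
df = pd.DataFrame({"title": [
    "Journal 0", "Journal 1", "Journal 2", "Journal 3",
    "Journal 4", "Theoretical Computer Science", "Journal 6",
]})
filt = pd.Series([True, True, True, False, False, True, False])

options = df.loc[filt, "title"]  # index labels are now 0, 1, 2, 5

# The fourth option sits at position 3, but options[3] is a *label* lookup
# on an integer index, and there is no label 3 any more.
try:
    options[3]
    lookup_failed = False
except KeyError:
    lookup_failed = True

by_position = options.iloc[3]                    # positional lookup works
after_reset = options.reset_index(drop=True)[3]  # labels 0..3 line up again
```

The first three options work by coincidence, because their labels (0, 1, 2) happen to equal their positions.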

I think I was able to fix it by first pulling the valid titles (those that pass the filter) into a new object, then offering that as the choices in the multiselect after calling reset_index(drop=True) on it.

filtered_titles_df = df.loc[filt, 'title']      # new Series with only the valid titles; df itself is unchanged
selected_titles = st.multiselect('Journal Name:', filtered_titles_df.reset_index(drop=True))
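Another option that should sidestep the index entirely is to hand the widget a plain Python list. This is a sketch I have not checked against the Streamlit version used in this thread; the placeholder titles are made up and the st.multiselect call is left as a comment so the snippet runs without Streamlit.

```python
import pandas as pd

df = pd.DataFrame({"title": [
    "Journal 0", "Journal 1", "Journal 2", "Journal 3",
    "Journal 4", "Theoretical Computer Science", "Journal 6",
]})
filt = pd.Series([True, True, True, False, False, True, False])

# A plain list has no pandas index, so position i is always element i.
title_options = df.loc[filt, "title"].tolist()

# In the app this would be passed straight to the widget:
# selected_titles = st.multiselect('Journal Name:', title_options)
```

Either way the original df is never modified, so nothing needs to be copied or restored when the filters change; the options are simply rebuilt from df on each rerun.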

asehmi · May 7, 2021, 7:59pm

Glad you sorted it out.
