Trouble with multiselect when filtering dataframe

My app has sliders in the left sidebar that let the user narrow down the range of various columns. It then defines a boolean filter, filt, by combining the slider conditions with &.

My problem comes when I try to show a multiselect box listing the Journal Names that meet that filter. If the user selects a Journal Name that comes after a row removed by the filter, I get a KeyError.

In this example, the whole dataframe has 7 rows, of which indexes 3, 4, and 6 were filtered out. The multiselect box then shows the four remaining titles as options to choose. If I pick one of the first three, everything is fine. If I pick the fourth option ("Theoretical Computer Science", corresponding to index 5), I get this error.

I have investigated the .loc operator in pandas and was able to make a working example in a Jupyter Notebook, so I think the problem comes down to the multiselect widget.

selected_titles = st.multiselect('Journal Name:', pd.Series(df.loc[filt, 'title']), help='Displayed in order provided by the underlying datafile')


Try resetting the index of the dataframe so that, in your case, it runs from 0 to 3.

But what happens when the user changes the filter, or adds more requirements? I'd have to reset the index again. And I wouldn't want to modify the actual, clean dataframe, so I'd have to make a copy of the df and process that. Then overwrite the copy again when the filters change.

I guess I don't understand why I need to worry about the index at all. Doesn't multiselect return a list, which I can then use to loop over and do other things in the next part of my code?

What's the statement that uses selected_titles?

if st.button('Commit change!'):
    for title in selected_titles:
        title_filter = (df['title'] == title)
        df.loc[title_filter, 'subscribed'] = radiovalue

If the user wants to change the subscribed status of the Journal Title (and thus the color coding in the charts), they select the Journal Name from the multiselect, choose an option from a radio button, and hit the Commit change button. If they want to change the status of multiple titles at a time, the loop lets them do that.
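The loop described above can also be written as a single assignment using `Series.isin`. This is a minimal sketch on made-up data: `selected_titles` and `radiovalue` are hard-coded stand-ins for whatever the multiselect and radio widgets would return, and the journal names are placeholders.

```python
import pandas as pd

# Made-up data standing in for the real journal dataframe.
df = pd.DataFrame({
    "title": ["Journal A", "Journal B", "Journal C", "Journal D"],
    "subscribed": [False, False, False, False],
})

# Stand-ins for the widget return values (assumptions for illustration).
selected_titles = ["Journal B", "Journal D"]
radiovalue = True

# One boolean mask covers every selected title, so no loop is needed.
title_filter = df["title"].isin(selected_titles)
df.loc[title_filter, "subscribed"] = radiovalue

print(df)
```

Both versions match rows by title value rather than by index, so neither depends on how the dataframe is indexed.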

If you're using VS Code for development, I'd recommend running the ptvsd remote debugger to step through the code to see how the filters are working. Or do it the hard way and print out values.

Debugging in VS Code

See this article for details: How to use Streamlit with VS Code

Essentially follow these steps:

  1. pip install ptvsd
  2. Add the following snippet in your <your-app_name>.py file.
import ptvsd
ptvsd.enable_attach(address=('localhost', 5678))
ptvsd.wait_for_attach() # Only include this line if you always want to manually attach the debugger
  3. Then start your Streamlit app.

streamlit run <your-app_name>.py

  4. From the Debug sidebar menu, configure Remote Attach: Attach to a remote ptvsd debug server and update your launch.json file with the details below.
{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "port": 5678,
    "host": "localhost",
    "justMyCode": true,
    "redirectOutput": true,
    "pathMappings": [
        {
            "localRoot": "${workspaceFolder}",
            "remoteRoot": "."
        }
    ]
}
  5. Make sure you manually insert the redirectOutput setting.
  6. By default you will be debugging your own code only. If you want to debug into Streamlit code, change the justMyCode setting from true to false.
  7. Finally, attach the debugger by clicking the debugger play button.

Thanks. I'm using Spyder, but I did explore how to use its debugger on your suggestion.

I still think this is an issue with how the multiselect interprets the user's selection. It seems the tool passes the index and then converts it to a name, but does not pass the actual name itself.
If the dataframe was filtered, the indexes don't line up anymore, and it causes problems like this.
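The mismatch described here can be reproduced in plain pandas, without Streamlit. The sketch below uses placeholder titles (only "Theoretical Computer Science" comes from the post) and a mask that drops indexes 3, 4, and 6, as in the example earlier in the thread. It only illustrates how a position can be mistaken for an index label; it is not a claim about what the multiselect widget does internally.

```python
import pandas as pd

# Seven rows, as in the example; most titles are placeholders.
titles = [
    "Journal A", "Journal B", "Journal C", "Journal D",
    "Journal E", "Theoretical Computer Science", "Journal G",
]
df = pd.DataFrame({"title": titles})

# Boolean mask that removes rows 3, 4, and 6.
filt = ~df.index.isin([3, 4, 6])
options = df.loc[filt, "title"]

# The surviving rows keep their original labels, leaving gaps.
print(options.index.tolist())  # [0, 1, 2, 5]

# The fourth option sits at position 3 but carries index label 5.
by_position = options.iloc[3]
by_label = options.loc[5]

# Treating the position as a label fails, because label 3 was filtered out.
try:
    options.loc[3]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_position, by_label, lookup_failed)
```

For the first three options, position and label happen to coincide (0, 1, 2), which would explain why only the fourth selection raises a KeyError.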

I think I was able to fix it by defining a new object that holds only the titles that pass the filter, then offering that as the choices in the multiselect after calling reset_index(drop=True) on it.

filtered_titles_df = df.loc[filt]['title']      # despite the name, a Series holding only the valid titles
selected_titles = st.multiselect('Journal Name:', pd.Series(filtered_titles_df.reset_index(drop=True)))
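Resetting the index on the filtered selection does not touch the original dataframe, which speaks to the earlier worry about having to modify the clean data. The sketch below uses made-up data and a hypothetical `score` column as the filter. Passing a plain list via `tolist()` is shown as another option that may work, on the assumption that a list carries no pandas index to get out of step.

```python
import pandas as pd

# Made-up data; "score" is a hypothetical column used only for filtering.
df = pd.DataFrame({
    "title": ["Journal A", "Journal B", "Journal C", "Journal D"],
    "score": [3, 0, 5, 7],
})
filt = df["score"] > 1  # drops "Journal B" (index 1)

# reset_index returns a new Series numbered 0..n-1; df itself is unchanged.
filtered_titles = df.loc[filt, "title"].reset_index(drop=True)

# Alternative: a plain Python list of the same options.
options_list = df.loc[filt, "title"].tolist()

print(filtered_titles.index.tolist())  # [0, 1, 2]
print(df.index.tolist())               # [0, 1, 2, 3]
print(options_list)
```

Because Streamlit reruns the script from top to bottom whenever a widget changes, the filtered options are rebuilt from the untouched dataframe on every rerun, so no copy needs to be kept or overwritten by hand.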

Glad you sorted it out.