Filter dataframe with st.multiselect based on column that contains array

Hi, I am using Streamlit in Snowflake (v1.22.0):

What I am trying to do is to apply a multiselect widget with unique labels list that will filter the dataframe BUT the labels rows can contain more than 1 value from the list (array of strings/list)
(also could be understood as tags)

The labels column can look like this (as an example):

v10, v11, v12, v13
v11, v12, abc
v13, def
v10, v13, abc

I also tried to swich is to array with df['LABELS'].str.split(',')

For the multiselect widget I chose only the unique values in the column

df_label_unique = ['v10', 'v11', 'v12', 'v13', 'abc', 'def']
select_labels = st.multiselect('Labels ', options=df_label_unique, default=[])

I am having troubles with applying the mutliselect to filter the dataframe and also combine it with other multiselect widgets.

Basically I think I need to compare values in 2 arrays/lists and filter based on the result

What I have tried so far:

df_test1 = df[df['LABELS'].str.contains(select_labels)]

df_test2['LABELS'].apply(lambda x: select_labels[0] in x)
this works but only for the 1st selected element in the widget

and much more…but nothing seems to work

for other multiselects I am using this: (because there is only 1 value per row)

df_filter = df[df['COLUMN'].isin(selected_filter)]

If it helps I know that in Snowflake I would use LIKE ANY :smile:

select labels
from table
where labels like any('%v11.x%', '%v14.x%')

Thank you for any help in advance
Jana

Hi @hrubajan Is it possible to share any working version?

Hi @hrubajan

I’ve assumed that you have a single column called labels that contains a string of tag values (v10, v11, etc.).

So I’ve recreated this using the following:

import pandas as pd

# Create the DataFrame
df = pd.DataFrame({'labels': ['v10, v11, v12, v13', 'v11, v12, abc', 'v13, def', 'v10, v13, abc']})

which generates a DataFrame that looks like the following:

Next, I’ve created a select_labels variable to mimick the multiselect widget:

select_labels = ['v10', 'v11', 'abc']

Then, we check if any value in the “labels” column contains any of the values in select_labels:

mask = df['labels'].str.contains('|'.join(select_labels))

Finally, we subset the dataframe:

subset_df = df[mask]

to get the following which contains values specified in the select_labels variable:

Hope this helps!

2 Likes

Hi @dataprofessor !

Thank you so much this helps a lot :pray: It works great with our dataset.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.