Performing a loop on a column searching for rows that contain only characters in another list

jeylan.erman · October 6, 2020, 1:01am

I have a column in a data frame called new_new_column (apologies for the poor name) that consists of lists of strings such as [a, b, i, l] and [a, ɡ, i, l]. Separately, I have a list of characters called “sharedlist” that is a list of characters [‘h’, ‘p’, ‘ɡ’, ‘ʧ’, ‘s’, ‘ʃ’, ‘f’,…]. I want to go through every row of new_new_column and if every character in a given row of new_new_column appears in sharedlist, I want to add the name that appears in the column Name to a new list called NewList. The following code does not work in Streamlit (there is an empty list that is produced). Do you have recommendations on how to do this?

NewList = [df['Name'].iloc[i] for i,nc_val in enumerate(df['new_new_column']) if set(nc_val).issubset(sharedlist)]

randyzwitch · October 6, 2020, 3:59pm

Hi @jeylan.erman -

It’s important to note here that Streamlit isn’t part of this code not working, but rather you’ve got faulty base Python logic. I would unpack that list comprehension into smaller parts, until you are sure you’re getting what you want.

Best,
Randy

jeylan.erman · October 6, 2020, 4:55pm

Thanks Randy.

jeylan.erman · October 8, 2020, 7:29pm

Hi Randy, thanks for your time today.
Attached I have a csv file. Below is the code that I have used and here is an example of a sharedlist = [‘i’, ‘ə’, ‘ŋ’, ‘s’, ‘h’, ‘ˈ’, ‘aɪ’, ‘l’, ‘u’, ‘k’, ‘ɛ’, ‘x’, ‘ʧ’, ‘p’, ‘j’, ‘m’, ‘d’, ‘v’, ‘ʔ’, ‘ʃ’, ‘ɪ’, ‘n’, ‘ɔ’, ‘ər’, ‘ɡ’, ‘ʤ’, ‘f’, ‘ʒ’, ‘ˌ’, ‘r’, ‘ʊ’, ‘aʊ’, ‘t’, ‘z’, ‘b’].
Csv file

        NewList1 = [clusterprint_L1_g['Name'].iloc[i] for i,nc_val in enumerate(clusterprint_L1_g['new__new_column']) if set(nc_val).issubset(sharedlist)]

randyzwitch · October 15, 2020, 7:46pm

Hi @jeylan.erman -

After walking through your problem, it seems like this might be what you are looking for?

NewList = [x if set(x).issubset(sharedlist) else "Not all characters from row are in sharedlist" for x in df['new_new_column']]

The way this comprehension works is the following:

Iterate over df['new_new_column']
Return the value from the row (“x”) if the distinct values from x (set(x)) is a subset of sharedlist
If the value in the row is not a subset of sharedlist, then the string “Not all characters from row are in sharedlist” is placed in the value

From the handful of rows that you sent me, it doesn’t appear any of them are complete subsets of the sharedlist, so I can’t confirm this works. But if you run this on your full dataset, hopefully you’ll see a positive match. If not, please let me know and we can try something else.

Best,
Randy

Topic		Replies	Views
Dataframe multi filter Using Streamlit	4	3210	February 27, 2023
Filter dataframe based on multiple criteria Using Streamlit	3	1283	July 29, 2021
How to get only one value form multi column list Using Streamlit session-state , streamlit-cloud , discussion	0	24	May 10, 2025
Filtering columns by header name Using Streamlit pandas	2	3593	January 12, 2022
Filter dataframe with st.multiselect based on column that contains array Using Streamlit multiselect , dataframe	4	2417	December 16, 2023

Performing a loop on a column searching for rows that contain only characters in another list

Related topics

Hello there 👋🏻

Cookie settings

Strictly necessary cookies

Performance cookies

Functional cookies

Targeting cookies