Performing a loop on a column searching for rows that contain only characters in another list

I have a column in a data frame called new_new_column (apologies for the poor name) that consists of lists of strings such as [a, b, i, l] and [a, ɡ, i, l]. Separately, I have a list of characters called “sharedlist” that is a list of characters [‘h’, ‘p’, ‘ɡ’, ‘ʧ’, ‘s’, ‘ʃ’, ‘f’,…]. I want to go through every row of new_new_column and if every character in a given row of new_new_column appears in sharedlist, I want to add the name that appears in the column Name to a new list called NewList. The following code does not work in Streamlit (there is an empty list that is produced). Do you have recommendations on how to do this?

NewList = [df['Name'].iloc[i] for i,nc_val in enumerate(df['new_new_column']) if set(nc_val).issubset(sharedlist)]

Hi @jeylan.erman -

It’s important to note here that Streamlit isn’t part of this code not working, but rather you’ve got faulty base Python logic. I would unpack that list comprehension into smaller parts, until you are sure you’re getting what you want.

Best,
Randy

Thanks Randy.

Hi Randy, thanks for your time today.
Attached I have a csv file. Below is the code that I have used and here is an example of a sharedlist = [‘i’, ‘ə’, ‘ŋ’, ‘s’, ‘h’, ‘ˈ’, ‘aɪ’, ‘l’, ‘u’, ‘k’, ‘ɛ’, ‘x’, ‘ʧ’, ‘p’, ‘j’, ‘m’, ‘d’, ‘v’, ‘ʔ’, ‘ʃ’, ‘ɪ’, ‘n’, ‘ɔ’, ‘ər’, ‘ɡ’, ‘ʤ’, ‘f’, ‘ʒ’, ‘ˌ’, ‘r’, ‘ʊ’, ‘aʊ’, ‘t’, ‘z’, ‘b’].
Csv file

        NewList1 = [clusterprint_L1_g['Name'].iloc[i] for i,nc_val in enumerate(clusterprint_L1_g['new__new_column']) if set(nc_val).issubset(sharedlist)]

Hi @jeylan.erman -

After walking through your problem, it seems like this might be what you are looking for?

NewList = [x if set(x).issubset(sharedlist) else "Not all characters from row are in sharedlist" for x in df['new_new_column']]

The way this comprehension works is the following:

  • Iterate over df['new_new_column']
  • Return the value from the row (“x”) if the distinct values from x (set(x)) is a subset of sharedlist
  • If the value in the row is not a subset of sharedlist, then the string “Not all characters from row are in sharedlist” is placed in the value

From the handful of rows that you sent me, it doesn’t appear any of them are complete subsets of the sharedlist, so I can’t confirm this works. But if you run this on your full dataset, hopefully you’ll see a positive match. If not, please let me know and we can try something else.

Best,
Randy