I have a column in a data frame called new_new_column (apologies for the poor name) that consists of lists of strings such as [a, b, i, l] and [a, Ι‘, i, l]. Separately, I have a list of characters called βsharedlistβ that is a list of characters [βhβ, βpβ, βΙ‘β, βΚ§β, βsβ, βΚβ, βfβ,β¦]. I want to go through every row of new_new_column and if every character in a given row of new_new_column appears in sharedlist, I want to add the name that appears in the column Name to a new list called NewList. The following code does not work in Streamlit (there is an empty list that is produced). Do you have recommendations on how to do this?
NewList = [df['Name'].iloc[i] for i,nc_val in enumerate(df['new_new_column']) if set(nc_val).issubset(sharedlist)]
Hi @jeylan.erman -
Itβs important to note here that Streamlit isnβt part of this code not working, but rather youβve got faulty base Python logic. I would unpack that list comprehension into smaller parts, until you are sure youβre getting what you want.
Best,
Randy
Hi Randy, thanks for your time today.
Attached I have a csv file. Below is the code that I have used and here is an example of a sharedlist = [βiβ, βΙβ, βΕβ, βsβ, βhβ, βΛβ, βaΙͺβ, βlβ, βuβ, βkβ, βΙβ, βxβ, βΚ§β, βpβ, βjβ, βmβ, βdβ, βvβ, βΚβ, βΚβ, βΙͺβ, βnβ, βΙβ, βΙrβ, βΙ‘β, βΚ€β, βfβ, βΚβ, βΛβ, βrβ, βΚβ, βaΚβ, βtβ, βzβ, βbβ].
Csv file
NewList1 = [clusterprint_L1_g['Name'].iloc[i] for i,nc_val in enumerate(clusterprint_L1_g['new__new_column']) if set(nc_val).issubset(sharedlist)]
Hi @jeylan.erman -
After walking through your problem, it seems like this might be what you are looking for?
NewList = [x if set(x).issubset(sharedlist) else "Not all characters from row are in sharedlist" for x in df['new_new_column']]
The way this comprehension works is the following:
- Iterate over
df['new_new_column']
- Return the value from the row (βxβ) if the distinct values from x (
set(x)
) is a subset of sharedlist
- If the value in the row is not a subset of
sharedlist
, then the string βNot all characters from row are in sharedlistβ is placed in the value
From the handful of rows that you sent me, it doesnβt appear any of them are complete subsets of the sharedlist
, so I canβt confirm this works. But if you run this on your full dataset, hopefully youβll see a positive match. If not, please let me know and we can try something else.
Best,
Randy