Issue renaming columns - ValueError: Duplicate column names found

Hi guys,

I have a dataframe that I use to create a similarity matrix, but I can’t rename the columns once the matrix is generated. My code runs fine in a Python notebook, but for some reason it fails when I run it in Streamlit. I was able to rename the first few columns manually, but my dataframe has 4111 columns in total, so I’m trying to find a way to automate the process.

Here’s my code:

import numpy as np
import pandas as pd
import streamlit as st
from scipy.spatial.distance import pdist, squareform

# Load the original dataset
dfm = pd.read_csv("df_matrix.csv", index_col="artist_track")
st.dataframe(dfm)

# Build the list of column names from the index
new_index = list(dfm.index)

# Create the matrix
pairwise = pd.DataFrame(squareform(pdist(dfm, "mahalanobis")), columns=new_index)
st.dataframe(pairwise)

I tried changing the index values instead, which worked:
new_index = list(dfm.index)
pairwise = pd.DataFrame(squareform(pdist(dfm, "mahalanobis")), index=new_index)
st.dataframe(pairwise)

But then when I try to transpose the dataframe, I get the same error:
new_index = list(dfm.index)
pairwise = pd.DataFrame(squareform(pdist(dfm, "mahalanobis")), index=new_index).T
st.dataframe(pairwise)

I tried renaming the columns using a dictionary, but that didn’t work either:
new_index = list(dfm.index)
col_index = np.arange(len(new_index))  # one position per label instead of a hard-coded 4112
index = dict(zip(col_index, new_index))
pairwise = pd.DataFrame(squareform(pdist(dfm, "mahalanobis")))
pairwise = pairwise.rename(columns = index)
st.dataframe(pairwise)

I know I must be missing something somewhere, but I don’t know what it is.

Thanks for your help! :slight_smile:

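You must have a repeated entry somewhere. pandas itself allows duplicate column labels, which is probably why the notebook doesn’t complain, while st.dataframe appears to be stricter about column names. That would also explain why setting the index works until you transpose it into the columns. In your first example, where you have columns=new_index, try inspecting new_index: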

  • Compare len(new_index) with len(set(new_index)).
  • Examine [x for x in set(new_index) if new_index.count(x) > 1] to get a list of duplicate items.
  • Or take a look at your dfm dataframe with dfm[dfm.index.duplicated()]
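
Here is a rough sketch of the first two checks on a plain list, plus one possible way to make the labels unique if you do find repeats. The track names are made up and stand in for list(dfm.index), and the suffixing scheme is just an assumption on my part, not the only fix (dropping or merging the duplicated rows may make more sense for your data):

```python
from collections import Counter

# Hypothetical stand-in for list(dfm.index); the real list is much longer.
new_index = ["a - song", "b - tune", "a - song", "c - track"]

# Fewer unique labels than labels means at least one repeat.
has_duplicates = len(new_index) != len(set(new_index))

# Count each label in a single pass instead of calling new_index.count(x)
# once per label, which gets slow with thousands of entries.
counts = Counter(new_index)
duplicates = {label: n for label, n in counts.items() if n > 1}
print(has_duplicates, duplicates)

# Possible workaround: keep the first occurrence as is and append a running
# number to later repeats. This assumes a suffixed name such as "a - song_1"
# does not already exist in the list.
seen = Counter()
unique_labels = []
for label in new_index:
    seen[label] += 1
    if seen[label] == 1:
        unique_labels.append(label)
    else:
        unique_labels.append(f"{label}_{seen[label] - 1}")
print(unique_labels)
```

If that gives you a list with no repeats, you should be able to pass it as columns=unique_labels in your first example.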

Good idea, I will try that. Thank you!