Wrong row count shown by st.dataframe after dropping duplicates (please help)

Dataset used :- Heart Disease Dataset | Kaggle

There are “1025 rows” in total in the dataset, of which “721 rows are duplicates”.
After dropping the duplicate values, the total number of rows is “304”.
But when I display the dataframe (dataset), it shows “878 rows”.

Please help, guys…



import pandas as pd
import streamlit as st
st.title("BEFORE DROPPING")
df = pd.read_csv("/workspace/Learning/heart.csv")
st.markdown(f'''Dataset Shape (before) 

Rows :   :green[{df.shape[0]}]

Columns :   :green[{df.shape[1]}]''')
st.markdown(f'''total no of duplicate values(before) 

{df.duplicated().sum()}''')
st.subheader("Dataset before dropping duplicate rows")
st.dataframe(df)

dff = df.drop_duplicates()          # Drop duplicate rows (original index labels are kept)

st.title("AFTER DROPPING")
st.markdown(f'''Dataset Shape (After)

Rows :   :red[{dff.shape[0]}]

Columns :   :green[{dff.shape[1]}]''')
st.markdown(f'''total no of duplicate values(After) : :green[{dff.duplicated().sum()}]''')
st.subheader("Dataset after dropping duplicate rows")
st.dataframe(dff)

Hi @Saksham1515 ,

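The number 878 you are seeing is not the number of rows but the index label of a row,
i.e. the index that row had before you dropped the duplicates. drop_duplicates() keeps the original index labels of the surviving rows, so the index has gaps and its largest value is bigger than the row count.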

If you want to reset the index, use this command (replace file with the name of your DataFrame, which is dff in the code above).

file = file.reset_index(drop=True)

This renumbers the rows from 0, so the index shown in the table lines up with the new row count.
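
To see the effect without the Kaggle file, here is a minimal sketch on a made-up five-row DataFrame (the toy data and the names deduped, clean and clean_alt are assumptions for illustration, not part of the original code). It shows that drop_duplicates() keeps the old index labels, and two ways to get a fresh index.

```python
import pandas as pd

# Hypothetical toy data standing in for heart.csv:
# rows 1 and 2 repeat row 0, and row 4 repeats row 3.
df = pd.DataFrame({"age": [52, 52, 52, 60, 60], "sex": [1, 1, 1, 0, 0]})

# drop_duplicates keeps the first occurrence of each row
# together with its ORIGINAL index label.
deduped = df.drop_duplicates()
n_rows = len(deduped)            # number of rows actually left
last_label = deduped.index[-1]   # label of the last row, not a row count

# Option 1: renumber the index after dropping.
clean = deduped.reset_index(drop=True)

# Option 2: ask drop_duplicates to renumber in one step (pandas 1.0 or newer).
clean_alt = df.drop_duplicates(ignore_index=True)

print("rows left:", n_rows)
print("index labels before reset:", list(deduped.index))
print("index labels after reset:", list(clean.index))
```

In the original script, the same idea would be to call reset_index(drop=True) on dff before passing it to st.dataframe.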
