Dataset used: Heart Disease Dataset | Kaggle
The dataset has 1025 rows in total, of which 721 are duplicates.
After dropping the duplicate values, the total row count shown is 304.
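The counts in the two lines above are consistent with how pandas works: `duplicated()` (default `keep='first'`) flags every repeat after the first occurrence, so the row count after `drop_duplicates()` should equal the total minus that flag count (1025 - 721 = 304 here). A minimal sketch on made-up toy data, not the Kaggle file:

```python
import pandas as pd

# Hypothetical toy stand-in for heart.csv: 5 rows, 2 of them exact repeats.
df = pd.DataFrame({"age": [52, 53, 52, 70, 53],
                   "sex": [1, 1, 1, 0, 1]})

n_total = len(df)                      # all rows, duplicates included
n_dupes = int(df.duplicated().sum())   # rows that repeat an earlier row
n_unique = len(df.drop_duplicates())   # rows left after keeping first copies

print(n_total, n_dupes, n_unique)      # 5 2 3
```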
But when I display the DataFrame (dataset), it shows 878 rows.
Please help, guys…
```python
import pandas as pd
import streamlit as st

st.title("BEFORE DROPPING")
df = pd.read_csv("/workspace/Learning/heart.csv")
st.markdown(f'''Dataset Shape (before)
Rows : :green[{df.shape[0]}]
Columns : :green[{df.shape[1]}]''')
st.markdown(f'''Total number of duplicate rows (before) : :green[{df.duplicated().sum()}]''')
st.subheader("Dataset before dropping duplicate values")
st.dataframe(df)

dff = df.drop_duplicates()  # Drop duplicate rows, keeping the first occurrence

st.title("AFTER DROPPING")
st.markdown(f'''Dataset Shape (after)
Rows : :red[{dff.shape[0]}]
Columns : :green[{dff.shape[1]}]''')
st.markdown(f'''Total number of duplicate rows (after) : :green[{dff.duplicated().sum()}]''')
st.subheader("Dataset after dropping duplicate values")
st.dataframe(dff)
```
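A likely explanation (an assumption, since the post does not show the rendered table): `drop_duplicates()` keeps the original index labels of the surviving rows, and `st.dataframe` shows that index as its leftmost column. The last surviving row can therefore carry a label such as 878 even though only 304 rows remain. The sketch below uses hypothetical toy data to show the gap between the row count and the largest index label, plus two ways to renumber: `reset_index(drop=True)`, or `drop_duplicates(ignore_index=True)` (available in pandas 1.0 and later).

```python
import pandas as pd

# Hypothetical toy data: the last unique row sits at a high position.
df = pd.DataFrame({"age": [52, 52, 52, 70, 52, 61],
                   "sex": [1, 1, 1, 0, 1, 0]})

dff = df.drop_duplicates()
# Surviving rows keep their original labels 0, 3, 5,
# so the largest label (5) is bigger than the row count (3).
print(len(dff), dff.index.max())  # 3 5

# Option 1: renumber after dropping; drop=True discards the old labels
# instead of inserting them as a new "index" column.
dff_reset = df.drop_duplicates().reset_index(drop=True)

# Option 2: renumber in the same call (pandas >= 1.0).
dff_ignore = df.drop_duplicates(ignore_index=True)

print(list(dff_reset.index), list(dff_ignore.index))  # [0, 1, 2] [0, 1, 2]

# In the Streamlit app, one could display the renumbered frame instead:
# st.dataframe(dff.reset_index(drop=True))
```

Either option leaves the data unchanged; only the displayed index labels become 0 to n-1, so the last label shown matches the row count minus one.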