Please help a confused guy with Streamlit/DataFrames/Filter - Updated

I have received answers to this about using Streamlit with DataFrame and Filter and I continue to struggle greatly. I think I may be too old to learn new things. Maybe there is a generous should out there that can help me get this organized. Been at it for a week.

The goal is here to display a Dataframe that can be filtered and can be updated on screen. My first attempt caused both the current DF and the filtered DF.

Here’s my latest attempt

query = 'SELECT Investor_Id, LastName, Distribution_date, Distribution_Year, ’
'Distribution_quarter, Distribution_Net ,Investment from ’
'Distribution, Investors ’
‘where Investors.InvestorID = Distribution.Investor_Id’

st.title(‘BDK Services Group’)

df = pd.read_sql(query, mydb)

nameList = df[‘LastName’].drop_duplicates()
yearList = df[‘Distribution_Year’].drop_duplicates()
quarterList = df[‘Distribution_quarter’].drop_duplicates()
investmentList = df[‘Investment’].drop_duplicates()

FilterInvestment = st.sidebar.selectbox(“Investment”, investmentList)
FilterLastName = st.sidebar.selectbox(“LastName”, nameList)
FilterYear = st.sidebar.multiselect(“Distribution_Year”,yearList)
FilterQuarter = st.sidebar.multiselect(“Distribution_Quarter”, quarterList)

st.button(“Create PDF”, key=“CreatePDF”)

df.loc[(df[‘Investment’] == FilterInvestment) & (df[‘LastName’] == FilterLastName)] &
(df[‘Distribution_Year’] == FilterYear) & (df[‘Distribution_Year’] == FilterYear)]

st.session_state

mydb.close() # close the connection

It would help if you included the error message or fully described the issue. This is not helpful enough:

My first attempt caused both the current DF and the filtered DF.

Here is an example minimal reproducible code. I just use a sample data.

import streamlit as st
import pandas as pd


# sample data
data = {
    "Investor_Id": [i for i in range(1, 11)],
    "LastName": ["Smith", "Johnson", "Williams", "Jones", "Brown", "Davis",
                 "Miller", "Wilson", "Moore", "Taylor"],
    "Distrib_date": ["2019-01-15", "2019-02-20", "2019-03-25",
                     "2020-04-30", "2020-05-05", "2020-06-10",
                     "2021-07-15", "2021-08-20", "2021-09-25",
                     "2022-10-30"],
    "Distrib_Net": [15000.00, 20000.00, 25000.00, 30000.00, 35000.00, 
                    40000.00, 45000.00, 50000.00, 55000.00, 60000.00],
    "Investment": ["Bonds", "Stocks", "Real Estate", "Commodities", "Bonds",
                   "Stocks", "Real Estate", "Commodities", "Bonds", "Stocks"]
}


# create DataFrame
df = pd.DataFrame(data)

# Convert 'Distribution_date' to datetime format for calculations
df['Distrib_date'] = pd.to_datetime(df['Distrib_date'])

# Calculate 'Year' and 'Quarter' from 'Distribution_date'
df['Year'] = df['Distrib_date'].dt.year
df['Quarter'] = df['Distrib_date'].dt.quarter

st.title('BDK Services Group')
st.dataframe(df)

# unique items of select boxes
nameList = df['LastName'].drop_duplicates()
yearList = df['Year'].drop_duplicates()
quarterList = df['Quarter'].drop_duplicates()
investmentList = df['Investment'].drop_duplicates()

with st.sidebar:
    FilterInvestment = st.selectbox('Investment', investmentList)
    FilterLastName = st.selectbox('LastName', nameList)
    FilterYear = st.multiselect('Year',yearList)
    FilterQuarter = st.multiselect('Quarter', quarterList)

st.write('## Filtered')
filtered_df = df.loc[(df['Investment'] == FilterInvestment) & 
                    (df['LastName'] == FilterLastName) & 
                    (df['Year'].isin(FilterYear)) & 
                    (df['Quarter'].isin(FilterQuarter))]
st.write(filtered_df)

Output

Thanks for the hints, but this isn’t exactly what I was looking for.

  1. The distribution year can’t be taken from the Distribution year, but still needs to be filtered.
  2. I would like the displayed grid to be updated in place.
  3. Ideally I would like the filters for the year and quarter to be only activated when there are selections.

Hi @HarryMelamed

The code snippet from @ferdy is a great implementation of the reproducible app.

In regards to (1) further filtering could be implemented via Pandas and the final dataframe could then be assigned to a variable and displayed in-app.

As for (2), you can use a placeholder via st.empty() such that an updated version of the dataframe can overwrite an existing older version of the dataframe.

Finally, for (3) you can use an if selections statement (where selections could be replaced with the actual widget variable names) to conditionally display the filter widgets upon the above statement being true.

Hope these are helpful!