Auto-generate a dataframe filtering UI in Streamlit with filter_dataframe!

Learn how to add a UI to any dataframe

Posted in Tutorial, August 18 2022

Streamlit apps often have some sort of filtering component, where developers write code snippets that combine Streamlit inputs, st.dataframe, or the ever-popular streamlit-aggrid component to create easy-to-use UIs.

We on the Streamlit data science team certainly have followed this trend with the following format:

  1. Use a set of Streamlit native inputs like st.date_input or st.selectbox to get user input.
  2. Use the inputs to filter our data and display it back to the user (see the sketch below).
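
For example, a hand-rolled version of this two-step pattern might look like the sketch below (the file name and column names are hypothetical, not from the original post):

import pandas as pd
import streamlit as st

# Hypothetical dataset with a categorical and a datetime column
df = pd.read_csv("orders.csv", parse_dates=["order_date"])

# 1. Get user input from native Streamlit widgets
region = st.selectbox("Region", sorted(df["region"].unique()))
date_range = st.date_input(
    "Order date range",
    value=(df["order_date"].min(), df["order_date"].max()),
)

# 2. Use the inputs to filter the data and display it back to the user
filtered = df[df["region"] == region]
if len(date_range) == 2:
    start, end = map(pd.to_datetime, date_range)
    filtered = filtered[filtered["order_date"].between(start, end)]
st.dataframe(filtered)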

Writing this type of app the first time was fun. But doing it the fifth time—writing different code iterations, figuring out what Streamlit widget fits what data type, configuring it all together—got tiring pretty quickly. To solve this, we built a function filter_dataframe that handles this for us automatically!

In this post, we’ll show you how filter_dataframe works section by section:

Code Section 1. Laying out Streamlit widgets

Code Section 2. Preparing the input dataframe for filtering

Code Section 3. Writing conditionals for different column types

Want to dive right in? Head over to our demo app to see it in action on an example dataframe, and see the full code here.

What is filter_dataframe?

The function filter_dataframe lets you:

  • Add a filtering UI to any dataframe
  • Speed up the development time
  • Allow the user to explore a dataset in a self-service way

Here is the code for it:

from pandas.api.types import (
    is_categorical_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
    is_object_dtype,
)
import pandas as pd
import streamlit as st
def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """
    Adds a UI on top of a dataframe to let viewers filter columns
    Args:
        df (pd.DataFrame): Original dataframe
    Returns:
        pd.DataFrame: Filtered dataframe
    """
    modify = st.checkbox("Add filters")
    if not modify:
        return df
    df = df.copy()
    # Try to convert datetimes into a standard format (datetime, no timezone)
    for col in df.columns:
        if is_object_dtype(df[col]):
            try:
                df[col] = pd.to_datetime(df[col])
            except Exception:
                pass
        if is_datetime64_any_dtype(df[col]):
            df[col] = df[col].dt.tz_localize(None)
    modification_container = st.container()
    with modification_container:
        to_filter_columns = st.multiselect("Filter dataframe on", df.columns)
        for column in to_filter_columns:
            left, right = st.columns((1, 20))
            left.write("↳")
            # Treat columns with < 10 unique values as categorical
            if is_categorical_dtype(df[column]) or df[column].nunique() < 10:
                user_cat_input = right.multiselect(
                    f"Values for {column}",
                    df[column].unique(),
                    default=list(df[column].unique()),
                )
                df = df[df[column].isin(user_cat_input)]
            elif is_numeric_dtype(df[column]):
                _min = float(df[column].min())
                _max = float(df[column].max())
                step = (_max - _min) / 100
                user_num_input = right.slider(
                    f"Values for {column}",
                    min_value=_min,
                    max_value=_max,
                    value=(_min, _max),
                    step=step,
                )
                df = df[df[column].between(*user_num_input)]
            elif is_datetime64_any_dtype(df[column]):
                user_date_input = right.date_input(
                    f"Values for {column}",
                    value=(
                        df[column].min(),
                        df[column].max(),
                    ),
                )
                if len(user_date_input) == 2:
                    user_date_input = tuple(map(pd.to_datetime, user_date_input))
                    start_date, end_date = user_date_input
                    df = df.loc[df[column].between(start_date, end_date)]
            else:
                user_text_input = right.text_input(
                    f"Substring or regex in {column}",
                )
                if user_text_input:
                    df = df[df[column].astype(str).str.contains(user_text_input)]
    return df

Now let’s take a look at how it works!

Code Section 1. Laying out Streamlit widgets

The filter_dataframe function takes in and returns the same thing: a pandas dataframe. Within the function, we first ask the user whether they'd like to filter the dataframe with a checkbox called modify.

We also added a docstring and type hints to the top of the function to make the code more digestible:

def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """
    Adds a UI on top of a dataframe to let viewers filter columns
    Args:
        df (pd.DataFrame): Original dataframe
    Returns:
        pd.DataFrame: Filtered dataframe
    """
    modify = st.checkbox("Add filters")
    if not modify:
        return df 

Code Section 2. Preparing the input dataframe for filtering

There are several steps you need to take to prep your dataframe for your app. For the first three you need to:

  • Make a copy of the pandas dataframe so the user input will not change the underlying data.
  • Attempt to cast string columns into datetimes with pd.to_datetime().
  • Localize your datetime columns with .tz_localize(). The Streamlit date picker (which you’ll use later!) returns dates without a timezone, so you need to take this step to compare the two:
df = df.copy()
# Try to convert datetimes into a standard format (datetime, no timezone)
for col in df.columns:
    if is_object_dtype(df[col]):
        try:
            df[col] = pd.to_datetime(df[col])
        except Exception:
            pass
    if is_datetime64_any_dtype(df[col]):
        df[col] = df[col].dt.tz_localize(None)
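
To see why the tz_localize step matters, here is a small illustration (our own example, not from the original post): pandas refuses to compare a timezone-aware datetime column against the naive timestamps you get from converting st.date_input values.

import pandas as pd

aware = pd.Series(pd.to_datetime(["2022-08-18"])).dt.tz_localize("UTC")
cutoff = pd.to_datetime("2022-08-01")  # like pd.to_datetime() on a st.date_input date

# aware >= cutoff                       # raises a TypeError: tz-aware vs. tz-naive
aware.dt.tz_localize(None) >= cutoff    # works after stripping the timezone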

Now that your data is in a better format, you need to:

  • Set up a container with st.container for your filtering widgets.
  • Use st.multiselect to let the user select the columns:
modification_container = st.container()
with modification_container:
    to_filter_columns = st.multiselect("Filter dataframe on", df.columns)
  • Loop through each column and deal with each one depending on its type. You’ll write conditionals for each one next!
  • Add indentation and an arrow to improve the aesthetics when users select many columns.
for column in to_filter_columns:
    left, right = st.columns((1, 20))
    left.write("↳")

All your data is in the right format. You ensured that your original dataset will remain untouched, and you have prepared your loop to go through all your columns. Now comes the fun part!

Code Section 3. Writing conditionals for different column types

In this function, you’ll want to check for three pandas data types—categorical, numeric, and datetime—then handle the rest as if they’re strings. This is an assumption that works well for us. Your situation might be different, so feel free to add your own conditionals to this list.

For each one, create a Streamlit widget that matches the type, then filter your data based on that widget. At the end of the loop, return the entire filtered dataframe.

Let’s take a look at them one by one.

Categorical types

Check for categorical types with the is_categorical_dtype function. Often users don’t cast their data into this type, so assume that anything with fewer than 10 unique values acts like a categorical dtype. As a bonus, it’ll work great with boolean columns (which only have True or False values!).

Now, create a multiselect widget with possible values and use it to filter your dataframe:

# Treat columns with < 10 unique values as categorical
if is_categorical_dtype(df[column]) or df[column].nunique() < 10:
    user_cat_input = right.multiselect(
        f"Values for {column}",
        df[column].unique(),
        default=list(df[column].unique()),
    )
    df = df[df[column].isin(user_cat_input)]

Numeric types

Numeric types are fairly straightforward. You can get the minimum and the maximum from the dataset itself, then assume that the step size is 1% of the range and filter the data accordingly:

elif is_numeric_dtype(df[column]):
    _min = float(df[column].min())
    _max = float(df[column].max())
    step = (_max - _min) / 100
    user_num_input = right.slider(
      f"Values for {column}",
      min_value=_min,
      max_value=_max,
      value=(_min, _max),
      step=step,
    )
    df = df[df[column].between(*user_num_input)]
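
One caveat (our addition, not part of the original post): if a numeric column holds a single value, _min equals _max and the computed step is 0, which can make st.slider error out. A hedged guard could look like this:

elif is_numeric_dtype(df[column]):
    _min = float(df[column].min())
    _max = float(df[column].max())
    if _min == _max:
        # Nothing to filter on; skip the slider so a zero step doesn't break the app
        right.write(f"All values in {column} are {_min}")
    else:
        step = (_max - _min) / 100
        user_num_input = right.slider(
            f"Values for {column}",
            min_value=_min,
            max_value=_max,
            value=(_min, _max),
            step=step,
        )
        df = df[df[column].between(*user_num_input)]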

Datetime types

The datetime dtype is almost the same as the other two. You get the user input with the st.date_input function. Once the user enters two dates, you can filter your dataset:

elif is_datetime64_any_dtype(df[column]):
    user_date_input = right.date_input(
        f"Values for {column}",
        value=(
            df[column].min(),
            df[column].max(),
        ),
    )
    if len(user_date_input) == 2:
        user_date_input = tuple(map(pd.to_datetime, user_date_input))
        start_date, end_date = user_date_input
        df = df.loc[df[column].between(start_date, end_date)]

Other types

We like to convert other dtypes into a string, then let the user search within them for substrings. It might not work for your use case, but for us, it works quite well:

else:
    user_text_input = right.text_input(
        f"Substring or regex in {column}",
    )
    if user_text_input:
        df = df[df[column].astype(str).str.contains(user_text_input)]
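
As noted above, this set of conditionals is just the assumption that works for us, and you can add your own. As one hedged example (not part of the original function): boolean columns are already caught by the categorical branch, but if you'd rather give them a dedicated widget, you could add a branch like this ahead of the categorical check:

from pandas.api.types import is_bool_dtype

# Inside the column loop, before the is_categorical_dtype check
if is_bool_dtype(df[column]):
    user_bool_input = right.radio(
        f"Values for {column}",
        ("All", "True only", "False only"),
    )
    if user_bool_input == "True only":
        df = df[df[column]]
    elif user_bool_input == "False only":
        df = df[~df[column]]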

Bringing it all together

Want to see what the code looks like in action? Go ahead and test it on the palmerpenguins dataset (see this GitHub repo for the data) or on your own data!

We’ve made an example app using the code (check it out below):

import pandas as pd
import streamlit as st
import streamlit.components.v1 as components
from pandas.api.types import (
    is_categorical_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
    is_object_dtype,
)
st.title("Auto Filter Dataframes in Streamlit")
st.write(
    """This app accomodates the blog [here](<https://blog.streamlit.io/auto-generate-a-dataframe-filtering-ui-in-streamlit-with-filter_dataframe/>)
    and walks you through one example of how the Streamlit
    Data Science Team builds add-on functions to Streamlit.
    """
)
def filter_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    <insert the code we wrote here>
data_url = "https://raw.githubusercontent.com/mcnakhaee/palmerpenguins/master/palmerpenguins/data/penguins.csv"
df = pd.read_csv(data_url)
st.dataframe(filter_dataframe(df))

This code will produce the following app:

Wrapping up

You did it! Now you know how to set up your own filter_dataframe function.

If you have any questions or improvements, please drop them in the comments below or make a suggestion on our GitHub repository for this post.

Happy Streamlit-ing! 🎈


This is a companion discussion topic for the original entry at https://blog.streamlit.io/auto-generate-a-dataframe-filtering-ui-in-streamlit-with-filter_dataframe/

Great topic, congratulations! Do you have any tutorial where we need to show more than one dataset? Two, three, etc.?


This is fantastic. Really well documented, to the point, and clear to understand. 10/10

In this example, the dataframe is filtered by variables in the order in which they appear in the code. This means errors will be returned if one of the filters reduces the dataframe such that the next filter cannot process the dataframe. Can you please tell me how this could be avoided? Is there any way to ensure that all filters are always mutually compatible?

First of all, hey to everybody! My name is Tom and I'm new on the forum.
I just tried my first Streamlit app and I'm stunned by how good and simple it works. Great job!

I implemented this filter alongside existing filters and it works, but I have problems using the outcome later on.
For example, I've got a dataframe "result" and create a filter with st.sidebar.multiselect, like:

result_selection = result.query(
        "Kunde == @kunde & Reifen_Artikelnr == @Reifen_sel"
    )

I use this "result_selection" later on for a couple of matplotlib boxplots and histograms.
So with my filter everything works fine.

But I can't figure out how to reuse the dataframe returned by the filter_dataframe function.

result_selection=st.dataframe(filter_dataframe(result))

This "result_selection" can't be used in the same way - any clue?
Thanks a lot for your ideas on this issue.
Tom
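
For reference, the tutorial's own usage suggests capturing the return value of filter_dataframe directly rather than the return value of st.dataframe, since st.dataframe only renders the table and does not hand back the filtered data. A minimal sketch, reusing the names from the post above:

# Capture the filtered dataframe first, then display it
result_selection = filter_dataframe(result)
st.dataframe(result_selection)

# result_selection is a plain pandas DataFrame and can feed the
# matplotlib boxplots and histograms like the query()-based filter did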

Great solution. But does this also work with st.data_editor?
I want to filter data in a huge list (but I also want to modify it). I want to set a filter, modify some attributes in the columns, and then set the next filter. The modifications should be reflected in the resulting df.

Hi all! :wave:

Looking to implement nested columns in a Streamlit dataframe. Any tips or pointers?

Hey @teq508 have you found a solution?

Hi @saasyp,

Yes. Please follow this post:

Together with @mathcatsand, we found a proper solution.
But I read on GitHub that many others are also interested in an integrated solution and, as far as I can tell, there is already something in planning.


Hi all,

Thank you very much for this really nice code. I implemented it in my data analysis app, where I want to use both a data filter and a column filter. Currently I have one issue which I can't get rid of:
Let's say I select the columns date and product. With date I get a slider and selection works great. With product, a multiselect box opens, defaulting to all products in the dataframe.
Although (de-)selection of products is possible, the default comes back every time I apply any change to other filters, like the date slider.

Do you have any recommendation?

Thanks for this tutorial. I used it in my app, where I render results from a server as a dataframe. But I have a problem: every time I select a feature to filter on, the whole page reruns and I lose the dataframe. Any ideas on how to solve that?
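
Streamlit reruns the whole script on every widget interaction, so one common pattern (a hedged sketch, assuming the server results come from a function you can wrap; the URL and function name here are hypothetical) is to cache the fetch with st.cache_data so the dataframe survives reruns:

import pandas as pd
import streamlit as st

@st.cache_data
def load_results(url: str) -> pd.DataFrame:
    # Hypothetical fetch; replace with however your server is actually queried
    return pd.read_json(url)

df = load_results("https://example.com/results.json")
st.dataframe(filter_dataframe(df))  # filter_dataframe from the tutorial above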

Hi, great work! Big fan. I've made something similar but not as good as yours, ahah :stuck_out_tongue_closed_eyes:
I was trying to develop a feature that is not present in yours, but maybe you already thought about it.
With your function, how would you manage using it multiple times without losing the state?
Let me explain:
The user can use your function and display the dataframe as you do; it works perfectly.
The idea is to add a "+" button to use your function a second time to apply another filter without losing the first one.
A little bit like adding another experience on LinkedIn.

I have tried many ways, using the cache and managing the key argument of my widgets dynamically, but I always get problems like:

  • The first filter object resets when I use the second one
  • The key argument updates at each click on the widget instead of having one unique key per object

Big thanks for your job :slight_smile:

Is the copy of the original dataframe required? I commented out the line and it has no apparent effect.

I asked Claude to add default columns and columns per row, and to remove the add filter. Hope you like it:


from pandas.api.types import (
    is_categorical_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
    is_object_dtype,
)
import pandas as pd
import streamlit as st
import re


def filter_dataframe(df: pd.DataFrame, default_columns: list = None, columns_per_row: int = 3) -> pd.DataFrame:
    """
    Adds a UI on top of a dataframe to let viewers filter columns with better layout.
    For text/object columns, users can choose between multiselect or text search.

    Args:
        df (pd.DataFrame): Original dataframe
        default_columns (list): List of columns to show filters for by default
        columns_per_row (int): Number of filter columns to display per row

    Returns:
        pd.DataFrame: Filtered dataframe
    """
    
    # Initialize session state for filters
    if 'dataframe_filters' not in st.session_state:
        st.session_state.dataframe_filters = {}
    
    # Initialize session state for filter types (multiselect vs text search)
    if 'filter_types' not in st.session_state:
        st.session_state.filter_types = {}
    
    modify = st.checkbox("Add filters")

    if not modify:
        return df

    df = df.copy()

    # Try to convert datetimes into a standard format (datetime, no timezone)
    for col in df.columns:
        if is_object_dtype(df[col]):
            try:
                df[col] = pd.to_datetime(df[col])
            except Exception:
                pass

        if is_datetime64_any_dtype(df[col]):
            df[col] = df[col].dt.tz_localize(None)

    modification_container = st.container()

    with modification_container:
        # Column selection with default columns pre-selected
        default_selection = default_columns if default_columns else []
        to_filter_columns = st.multiselect(
            "Select columns to filter", 
            df.columns,
            default=[col for col in default_selection if col in df.columns]
        )
        
        # Reset filters button
        if st.button("Reset All Filters"):
            st.session_state.dataframe_filters = {}
            st.session_state.filter_types = {}
            st.rerun()
        
        if to_filter_columns:
            st.markdown("---")
            st.subheader("🔍 Active Filters")
            
            # Apply existing filters to get current filtered dataframe
            filtered_df = df.copy()
            for filter_col, filter_values in st.session_state.dataframe_filters.items():
                if filter_col in df.columns and filter_values:
                    if isinstance(filter_values, dict):
                        # Handle different filter types
                        if filter_values.get('type') == 'numeric_range':
                            min_val, max_val = filter_values['values']
                            filtered_df = filtered_df[
                                (filtered_df[filter_col] >= min_val) & 
                                (filtered_df[filter_col] <= max_val)
                            ]
                        elif filter_values.get('type') == 'date_range':
                            start_date, end_date = filter_values['values']
                            filtered_df = filtered_df[
                                (filtered_df[filter_col] >= start_date) & 
                                (filtered_df[filter_col] <= end_date)
                            ]
                        elif filter_values.get('type') == 'text_regex':
                            pattern = filter_values['values']
                            try:
                                mask = filtered_df[filter_col].astype(str).str.contains(
                                    pattern, regex=True, case=False, na=False
                                )
                                filtered_df = filtered_df[mask]
                            except re.error:
                                st.error(f"Invalid regex pattern for {filter_col}: {pattern}")
                        elif filter_values.get('type') == 'text_contains':
                            text = filter_values['values']
                            mask = filtered_df[filter_col].astype(str).str.contains(
                                text, case=False, na=False, regex=False
                            )
                            filtered_df = filtered_df[mask]
                    elif isinstance(filter_values, list) and filter_values:
                        # Categorical/multiselect
                        filtered_df = filtered_df[filtered_df[filter_col].isin(filter_values)]
            
            # Group filters into rows based on columns_per_row
            for i in range(0, len(to_filter_columns), columns_per_row):
                row_columns = to_filter_columns[i:i + columns_per_row]
                cols = st.columns(len(row_columns))
                
                for j, column in enumerate(row_columns):
                    with cols[j]:
                        st.markdown(f"**{column}**")
                        
                        # Get available options based on current filters (excluding this column)
                        temp_filtered_df = df.copy()
                        for filter_col, filter_values in st.session_state.dataframe_filters.items():
                            if filter_col != column and filter_col in df.columns and filter_values:
                                if isinstance(filter_values, dict):
                                    if filter_values.get('type') == 'numeric_range':
                                        min_val, max_val = filter_values['values']
                                        temp_filtered_df = temp_filtered_df[
                                            (temp_filtered_df[filter_col] >= min_val) & 
                                            (temp_filtered_df[filter_col] <= max_val)
                                        ]
                                    elif filter_values.get('type') == 'date_range':
                                        start_date, end_date = filter_values['values']
                                        temp_filtered_df = temp_filtered_df[
                                            (temp_filtered_df[filter_col] >= start_date) & 
                                            (temp_filtered_df[filter_col] <= end_date)
                                        ]
                                    elif filter_values.get('type') in ['text_regex', 'text_contains']:
                                        pattern = filter_values['values']
                                        try:
                                            if filter_values.get('type') == 'text_regex':
                                                mask = temp_filtered_df[filter_col].astype(str).str.contains(
                                                    pattern, regex=True, case=False, na=False
                                                )
                                            else:
                                                mask = temp_filtered_df[filter_col].astype(str).str.contains(
                                                    pattern, case=False, na=False, regex=False
                                                )
                                            temp_filtered_df = temp_filtered_df[mask]
                                        except Exception:
                                            pass
                                elif isinstance(filter_values, list) and filter_values:
                                    temp_filtered_df = temp_filtered_df[temp_filtered_df[filter_col].isin(filter_values)]
                        
                        # Handle different data types
                        if is_categorical_dtype(df[column]) or df[column].nunique() < 10:
                            # Small number of unique values - always use multiselect
                            available_options = sorted(temp_filtered_df[column].dropna().unique())
                            current_selection = st.session_state.dataframe_filters.get(column, [])
                            
                            # Only keep valid selections
                            valid_current = [x for x in current_selection if x in available_options] if isinstance(current_selection, list) else []
                            
                            selected_values = st.multiselect(
                                "Select values:",
                                available_options,
                                default=valid_current,
                                key=f"cat_{column}_{i}_{j}"
                            )
                            
                            # Only update session state if selection is made
                            if selected_values != current_selection:
                                if selected_values:
                                    st.session_state.dataframe_filters[column] = selected_values
                                else:
                                    # Remove filter if no selection
                                    st.session_state.dataframe_filters.pop(column, None)
                                st.rerun()
                            
                        elif is_numeric_dtype(df[column]):
                            # Numeric slider
                            _min = float(temp_filtered_df[column].min())
                            _max = float(temp_filtered_df[column].max())
                            
                            if _min == _max:
                                st.info(f"All values are {_min}")
                            else:
                                step = (_max - _min) / 100
                                current_filter = st.session_state.dataframe_filters.get(column, {})
                                
                                # Default to full range if no current filter
                                if isinstance(current_filter, dict) and current_filter.get('type') == 'numeric_range':
                                    current_range = current_filter['values']
                                    # Ensure current range is within new bounds
                                    default_value = (
                                        max(_min, current_range[0]),
                                        min(_max, current_range[1])
                                    )
                                else:
                                    default_value = (_min, _max)
                                
                                selected_range = st.slider(
                                    "Value range:",
                                    min_value=_min,
                                    max_value=_max,
                                    value=default_value,
                                    step=step,
                                    key=f"num_{column}_{i}_{j}"
                                )
                                
                                # Only apply filter if range is not the full range
                                if selected_range != (_min, _max):
                                    new_filter = {'type': 'numeric_range', 'values': selected_range}
                                    if st.session_state.dataframe_filters.get(column) != new_filter:
                                        st.session_state.dataframe_filters[column] = new_filter
                                        st.rerun()
                                else:
                                    # Remove filter if full range selected
                                    if column in st.session_state.dataframe_filters:
                                        st.session_state.dataframe_filters.pop(column, None)
                                        st.rerun()
                            
                        elif is_datetime64_any_dtype(df[column]):
                            # Date range filter
                            min_date = temp_filtered_df[column].min().date()
                            max_date = temp_filtered_df[column].max().date()
                            
                            current_filter = st.session_state.dataframe_filters.get(column, {})
                            if isinstance(current_filter, dict) and current_filter.get('type') == 'date_range':
                                current_dates = current_filter['values']
                                default_dates = (
                                    max(min_date, current_dates[0].date() if hasattr(current_dates[0], 'date') else current_dates[0]),
                                    min(max_date, current_dates[1].date() if hasattr(current_dates[1], 'date') else current_dates[1])
                                )
                            else:
                                default_dates = (min_date, max_date)
                            
                            selected_dates = st.date_input(
                                "Date range:",
                                value=default_dates,
                                key=f"date_{column}_{i}_{j}"
                            )
                            
                            if len(selected_dates) == 2:
                                start_date, end_date = map(pd.to_datetime, selected_dates)
                                # Only apply filter if not the full range
                                if (start_date.date(), end_date.date()) != (min_date, max_date):
                                    new_filter = {'type': 'date_range', 'values': (start_date, end_date)}
                                    if st.session_state.dataframe_filters.get(column) != new_filter:
                                        st.session_state.dataframe_filters[column] = new_filter
                                        st.rerun()
                                else:
                                    # Remove filter if full range selected
                                    if column in st.session_state.dataframe_filters:
                                        st.session_state.dataframe_filters.pop(column, None)
                                        st.rerun()
                                
                        else:
                            # Text/Object columns - choice between multiselect and text search
                            unique_count = df[column].nunique()
                            
                            # Get current filter type preference (default to multiselect)
                            current_filter_type = st.session_state.filter_types.get(column, "multiselect")
                            
                            # Filter type selection
                            filter_method = st.radio(
                                "Filter method:",
                                ["Multiselect", "Text search"],
                                index=0 if current_filter_type == "multiselect" else 1,
                                key=f"method_{column}_{i}_{j}",
                                horizontal=True
                            )
                            
                            # Update filter type in session state
                            new_filter_type = "multiselect" if filter_method == "Multiselect" else "text_search"
                            if st.session_state.filter_types.get(column) != new_filter_type:
                                st.session_state.filter_types[column] = new_filter_type
                                # Clear existing filter when switching methods
                                st.session_state.dataframe_filters.pop(column, None)
                                st.rerun()
                            
                            if filter_method == "Multiselect":
                                # Multiselect approach
                                available_options = sorted(temp_filtered_df[column].dropna().unique())
                                current_selection = st.session_state.dataframe_filters.get(column, [])
                                
                                # Only keep valid selections
                                valid_current = [x for x in current_selection if x in available_options] if isinstance(current_selection, list) else []
                                
                                selected_values = st.multiselect(
                                    f"Select values ({len(available_options)} available):",
                                    available_options,
                                    default=valid_current,
                                    key=f"multi_{column}_{i}_{j}"
                                )
                                
                                if selected_values != current_selection:
                                    if selected_values:
                                        st.session_state.dataframe_filters[column] = selected_values
                                    else:
                                        st.session_state.dataframe_filters.pop(column, None)
                                    st.rerun()
                                    
                            else:
                                # Text search approach
                                filter_type = st.radio(
                                    "Filter type:",
                                    ["Contains text", "Regex pattern"],
                                    key=f"text_type_{column}_{i}_{j}",
                                    horizontal=True
                                )
                                
                                current_filter = st.session_state.dataframe_filters.get(column, {})
                                current_text = ""
                                if isinstance(current_filter, dict):
                                    if ((filter_type == "Contains text" and current_filter.get('type') == 'text_contains') or
                                        (filter_type == "Regex pattern" and current_filter.get('type') == 'text_regex')):
                                        current_text = current_filter.get('values', '')
                                
                                user_text_input = st.text_input(
                                    f"Enter {filter_type.lower()}:",
                                    value=current_text,
                                    key=f"text_{column}_{i}_{j}"
                                )
                                
                                if user_text_input:
                                    filter_dict = {
                                        'type': 'text_regex' if filter_type == "Regex pattern" else 'text_contains',
                                        'values': user_text_input
                                    }
                                    if st.session_state.dataframe_filters.get(column) != filter_dict:
                                        st.session_state.dataframe_filters[column] = filter_dict
                                        st.rerun()
                                else:
                                    # Remove filter if no text input
                                    if column in st.session_state.dataframe_filters:
                                        st.session_state.dataframe_filters.pop(column, None)
                                        st.rerun()
            
            # Show filtering results
            if st.session_state.dataframe_filters:
                st.markdown("---")
                original_shape = df.shape
                filtered_shape = filtered_df.shape
                
                col1, col2, col3 = st.columns(3)
                with col1:
                    st.metric("Original rows", original_shape[0])
                with col2:
                    st.metric("Filtered rows", filtered_shape[0])
                with col3:
                    percentage = (filtered_shape[0] / original_shape[0] * 100) if original_shape[0] > 0 else 0
                    st.metric("Retention", f"{percentage:.1f}%")
                
                # Show active filters summary
                st.write("**Active filters:**")
                for filter_col, filter_val in st.session_state.dataframe_filters.items():
                    if isinstance(filter_val, dict):
                        if filter_val.get('type') == 'numeric_range':
                            st.write(f"- {filter_col}: {filter_val['values'][0]:.2f} to {filter_val['values'][1]:.2f}")
                        elif filter_val.get('type') == 'date_range':
                            st.write(f"- {filter_col}: {filter_val['values'][0].date()} to {filter_val['values'][1].date()}")
                        elif filter_val.get('type') in ['text_regex', 'text_contains']:
                            filter_type_name = "Regex" if filter_val.get('type') == 'text_regex' else "Contains"
                            st.write(f"- {filter_col}: {filter_type_name} '{filter_val['values']}'")
                    elif isinstance(filter_val, list):
                        st.write(f"- {filter_col}: {len(filter_val)} selected values")
                
                return filtered_df
    
    return df