Selecting multiple rows in st.data_editor after applying filters

I’m working on an app where I’m allowing the user to filter and sort a dataframe and select multiple rows. The sorting will happen regularly so the native row selection on st.dataframe is not going to be useful, since it wipes out the selections after any sort on columns.

I came up with a solution to add a “Select” column to an st.data_editor which preserves the selection on sorts, but still wipes them out when the dataframe is filtered.

@arnaud I stumbled upon this post where you had proposed the same exact solution to allow row selection on an editable dataframe:

Is there a way to add filters in a way that it preserves previous selections?
I’ve tried using session_state to “save” the selection and then apply those to the filtered dataframe, but it acts wonky and always lags by one interaction with the dataframe. Same thing with using an external file to save the state of the dataframe.

I also tried using a submit form to again “save” the selected rows and apply them to the filtered data but didn’t have any success there either.

I’ve updated the original code posted in the link above with a simple filter

import streamlit as st
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Animal": ["Lion", "Elephant", "Giraffe", "Monkey", "Zebra"],
        "Habitat": ["Savanna", "Forest", "Savanna", "Forest", "Savanna"],
        "Lifespan (years)": [15, 60, 25, 20, 25],
        "Average weight (kg)": [190, 5000, 800, 10, 350],
    }
)

# I'm defining and applying a simple filter here as an example
life_span_range = st.sidebar.slider("Select Lifespan range", 15, 60, (15, 60))
df = df[df["Lifespan (years)"].between(*life_span_range)]

def dataframe_with_selections(df: pd.DataFrame, init_value: bool = False) -> pd.DataFrame:
    df_with_selections = df.copy()
    df_with_selections.insert(0, "Select", init_value)

    # Get dataframe row-selections from user with st.data_editor
    edited_df = st.data_editor(
        df_with_selections,
        hide_index=True,
        column_config={"Select": st.column_config.CheckboxColumn(required=True)},
        disabled=df.columns,
    )

    # Filter the dataframe using the temporary column, then drop the column
    selected_rows = edited_df[edited_df.Select]
    return selected_rows.drop('Select', axis=1)


selection = dataframe_with_selections(df)
st.write("Your selection:")
st.write(selection)

data_selection

Any help or insights would be greatly appreciated!

3 Likes

Hello
I find a solution using the session state :slight_smile:

       df = pd.DataFrame(
            {
                "Animal": ["Lion", "Elephant", "Giraffe", "Monkey", "Zebra"],
                "Habitat": ["Savanna", "Forest", "Savanna", "Forest", "Savanna"],
                "Lifespan (years)": [15, 60, 25, 20, 25],
                "Average weight (kg)": [190, 5000, 800, 10, 350],
            }
        )

        # I'm defining and applying a simple filter here as an example
        life_span_range = st.sidebar.slider("Select Lifespan range", 15, 60, (15, 60))

        def dataframe_with_selections(df: pd.DataFrame, init_value: bool = False) -> pd.DataFrame:
            df_with_selections = df.copy()
            
            # Retrieve information from user. If None, return False (unchecked)
            init_value = st.session_state.get('Select', False)
            # Add Select to the dataframe
            df_with_selections['Select'] = init_value

            # Get dataframe row-selections from user with st.data_editor
            edited_df = st.data_editor(
                df_with_selections,
                hide_index=True,
                column_config={"Select": st.column_config.CheckboxColumn(required=True)},
                disabled=df.columns,
            )
            # Add the select column in the session state. 
            st.session_state['Select'] = list(edited_df.Select)

            # Make modification on the dataframe (lifespan)
            edited_df = edited_df[edited_df["Lifespan (years)"].between(*life_span_range)]

            # Filter the dataframe using the temporary column, then drop the column
            selected_rows = edited_df[edited_df.Select]
            st.write(list(edited_df.Select))

            return selected_rows.drop('Select', axis=1)

        selection = dataframe_with_selections(df)
        st.write("Your selection:")
        st.write(selection)

I hope it’s working for you :slight_smile:

Thank you @Faltawer for taking the time to look into this and provide a solution. This is great!

As I mentioned in the post, it’s actually one of the solutions I had tried and it almost works, but it acts a bit wonky and the selection seems to lag behind, so I have to click twice to select or de-select.

data_selection-session_state

I’m not sure why this happens, but my guess is that the script reruns “on-click” instead of “after-click” so whatever selection is made on the click doesn’t get returned on the first try.

I’m experimenting with the @fragment decorator and I’m having better luck there, but I still need to figure out how to properly trigger the rerun and save the selections so they don’t get wiped out when I do other interactions.

I’ll repost my progress there soon, but meanwhile, let me know if you’re not seeing a similar behavior with your session state solution. I’m on python==3.10 and streamlit==1.36.0.

I got the same problems with the st.session_state. And canno’t really change it, because the app doesn’t recognize the button change the state.
This is why i don’t like checkbox :smiley:
I will check if i find something to go around (On my dashboard i got plenty of time this bug, the way i turn around is by avoid session state and using Class (you update the class object) and it’s work fine !)

Update (Thank ChatGPT i was lazy to create a class)
It works perfectly ! :slight_smile:

class DataFrameEditor:
    def __init__(self, df: pd.DataFrame):
        self.original_df = df.copy()
        self.edited_df = df.copy()
        self.init_value = False
    
    def update_dataframe(self, new_df: pd.DataFrame):
        self.edited_df = new_df
    
    def has_changes(self) -> bool:
        return not self.edited_df.equals(self.original_df)

    def reset_original(self):
        self.original_df = self.edited_df.copy()

# Sample DataFrame
df = pd.DataFrame({
    "Animal": ["Lion", "Elephant", "Giraffe", "Monkey", "Zebra"],
    "Habitat": ["Savanna", "Forest", "Savanna", "Forest", "Savanna"],
    "Lifespan (years)": [15, 60, 25, 20, 25],
    "Average weight (kg)": [190, 5000, 800, 10, 350],
})

# Define and apply a simple filter
life_span_range = st.sidebar.slider("Select Lifespan range", 15, 60, (15, 60))

def dataframe_with_selections(df: pd.DataFrame, editor: DataFrameEditor) -> pd.DataFrame:
    df_with_selections = df.copy()
    
    # Add Select column to the DataFrame
    df_with_selections['Select'] = editor.init_value

    placeholder = st.empty()
    # Get DataFrame row selections from user with st.data_editor
    edited_df = placeholder.data_editor(
        df_with_selections,
        hide_index=True,
        column_config={"Select": st.column_config.CheckboxColumn(required=True)},
        disabled=df.columns,
    )
    
    # Check for changes and update the editor object
    if not edited_df.equals(editor.edited_df):
        editor.update_dataframe(edited_df)
        editor.reset_original()

    # Make modification on the DataFrame (lifespan)
    edited_df = edited_df[edited_df["Lifespan (years)"].between(*life_span_range)]

    # Filter the DataFrame using the temporary column, then drop the column
    selected_rows = edited_df[edited_df.Select]
    st.write(list(edited_df.Select))

    return selected_rows.drop('Select', axis=1)

# Create a DataFrameEditor instance
editor = DataFrameEditor(df)

selection = dataframe_with_selections(df, editor)
st.write("Your selection:")
st.write(selection)

Thanks again @Faltawer! This is getting one step closer, but it’s still not behaving the way it’s supposed to. The filters are only applied after the selection is made, instead of the original table. :smiling_face_with_tear:

Filtering the rows afterwards works even without session_state but the idea is to filter the data first, basically remove any rows that don’t pass the filters, and then make selections off of those. In your updated code, when I change the filter, I still see all the rows in the original table.

You can check out the GIF here again to see what I mean:

Hey, i did move


    # Make modification on the DataFrame (lifespan)
    edited_df = edited_df[edited_df["Lifespan (years)"].between(*life_span_range)]

but i you filter df

    # Create a DataFrameEditor instance
    editor = DataFrameEditor(df)

  # Make modification on the DataFrame (lifespan)
    df= df[df["Lifespan (years)"].between(*life_span_range)]

It’s working fine for me !
Tell me if it’s good :smiley:

Still not working on my end. :frowning:
Can you send the whole code top to bottom to make sure I’m running the same exact thing?

Thanks again for your help. I’m really grateful for the time you’re spending on this.

AH ! I understand what i was missing the slider, that reset the checkbox.
I will try to see this later !

Ok I think I have a solution now. It’s not perfect but it works.

dat_selection_with_fragments

I’m defining the selection process inside a st.fragment so the app doesn’t refresh after every selection. The only downside here is that now I need an external widget to trigger the rerun hence the Save Selection button. What I don’t like about this is that anything else, including interactions with the filter, can also update the selection so it doesn’t feel like a clean solution to me.

Anyway, you can see that changing the values of the filter preserves the original selection unless they are explicitly de-selected. I also show at the end how we can narrow down the table, de-select a couple of rows (Giraffe and Monkey), and still have the original row outside the filter (Elephant) remain as selected.

I also made a change to the way I save selections in session_state. I created a hash by concatenating the values of each column, and am saving the selection state of each row in a dictionary. Then after each interaction with the data_editor I only update the values of the ones that were present in the filtered dataframe. That’s how I preserve the status of the rows that are outside the filter. Depending on the use case, the output dataframe could include the selected rows outside the filter as well (I’m not including them here).

I’m pretty sure what I’m doing is a bit of an overkill and all of this should probably be achievable in a simpler way but this works for now. I experimented with submit_form as well, but it was not working as expected. Maybe I’ll give that another try later. Let me know if you come up with anything else that works.

Full code to replicate the GIF:

import streamlit as st
import numpy as np
import pandas as pd
from hashlib import md5

df = pd.DataFrame(
    {
        "Animal": ["Lion", "Elephant", "Giraffe", "Monkey", "Zebra"],
        "Habitat": ["Savanna", "Forest", "Savanna", "Forest", "Savanna"],
        "Lifespan (years)": [15, 60, 25, 20, 25],
        "Average weight (kg)": [190, 5000, 800, 10, 350],
    }
)

def init_select_column(df: pd.DataFrame) -> pd.DataFrame:
    """Initialize the dataframe with a selection column and a hash column"""
    df_with_selections = df.copy()
    df_with_selections.insert(0, "Select", False)

    # Create a hash column by concatenating all the columns
    df_with_selections["hash"] = df_with_selections.apply(lambda x: md5("|".join(tuple(str(x))).encode()).hexdigest(), axis = 1)

    # Initialize the selection status in the session state 
    # with the hash as the key and the selection status as the value
    if "select_status" not in st.session_state:
        st.session_state.select_status = df_with_selections[["Select", "hash"]].set_index("hash")["Select"].to_dict()
    
    return df_with_selections

def filter_df(df: pd.DataFrame) -> pd.DataFrame:
    """Filter the dataframe based on the lifespan range selected by the user"""
   
    # simple filter example 
    life_span_range = st.sidebar.slider("Select Lifespan range", 15, 60, (15, 60))
    filtered_df = df[df["Lifespan (years)"].between(*life_span_range)]
    if "select_status" in st.session_state:
        filtered_df["Select"] = filtered_df["hash"].map(st.session_state["select_status"])

    return filtered_df

@st.experimental_fragment
def dataframe_with_selections(df: pd.DataFrame, init_value: bool = False) -> pd.DataFrame:
    """Display the dataframe with a selection column and allow the user to select rows"""

    st.write("This is a fragment now 👇")
    # Get dataframe row-selections from user with st.data_editor
    edited_df = st.data_editor(
        df,
        hide_index=True,
        column_config={"Select": st.column_config.CheckboxColumn(required=True)},
        disabled=[col for col in df.columns if col != "Select"],
    )

    # Update the selection status in the session state
    if "select_status" in st.session_state:
        with st.expander(f"selection log"):
            for _, (select, hash) in edited_df[['Select', "hash"]].iterrows():
                st.session_state["select_status"][hash] = select
                
                if select:
                    st.info(f"{hash} is selected")
        

    # Filter the dataframe using the temporary column, then drop the column
    selected_rows = edited_df[edited_df.Select]
    return selected_rows.drop('Select', axis=1)

df = init_select_column(df)
filtered_df = filter_df(df)
selection = dataframe_with_selections(filtered_df)
st.button("⬇️ Save Selection")
st.write("Your selection:")
st.write(selection)
2 Likes

You did a really nice work !
Without the experimental_fragment, it’s working but i got the ‘problem’ where sometimes you need to click twice.
Not sure if i can do something better :smiley:

1 Like