Filter data in data_editor

Summary

I´m searching for an easy solution to filter data in st.data_editor, when the df contains a huge amount of data. In my case, I load 10.000 rows (3 columns) into the st.data_editor. I want to type in e.g. “apple” and want to display only rows, which contain the word apple in column 1.

Steps to reproduce

Code snippet:

df=pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"]})

df_ed=st.data_editor(df,hide_index=True, use_container_width=True,column_order=("Par","Cat1","Cat2"))

Expected behavior:

Filter shall be possible on column “Par” (… or all), so if I type in “apple” anywhere, only the row with “apple” will be displayed.

Actual behavior:

No filtering possible. Unfortunately no idea to solve this.

Debug info

  • Streamlit version: 1.24.1
  • Python version: 3.9.17
  • Using PipEnv
  • OS version: Win 10
  • Browser version:
1 Like

Hey @teq508!

If you focus on the data editor and hit Cmd+F, that will trigger a search box. But you’re right, this won’t filter the underlying data, but only enable you to spot rows that contain your query.

One way to achieve your desired output is to add a text_input above the dataframe, which exactly filters the data being passed in the data editor:

import pandas as pd 

df=pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"]})

query = st.text_input("Filter dataframe")

if query:
    mask = df.applymap(lambda x: query in str(x).lower()).any(axis=1)
    df = df[mask]

st.data_editor(
    df,
    hide_index=True, 
    column_order=("Par","Cat1","Cat2")
) 

CleanShot 2023-09-26 at 16.46.59

Hope that helps,

1 Like

Hi @arnaud,

this is pretty cool! But is there also a way to save the output of st.data_editor, that changes in the dataframe are still present after the filter is set/changed/released?
It´s clear for me, that streamlit reruns after each filter setting or df change - but I don´t know how to save the output with st.session_state (… already tried it after your post)

You see, I added the variable NEW_DATA as unfullfilled solution…

import pandas as pd
import streamlit as st

df=pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"]})

query = st.text_input("Filter dataframe")

if query:
    mask = df.applymap(lambda x: query in str(x).lower()).any(axis=1)
    df = df[mask]

**NEW_DATA**=st.data_editor(
    df,
    hide_index=True, 
    column_order=("Par","Cat1","Cat2")

Hi all,
I tried something out based on another question with a good idea of @tonykip, which I adapted to my initial problem.

Link to question “session-state-logic”

Result:

if'dfa' not in st.session_state:
    st.session_state['dfa']= pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"],'Active':[False,False,False]})

name=st.text_input("Search for ...")
if name == '':
    st.session_state['dfa'].Active=True
else:
    st.session_state['dfa'].Active=False
    st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name),'Active']=True

active_dfa = st.session_state['dfa'][st.session_state['dfa']['Active']==True].copy()
edited_dfa = st.data_editor(active_dfa,column_order=['Par','Cat1','Cat2'])
print(edited_dfa)
if edited_dfa is not None:
    st.session_state['dfa'].update(edited_dfa)

This works now fine. Any idea, how to ignore capitals so that it is not case sensitive?

Thank you @tonykip for this great idea from other post.

Glad you found it useful. Can you explain the case sensitivity more and also share the full code of what you’re referring to and some screenshots or screen recording?

Hi @tonykip,
in this line, it searchs after the filtered name, as you know.
If I type in “Straw” e.g., “Strawberry” was found, as I wanted to.

But if I type “straw”, it doesn´t work.

You can pass an additional argument to ignore the case (case=False) while searching like so in the str.contains() method:

st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name, case=False),'Active']=True

Let me know if this solves the issue for you.

1 Like

Exactly what I was searching for. Thank you @tonykip !

Hi all and @tonykip

… found unfortunately a bug in the solution (yours and mine). Please try my code. Click into a column, e.g. Cat1 and type very fast something on the keyboard and press ok. In some cases, the value will not be saved and jumps back to the previous value. This appears also in your example:
https://discuss.streamlit.io/t/session-state-logic/51572

After some tests, this is for every second change, where it doesn´t work.

Any idea how to solve this??

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

@mathcatsand pointed me to some helpful posts about the issue you’re facing @teq508:

Explanation of issue: Updating starting dataframe in experimental_data_editor - #4 by mathcatsand
Solution: Editable dataframes are here! ✍️ - #33 by mathcatsand

1 Like

Hi all,

@blackary … I migrated the code from @mathcatsand
Editable dataframes are here! :writing_hand: - #33 by mathcatsand .

This helps concerning the problem that every second entry wasn’t accepted, but it leads to a new problem, which wasn´t there before.

If you change some values in the table now, this works, as long as you change the entry in the filter selection. When you apply a filter (e.g. Straw), change values in the table, and eliminate the filter all changed values are discarded and set back to the initial values.

Please note, that I didn’t write edited_dfa to session_state in comparision to @mathcatsand, because in this case the elimination of filter doesn’t work after first apply.

Any further ideas to solve it?

import streamlit as st
import pandas as pd

if'dfa' not in st.session_state:
    st.session_state['dfa']= pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"],'Active':[False,False,False]})
    st.session_state.rerun = False
    
def rerun():
    st.session_state.rerun = True

name=st.text_input("Search for ...")
if name == '':
    st.session_state['dfa'].Active=True
else:
    st.session_state['dfa'].Active=False
    st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name,case=False),'Active']=True

active_dfa = st.session_state['dfa'][st.session_state['dfa']['Active']==True].copy()
edited_dfa = st.data_editor(active_dfa,column_order=['Par','Cat1','Cat2'],on_change=rerun)
    
if st.session_state.rerun:
    st.session_state.rerun = False
    st.rerun()

Sorry, I haven’t been able to figure out a combination that works with being able to filter, edit, and have everything persist without the double-edit problem. Maybe @mathcatsand can suggest something.

Ok, but thank you for your effort. Hope for @mathcatsand for an appropriate answer.

This is going to get much harrier combining the filter with edits. What you’re going to have to do is use a callback on the data editor and read the edits saved in Session State, reverse lookup the (pandas) index of any rows that were edited, and commit those changes to the original. If you’re going to go through all that trouble to handle editing on a filtered DataFrame, we can extend that to edits on an unfiltered DataFrame and dispense with the extra rerun command in this case.

import streamlit as st
import pandas as pd
import time

if "dfa" not in st.session_state:
    st.session_state["dfa"] = pd.DataFrame(
        {
            "Par": ["Apple", "Strawberry", "Banana"],
            "Cat1": ["good", "good", "bad"],
            "Cat2": ["healthy", "healthy", "unhealthy"],
            "Active": [False, False, False],
        }
    )


def active_dfa():
    return st.session_state["dfa"][st.session_state["dfa"]["Active"] == True].copy()


def get_index(row):
    return active_dfa().iloc[row].name


def commit():
    for row in st.session_state.editor["edited_rows"]:
        row_index = get_index(row)
        for key, value in st.session_state.editor["edited_rows"][row].items():
            st.session_state["dfa"].at[row_index, key] = value


st.header("Filter and edit data")
name = st.text_input("Search for ...")
if name == "":
    st.session_state["dfa"].Active = True
else:
    st.session_state["dfa"].Active = False
    st.session_state["dfa"].loc[
        st.session_state["dfa"]["Par"].str.contains(name, case=False), "Active"
    ] = True

edited_dfa = st.data_editor(
    active_dfa(), column_order=["Par", "Cat1", "Cat2"], key="editor", on_change=commit
)

This solution can also be extended to num_rows="dynamic" but you’ll have to be extra careful handling the index to keep it hidden and not mucking up the user interface. Resetting the index with each row addition/deletion should do it…

1 Like

Hi @mathcatsand,

thank you so much for your answer.
I will make a deep-dive in your implementation today to understand your solution.
In the meantime, I found also a solution late last night with only one additional line (I watched your video concerning session_state and had an idea based on it):

I added this line to save back the data from ’ edited_dfa’ to session_state.

 st.session_state['dfa'] = edited_dfa # new

What do you think about the solution? Please give it a trial.
UPDATE: I saw in this moment, that changes during filtering don´t work. :wink:

import streamlit as st
import pandas as pd

if 'dfa' not in st.session_state:
    st.session_state['dfa']= pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"],'Active':[False,False,False]})
    st.session_state.rerun = False

def rerun():
    st.session_state.rerun = True
    
name=st.text_input("Search for ...")
if name == '':
    st.session_state['dfa'].Active=True
else:
    st.session_state['dfa'].Active=False
    st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name,case=False),'Active']=True

active_dfa = st.session_state['dfa'][st.session_state['dfa']['Active']==True].copy()
edited_dfa = st.data_editor(active_dfa,column_order=['Par','Cat1','Cat2'],on_change=rerun)
    
if st.session_state.rerun:
    st.session_state.rerun = False
    st.session_state['dfa'] = edited_dfa # new
    st.rerun()
1 Like

st.session_state['dfa'] = edited_dfa # new

Since your edited dataframe is built off of your filtered dataframe, I believe this would effectively delete rows from the original when they are filtered out and an edit is made.

In this moment I saw, that my solution has still a bug. If you filter, and then change a value, the filter cannot be deactivated again.

1 Like

Hi @mathcatsand,

your solution is really great in general. It helps me also to solve other problems in my code, which costs me nights.

But, one problem remains, what you probably didn´t recognize caused by the small dataframe I delivered. If you extend the dataframe to more rows (greater than screen), you´ll see, that with every cell change (causes probably due on_change) the table jumps back to top left. This is “a small thing”, but prevents completely to UX. Do you have an idea to stop this jump?

Thanks in advance :slight_smile: