I´m searching for an easy solution to filter data in st.data_editor, when the df contains a huge amount of data. In my case, I load 10.000 rows (3 columns) into the st.data_editor. I want to type in e.g. “apple” and want to display only rows, which contain the word apple in column 1.
If you focus on the data editor and hit Cmd+F, that will trigger a search box. But you’re right, this won’t filter the underlying data, but only enable you to spot rows that contain your query.
One way to achieve your desired output is to add a text_input above the dataframe, which exactly filters the data being passed in the data editor:
import pandas as pd
df=pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"]})
query = st.text_input("Filter dataframe")
if query:
mask = df.applymap(lambda x: query in str(x).lower()).any(axis=1)
df = df[mask]
st.data_editor(
df,
hide_index=True,
column_order=("Par","Cat1","Cat2")
)
this is pretty cool! But is there also a way to save the output of st.data_editor, that changes in the dataframe are still present after the filter is set/changed/released?
It´s clear for me, that streamlit reruns after each filter setting or df change - but I don´t know how to save the output with st.session_state (… already tried it after your post)
You see, I added the variable NEW_DATA as unfullfilled solution…
import pandas as pd
import streamlit as st
df=pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"]})
query = st.text_input("Filter dataframe")
if query:
mask = df.applymap(lambda x: query in str(x).lower()).any(axis=1)
df = df[mask]
**NEW_DATA**=st.data_editor(
df,
hide_index=True,
column_order=("Par","Cat1","Cat2")
if'dfa' not in st.session_state:
st.session_state['dfa']= pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"],'Active':[False,False,False]})
name=st.text_input("Search for ...")
if name == '':
st.session_state['dfa'].Active=True
else:
st.session_state['dfa'].Active=False
st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name),'Active']=True
active_dfa = st.session_state['dfa'][st.session_state['dfa']['Active']==True].copy()
edited_dfa = st.data_editor(active_dfa,column_order=['Par','Cat1','Cat2'])
print(edited_dfa)
if edited_dfa is not None:
st.session_state['dfa'].update(edited_dfa)
This works now fine. Any idea, how to ignore capitals so that it is not case sensitive?
Thank you @tonykip for this great idea from other post.
Glad you found it useful. Can you explain the case sensitivity more and also share the full code of what you’re referring to and some screenshots or screen recording?
… found unfortunately a bug in the solution (yours and mine). Please try my code. Click into a column, e.g. Cat1 and type very fast something on the keyboard and press ok. In some cases, the value will not be saved and jumps back to the previous value. This appears also in your example: https://discuss.streamlit.io/t/session-state-logic/51572
After some tests, this is for every second change, where it doesn´t work.
This helps concerning the problem that every second entry wasn’t accepted, but it leads to a new problem, which wasn´t there before.
If you change some values in the table now, this works, as long as you change the entry in the filter selection. When you apply a filter (e.g. Straw), change values in the table, and eliminate the filter all changed values are discarded and set back to the initial values.
Please note, that I didn’t write edited_dfa to session_state in comparision to @mathcatsand, because in this case the elimination of filter doesn’t work after first apply.
Any further ideas to solve it?
import streamlit as st
import pandas as pd
if'dfa' not in st.session_state:
st.session_state['dfa']= pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"],'Active':[False,False,False]})
st.session_state.rerun = False
def rerun():
st.session_state.rerun = True
name=st.text_input("Search for ...")
if name == '':
st.session_state['dfa'].Active=True
else:
st.session_state['dfa'].Active=False
st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name,case=False),'Active']=True
active_dfa = st.session_state['dfa'][st.session_state['dfa']['Active']==True].copy()
edited_dfa = st.data_editor(active_dfa,column_order=['Par','Cat1','Cat2'],on_change=rerun)
if st.session_state.rerun:
st.session_state.rerun = False
st.rerun()
Sorry, I haven’t been able to figure out a combination that works with being able to filter, edit, and have everything persist without the double-edit problem. Maybe @mathcatsand can suggest something.
This is going to get much harrier combining the filter with edits. What you’re going to have to do is use a callback on the data editor and read the edits saved in Session State, reverse lookup the (pandas) index of any rows that were edited, and commit those changes to the original. If you’re going to go through all that trouble to handle editing on a filtered DataFrame, we can extend that to edits on an unfiltered DataFrame and dispense with the extra rerun command in this case.
import streamlit as st
import pandas as pd
import time
if "dfa" not in st.session_state:
st.session_state["dfa"] = pd.DataFrame(
{
"Par": ["Apple", "Strawberry", "Banana"],
"Cat1": ["good", "good", "bad"],
"Cat2": ["healthy", "healthy", "unhealthy"],
"Active": [False, False, False],
}
)
def active_dfa():
return st.session_state["dfa"][st.session_state["dfa"]["Active"] == True].copy()
def get_index(row):
return active_dfa().iloc[row].name
def commit():
for row in st.session_state.editor["edited_rows"]:
row_index = get_index(row)
for key, value in st.session_state.editor["edited_rows"][row].items():
st.session_state["dfa"].at[row_index, key] = value
st.header("Filter and edit data")
name = st.text_input("Search for ...")
if name == "":
st.session_state["dfa"].Active = True
else:
st.session_state["dfa"].Active = False
st.session_state["dfa"].loc[
st.session_state["dfa"]["Par"].str.contains(name, case=False), "Active"
] = True
edited_dfa = st.data_editor(
active_dfa(), column_order=["Par", "Cat1", "Cat2"], key="editor", on_change=commit
)
This solution can also be extended to num_rows="dynamic" but you’ll have to be extra careful handling the index to keep it hidden and not mucking up the user interface. Resetting the index with each row addition/deletion should do it…
thank you so much for your answer.
I will make a deep-dive in your implementation today to understand your solution.
In the meantime, I found also a solution late last night with only one additional line (I watched your video concerning session_state and had an idea based on it):
I added this line to save back the data from ’ edited_dfa’ to session_state.
st.session_state['dfa'] = edited_dfa # new
What do you think about the solution? Please give it a trial.
UPDATE: I saw in this moment, that changes during filtering don´t work.
import streamlit as st
import pandas as pd
if 'dfa' not in st.session_state:
st.session_state['dfa']= pd.DataFrame({"Par":["Apple","Strawberry","Banana"],"Cat1":["good","good","bad"],"Cat2":["healthy","healthy","unhealthy"],'Active':[False,False,False]})
st.session_state.rerun = False
def rerun():
st.session_state.rerun = True
name=st.text_input("Search for ...")
if name == '':
st.session_state['dfa'].Active=True
else:
st.session_state['dfa'].Active=False
st.session_state['dfa'].loc[st.session_state['dfa']['Par'].str.contains(name,case=False),'Active']=True
active_dfa = st.session_state['dfa'][st.session_state['dfa']['Active']==True].copy()
edited_dfa = st.data_editor(active_dfa,column_order=['Par','Cat1','Cat2'],on_change=rerun)
if st.session_state.rerun:
st.session_state.rerun = False
st.session_state['dfa'] = edited_dfa # new
st.rerun()
Since your edited dataframe is built off of your filtered dataframe, I believe this would effectively delete rows from the original when they are filtered out and an edit is made.
your solution is really great in general. It helps me also to solve other problems in my code, which costs me nights.
But, one problem remains, what you probably didn´t recognize caused by the small dataframe I delivered. If you extend the dataframe to more rows (greater than screen), you´ll see, that with every cell change (causes probably due on_change) the table jumps back to top left. This is “a small thing”, but prevents completely to UX. Do you have an idea to stop this jump?
Thanks for stopping by! We use cookies to help us understand how you interact with our website.
By clicking “Accept all”, you consent to our use of cookies. For more information, please see our privacy policy.
Cookie settings
Strictly necessary cookies
These cookies are necessary for the website to function and cannot be switched off. They are usually only set in response to actions made by you which amount to a request for services, such as setting your privacy preferences, logging in or filling in forms.
Performance cookies
These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us understand how visitors move around the site and which pages are most frequently visited.
Functional cookies
These cookies are used to record your choices and settings, maintain your preferences over time and recognize you when you return to our website. These cookies help us to personalize our content for you and remember your preferences.
Targeting cookies
These cookies may be deployed to our site by our advertising partners to build a profile of your interest and provide you with content that is relevant to you, including showing you relevant ads on other websites.