Selection is sometimes not returned from custom widgets

Hi,
I have a streamlit app which uses the AgGrid component to make rows in a table selectable and the plotly_events component to get feedback from clicked points in charts.
Both components show the same behaviour: the feedback sometimes just doesn’t work and the return value is empty. It does work about 50% of the time, though, so I am not sure whether this is a bug on the streamlit side, since it happens for both components independently.
Has anybody observed similar issues?

streamlit version is 1.3.1

I managed to build a minimal example for this, and it behaves in an interesting way.
The selection feedback of AgGrid works in this code, but it breaks if I remove the caching of the dataframe.
I have seen similar selection feedback issues with the plotly_events component as well, so it is possibly a bug on the streamlit side.

How to reproduce:

  • start the app and select some rows
  • press submit, it will display the selected rows
  • now comment out the caching line in the code
  • reload the page and repeat the first two steps; for me the selection no longer works

import streamlit as st
from st_aggrid import AgGrid, GridOptionsBuilder
import pandas as pd
import numpy as np

# Commenting out the decorator below breaks the selection feedback for me:
# the "selected_rows" entry in grid_response is always empty.
@st.experimental_memo
def get_df():
    df = pd.DataFrame(columns=['foo','bar','baz'], data=np.random.choice(range(10), size=(100000,3)))
    return df

df = get_df()

gb = GridOptionsBuilder.from_dataframe(df)
gb.configure_default_column(value=True, enableRowGroup=True, aggFunc=None, editable=False)

gb.configure_selection(selection_mode="multiple", use_checkbox=True)

with st.form("table_form", clear_on_submit=False):
    grid_response = AgGrid(df, gridOptions=gb.build(), height=700, data_return_mode="AS_INPUT", update_mode="SELECTION_CHANGED")

    st.write(f"grid_response {grid_response}")
    selected = grid_response['selected_rows']
    st.write(f"selected {selected}")
    if st.form_submit_button("Submit"):
        pass

Hi @thunderbug1 :wave:

This is my working hypothesis: the issue is with neither Streamlit nor streamlit-aggrid. It has to do with your data. Namely, with every interaction you generate a completely new DataFrame, which wipes your prior selections from memory!

The source of the behavior is: np.random.choice(range(10), size=(100000,3))

Notice that the below example works as expected without st.experimental_memo:

import streamlit as st
from st_aggrid import AgGrid, GridOptionsBuilder
import pandas as pd
import numpy as np

np.random.seed(42)

def get_df():
    df = pd.DataFrame(columns=['foo','bar','baz'], data=np.random.choice(range(10), size=(100000,3)))
    return df

df = get_df()

gb = GridOptionsBuilder.from_dataframe(df)
gb.configure_default_column(value=True, enableRowGroup=True, aggFunc=None, editable=False)

gb.configure_selection(selection_mode="multiple", use_checkbox=True)

with st.form("table_form", clear_on_submit=False):
    grid_response = AgGrid(df, gridOptions=gb.build(), height=700, data_return_mode="AS_INPUT", update_mode="SELECTION_CHANGED")

    st.write(f"grid_response {grid_response}")
    selected = grid_response['selected_rows']
    st.write(f"selected {selected}")
    if st.form_submit_button("Submit"):
        pass

The trick is to make the random data deterministic by setting the random number seed before generating it, via numpy.random.seed(some_number).

Caching the data, as you did, effectively did the same thing: made the random data deterministic (as you reuse the cached dataframe with every widget interaction)! When you don’t cache the data and generate random data, hitting the submit button causes the script to run from top-to-bottom… generating a new dataframe without your prior selections.
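The effect of the seed can be checked outside Streamlit. This is only a minimal sketch: the `make_df` helper and its `seed` and `rows` parameters are made up for illustration, and each call stands in for one rerun of the script.

```python
import numpy as np
import pandas as pd


def make_df(seed=None, rows=1000):
    """Build the random frame; `seed` and `rows` are illustrative parameters."""
    if seed is not None:
        # Reseeding before generation makes every "rerun" produce the same data.
        np.random.seed(seed)
    data = np.random.choice(range(10), size=(rows, 3))
    return pd.DataFrame(columns=["foo", "bar", "baz"], data=data)


# Seeded: two consecutive "reruns" yield identical frames.
seeded_equal = make_df(seed=42).equals(make_df(seed=42))

# Unseeded: the global generator keeps advancing, so the frames differ.
unseeded_equal = make_df().equals(make_df())

print(seeded_equal, unseeded_equal)
```

When the frames differ between reruns, the rows the grid was showing at selection time no longer match the rows it is rebuilt with, which is consistent with the empty selection described above.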

Hope this helps! Let me know if I’m not being clear.

Best, :balloon:
Snehan

Ah yes, good point.
In my actual application the dataframe comes from a database and I cannot guarantee that it will still be the same when the user submits the selection and the script is rerun.

The automatic reloading of new data is nice, from a user point of view, but here it is an issue.

Have you considered setting a reasonable value for the ttl parameter so that the entry in cache expires after X seconds?

Yes, I did, but even if I solve it this way, it could happen again if the user submits the selection after the cache entry has expired. (In my application, users can easily need several minutes to select everything carefully.)

At the same time, users also want to see their input once it is entered into the database.
I think a button where the user can manually trigger a data refresh will be the safest solution.
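A rough sketch of that manual refresh idea, under some assumptions: the data snapshot lives in a dict-like store (in a Streamlit app this could be `st.session_state`), and it is replaced only on first use or when a refresh is explicitly requested. The names `get_snapshot`, `fake_query`, and the `"table_snapshot"` key are hypothetical and not part of Streamlit or streamlit-aggrid.

```python
from typing import Any, Callable, MutableMapping


def get_snapshot(
    state: MutableMapping[str, Any],
    loader: Callable[[], Any],
    refresh: bool = False,
    key: str = "table_snapshot",
) -> Any:
    """Return the stored data, reloading only on first use or on request."""
    if refresh or key not in state:
        state[key] = loader()
    return state[key]


# Stand-in for a database query; each call returns "newer" data.
_version = {"n": 0}


def fake_query():
    _version["n"] += 1
    return f"rows v{_version['n']}"


state = {}  # in a Streamlit app this would be st.session_state
first = get_snapshot(state, fake_query)                # initial load
second = get_snapshot(state, fake_query)               # rerun, e.g. form submit
third = get_snapshot(state, fake_query, refresh=True)  # user pressed refresh
print(first, second, third)
```

In the app, the call might look like `df = get_snapshot(st.session_state, load_from_db, refresh=st.button("Refresh data"))`, where `load_from_db` is a placeholder for the real query. The button would need to sit outside the form, since forms only accept `st.form_submit_button`. Unlike a ttl, the snapshot then never changes underneath a selection that is still in progress.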

FYI - Caching also breaks the table’s editability