Using Aggrid in a data scraping application

Hello,
I’m new to Streamlit and have been playing around with it for a few days. It’s very cool! However, I’ve run into a bit of a problem and I can’t quite work out what is happening or how to fix it.

Use-case:

I’d like to make a simple app that will
a) Allow the user to scrape some data off the internet
b) Display the data to the user in an Aggrid table and allow them to make edits to cells
c) Download the final data

Problem

I’ve set up the basic idea (see code below), but it’s not working. Every time a user tries to edit more than 2 cells in a row, the entire table 'refreshes' and the changes do not persist.

Ask

I’m not sure what I’m doing wrong? I’m still new to Streamlit, so likely I’ve made an error because I don’t quite understand how it all works. Could someone suggest where I might have gone wrong? Thank you

from st_aggrid import GridOptionsBuilder, AgGrid, GridUpdateMode
import streamlit as st
import pandas as pd
import numpy as np


np.random.seed(42)

st.title('Scraping tool')

@st.cache_data()
def gen_random_data(nrows):
    data = np.random.random((nrows,3))
    data_frame = pd.DataFrame(data,columns = ['A','B','C'])
    return data_frame

button = st.button('Scrape!',key = 'scrape_button')

if st.session_state.get("scrape_button"):
    data_load_state = st.text('Loading data...')
    st.session_state.data = gen_random_data(10)
    data_load_state.text('Loading data...Done!')
    st.session_state.aggrid_key = True

if 'aggrid_key' in st.session_state:
    gb = GridOptionsBuilder.from_dataframe(st.session_state.data)
    gb.configure_default_column(groupable=True, value=True, enableRowGroup=True, aggFunc='sum', editable=True)
    gb.configure_grid_options(domLayout='normal')
    st.session_state.gridOptions = gb.build()

    grid_response = AgGrid(st.session_state.data,
                           gridOptions=st.session_state.gridOptions,
                           update_mode=GridUpdateMode.GRID_CHANGED,
                            height=1000,
                            width='100%',
                            fit_columns_on_grid_load=False,
                            allow_unsafe_jscode=True, #Set it to True to allow jsfunction to be injected
                            enable_enterprise_modules=False,
                            editable = True)

    st.session_state.data = grid_response['data']
    st.download_button("Download data", data=st.session_state.data.to_csv(), file_name='test_data.csv')

Streamlit’s main concepts are a must-read for every developer. Pay particular attention to the data flow.

Data flow

Streamlit’s architecture allows you to write apps the same way you write plain Python scripts. To unlock this, Streamlit apps have a unique data flow: any time something must be updated on the screen, Streamlit reruns your entire Python script from top to bottom.

This can happen in two situations:

  • Whenever you modify your app’s source code.
  • Whenever a user interacts with widgets in the app. For example, when dragging a slider, entering text in an input box, or clicking a button.
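
To make the rerun model concrete, here is a minimal sketch in plain Python (no Streamlit required). It is only a simplified stand-in, not Streamlit’s actual implementation: `run_script`, `button_clicked` and the `session_state` dict are hypothetical names standing in for one top-to-bottom rerun, the return value of `st.button`, and `st.session_state`.

```python
# Simplified stand-in for Streamlit's rerun model (an assumption for
# illustration only; the real framework does much more than this).

def run_script(session_state, button_clicked):
    """One top-to-bottom 'rerun' of a hypothetical app script."""
    # Local variables are recreated from scratch on every rerun.
    local_counter = 0
    local_counter += 1

    # Initialise persistent state only once, on the very first run.
    if "data" not in session_state:
        session_state["data"] = []

    # A button is truthy only during the rerun triggered by its click.
    if button_clicked:
        session_state["data"] = [1, 2, 3]

    return local_counter, list(session_state["data"])


session_state = {}  # survives across reruns, like st.session_state
first = run_script(session_state, button_clicked=False)   # initial page load
second = run_script(session_state, button_clicked=True)   # user clicks the button
third = run_script(session_state, button_clicked=False)   # user touches another widget
print(first, second, third)
```

The local counter never gets past 1 because it is reset on each rerun, while the data written during the button click is still there on the following rerun because it lives in the persistent dict.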

Okay, thanks! I managed to get it working how I want. The problem was wrapping the AgGrid in an if statement. I got around this by initializing the data frame as empty and only updating the data once the button is pressed. Here’s my solution:

from st_aggrid import GridOptionsBuilder, AgGrid, GridUpdateMode
import streamlit as st
import pandas as pd
import numpy as np

# Initialisation
np.random.seed(42)
if 'data' not in st.session_state:
    st.session_state.data = pd.DataFrame({})


st.title('Scraping tool')

@st.cache_data()
def gen_random_data(nrows):
    data = np.random.random((nrows,3))
    data_frame = pd.DataFrame(data,columns = ['A','B','C'])
    return data_frame

button = st.button('Scrape!',key = 'scrape_button')

if button:
    data_load_state = st.text('Loading data...')
    st.session_state.data = gen_random_data(10)
    data_load_state.text('Loading data...Done!')

gb = GridOptionsBuilder.from_dataframe(st.session_state.data)
gb.configure_default_column(groupable=True, value=True, enableRowGroup=True, aggFunc='sum', editable=True)
gb.configure_grid_options(domLayout='normal')
st.session_state.gridOptions = gb.build()

grid_response = AgGrid(st.session_state.data,
                       gridOptions=st.session_state.gridOptions,
                       update_mode=GridUpdateMode.GRID_CHANGED,
                       height=300,
                       width='100%',
                       fit_columns_on_grid_load=False,
                       allow_unsafe_jscode=True,  # Set to True to allow JS functions to be injected
                       enable_enterprise_modules=False,
                       editable=True)

st.download_button("Download data",
                   data=grid_response['data'].to_csv(),
                   file_name='test_data.csv')