Scalability of Streamlit for pandas

Hi @Yasha. Welcome to the community! :hugs:

Yes. Large dataframes can slow down a Streamlit app! In general, displaying more than 100k elements can start to get sluggish.

Please note that this doesn’t mean that Streamlit can’t be used with huge datasets, only that you can’t quickly display those datasets directly to the screen with st.write or st.dataframe. The good news is that usually you don’t actually want to send that many elements to the browser! :sunflower:

Instead you might want to:

  1. Display a quick-and-dirty subset to get a feel for your data:

         st.write(df[:1000])

  2. Write a little filter UI for your data and only display the subset the user wants to see. An example of such a UI is shown here in the Udacity dataset demo.

  3. Use something like display_dataframe_quickly defined in this gist and reproduced here:

import numpy as np
import pandas as pd
import streamlit as st

def display_dataframe_quickly(df, max_rows=5000, **st_dataframe_kwargs):
    """Display a subset of a DataFrame or NumPy array to speed up app renders.

    Parameters
    ----------
    df : DataFrame | ndarray
        The DataFrame or NumPy array to render.
    max_rows : int
        The number of rows to display.
    st_dataframe_kwargs : Dict[Any, Any]
        Keyword arguments to the st.dataframe method.
    """
    n_rows = len(df)
    if n_rows <= max_rows:
        # As a special case, display a small dataframe directly,
        # still forwarding the keyword arguments to st.dataframe.
        st.dataframe(df, **st_dataframe_kwargs)
    else:
        # Slice the DataFrame to display less information.
        start_row = st.slider('Start row', 0, n_rows - max_rows)
        end_row = start_row + max_rows
        df = df[start_row:end_row]

        # Reindex NumPy arrays to make them more understandable.
        if isinstance(df, np.ndarray):
            df = pd.DataFrame(df)
            df.index = range(start_row, end_row)

        # Display everything.
        st.dataframe(df, **st_dataframe_kwargs)
        st.text('Displaying rows %i to %i of %i.' % (start_row, end_row - 1, n_rows))

To test this method you can run:

streamlit run https://gist.githubusercontent.com/treuille/ff9194ed50af277fc56788d7aed7fcba/raw

You should see a "Start row" slider above a table showing only the selected rows.

Of course the magic of Streamlit is that it usually “just works.” Therefore we are also working on improvements to make Streamlit faster for large DataFrames. In addition to the fix referenced above, we’re also considering using compression for Streamlit packets, or increasing responsiveness by showing a progress bar when sending large packets.

Please feel free to comment on or follow any of these issues for up-to-date information.
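
To make option 2 above (a little filter UI) a bit more concrete, here is a minimal sketch. The names filter_rows and render_filter_ui are hypothetical helpers made up for this example, not part of Streamlit, and the sketch assumes you want to filter on a single column whose values can be sorted. It is not the code from the Udacity demo.

```python
import pandas as pd


def filter_rows(df: pd.DataFrame, column: str, allowed_values, max_rows: int = 1000) -> pd.DataFrame:
    """Keep rows whose `column` value is in `allowed_values`, capped at `max_rows`."""
    mask = df[column].isin(allowed_values)
    return df[mask].head(max_rows)


def render_filter_ui(df: pd.DataFrame, column: str, max_rows: int = 1000) -> None:
    """Draw a multiselect for `column` and display only the matching rows."""
    # Imported here so that filter_rows can be used and tested without Streamlit.
    import streamlit as st

    # Offer every distinct non-null value of the column as a choice.
    options = sorted(df[column].dropna().unique())
    chosen = st.multiselect(f"Values of {column} to show", options)

    # Only the filtered (and capped) subset is sent to the browser.
    subset = filter_rows(df, column, chosen, max_rows)
    st.dataframe(subset)
    st.text(f"Showing {len(subset)} of {len(df)} rows.")
```

In an app script you would load your DataFrame and then call something like render_filter_ui(df, "label"), where "label" stands in for whichever column you want to filter on. Nothing is shown until the user picks at least one value, and the max_rows cap keeps the browser responsive even if a selection matches a huge number of rows.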
