Slow cell selection in dataframe

Hi guys,
I have already installed version 1.18.
I have a CSV file with 100,000 rows of data. When I load it in Streamlit, everything works smoothly and quickly. However, once I click a column header in the dataframe to sort it, the dataframe becomes slow and lags every time I change my cell selection.

I tested it on a dataframe with 46 columns and 100k rows. Sorting a column takes around 2 seconds and editing a single cell takes around 3 seconds, running locally on my old PC with an i7-2600K processor and 12 GB of RAM on Windows 10.

"""
streamlit 1.18.0
"""

import streamlit as st
import pandas as pd


@st.cache_data()
def get_data():
    return pd.read_csv('lit_deriv.csv', nrows=100000)

df = get_data()

st.write('### Init')
st.write(f'shape: {df.shape}')
st.dataframe(df, height=100)

st.write('### Data editor')
update = st.experimental_data_editor(df, height=100)

st.write('### Update')
st.dataframe(update, height=100)


Hi Ferdy, thanks for your reply.
By the way, is there a way to prevent the dataframe from being constantly re-processed? In my situation, simply selecting a cell in the dataframe with a mouse click seems to trigger some event, even though I haven't taken any other actions.

Could you post some sample code?

Hi Ferdy, it is just simple code like:
df = pd.read_csv(r'D:\folder\data.csv')
st.dataframe(df)
I already use @st.cache_data.
Before I sort the dataframe, everything is smooth when I select a cell with a mouse click (no lag).

Try this code. Adjust the nrows value to check responsiveness.

"""
streamlit 1.18.0
"""

import streamlit as st
import pandas as pd


@st.cache_data()
def get_data():
    return pd.read_csv(
        "https://github.com/plotly/datasets/raw/master/uber-rides-data1.csv",
         nrows=100000)

df = get_data()

st.write('### Init')
st.write(f'shape: {df.shape}')
st.dataframe(df, height=100)

st.write('### Data editor')
update = st.experimental_data_editor(df, height=100)

st.write('### Update')
st.dataframe(update, height=100)

Reference:
caching

Here's my scenario:

  1. Before I sort a column, I can select a cell anywhere within a Streamlit dataframe without any lag.
  2. I sort a column, for example the "Date/Time" column.
  3. After sorting the column, when I select a cell anywhere within the Streamlit dataframe and then change the selection to another cell, it becomes laggy and does not instantly select the new cell.

If this is by design, that is okay with me.

Try reducing the number of rows; perhaps 100k is too many. With, say, 10k rows, is it still slow?
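One way to try this without editing the script each time is to pick the row count from a widget. This is only a rough sketch: `limit_rows` and `render` are hypothetical helper names, and the row-count options are arbitrary values chosen for illustration.

```python
import pandas as pd


def limit_rows(df: pd.DataFrame, max_rows: int) -> pd.DataFrame:
    """Return at most max_rows rows from the top of df."""
    return df.head(max_rows)


def render(df: pd.DataFrame) -> None:
    """Hypothetical Streamlit wiring; call this from your app script."""
    import streamlit as st  # imported here so limit_rows works without Streamlit

    # Arbitrary example sizes; adjust to whatever you want to compare.
    max_rows = st.select_slider(
        "Rows to display", options=[1_000, 10_000, 50_000, 100_000], value=10_000
    )
    st.dataframe(limit_rows(df, max_rows), height=100)
```

Calling `render(df)` with the cached dataframe lets you sort at each size and see where the lag starts.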

Unfortunately, the current implementation of column sort gets slow at around 100k rows. The reason is that the sort happens entirely within your browser, so it's very dependent on the resources the browser has access to. When you sort a very large dataframe, it also slows down cell selection. This is something we might be able to fix. In the long term, we probably need to move to backend-based sorting for large tables.
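Until then, a possible workaround is to sort in Python with pandas and send only one page of rows to the browser, so the browser never has to sort the full table. This is a minimal sketch and not how Streamlit implements sorting: `sorted_page` and `render` are hypothetical names, and the page size of 1,000 is an arbitrary assumption.

```python
import pandas as pd


def sorted_page(df: pd.DataFrame, by: str, ascending: bool = True,
                page: int = 0, page_size: int = 1000) -> pd.DataFrame:
    """Sort df by one column on the Python side and return one page of rows."""
    ordered = df.sort_values(by=by, ascending=ascending, kind="stable")
    start = page * page_size
    return ordered.iloc[start:start + page_size]


def render(df: pd.DataFrame, page_size: int = 1000) -> None:
    """Hypothetical Streamlit wiring; call this from your app script."""
    import streamlit as st  # imported here so sorted_page works without Streamlit

    by = st.selectbox("Sort by", list(df.columns))
    ascending = st.checkbox("Ascending", value=True)
    n_pages = max(1, -(-len(df) // page_size))  # ceiling division
    page = st.number_input("Page", min_value=1, max_value=n_pages, value=1, step=1)
    # Only page_size rows reach the browser, already in sorted order.
    st.dataframe(sorted_page(df, by, ascending, int(page) - 1, page_size))
```

The trade-off is that sorting by clicking a column header then only reorders the rows of the current page, so the widgets above have to be used for a full sort.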
