st.dataframe() is slow when the dataframe is a Pandas Styler object

When I display a data frame using st.dataframe(), it shows commas as thousands separators, which is undesirable. I removed the thousands separator using the pandas style.format() method, but this causes a performance penalty on the dashboard.

The data frame, let’s say it’s called ‘big_df’, is not small. It’s about 500 rows by 100 columns. It is the output of a cached function decorated with @st.cache_data(). When displaying the unstyled data frame using st.dataframe(), it is quick. However, when I do:

    st.dataframe(big_df.style.format(precision=3, thousands=None, decimal="."),
                 use_container_width=True)

Streamlit becomes slow. Is there a way to cache the styled object, or is it the actual display of the styled object that is slow?

When I tried to use @st.cache_data() on a helper function that returns the style object, it gives this error:
AttributeError: Can't pickle local object 'StylerRenderer.__init__.<locals>.<lambda>'

If you have another way to remove the thousands separator, please let me know. It’s not a big deal, but it shows up in fields like zip codes or person ID numbers. I might have to just show a plain dataframe.
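For ID-like fields such as zip codes, one possible way to avoid the separator without a Styler is to cast those columns to strings before display, so they are no longer treated as numbers. This is only a sketch: the helper name `ids_as_text` and the column names are made up for illustration, and it assumes the ID columns hold integers with no missing values.

```python
import pandas as pd


def ids_as_text(df: pd.DataFrame, id_columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the given ID-like columns cast to strings.

    Hypothetical helper: string columns are shown as text, so no
    thousands separator is applied to them.
    """
    return df.astype({col: str for col in id_columns})


# Made-up example data standing in for big_df
example = pd.DataFrame(
    {
        "zip_code": [10001, 94105],
        "person_id": [1234567, 7654321],
        "amount": [1234.5678, 2.5],
    }
)

plain = ids_as_text(example, ["zip_code", "person_id"])
```

The result is still a plain DataFrame, so it can be passed to st.dataframe() and returned from a function decorated with @st.cache_data(). Truly numeric columns (like `amount` here) are left untouched and keep their default formatting.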

Thanks


You can use cache_resource instead. There is little risk that you accidentally mutate the return value.

I have the same issue with a 10 000 rows x 5 columns dataframe.

st.dataframe is quite fast with the raw dataframe (about 1 s to display it), whereas it takes about 15 s to display the same dataframe styled (a single row is highlighted).

I use _compute() as suggested here before calling st.dataframe. So it seems that it is the display part that is slow.

@wangp22: did you find a solution?


EDIT:

  • Found the bottleneck: the function streamlit.elements.lib.pandas_styler_utils._use_display_values walks through the whole dataset, cell by cell, which can take a while
  • If you don’t need it (e.g. if you just want to highlight a row), you can probably override it:
    # Override a private Streamlit function to avoid the slow display of a pandas Styler
    st.elements.lib.pandas_styler_utils._use_display_values = lambda df, style: df.astype(str)
    

EDIT 2:

  • The previous workaround also removes all the formatting …
  • One could probably improve the _use_display_values implementation (the call to df.iat is the slow part)
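As a rough illustration of that last idea, here is a sketch that writes the display values into a NumPy object array and builds the DataFrame once at the end, instead of assigning through df.iat for every cell. This is not Streamlit's actual code: the function name `use_display_values_fast` is hypothetical, and the layout of `styles` (a "body" list of rows, each a list of cell dicts with an "id" like "row0_col1" and a "display_value") is an assumption. It has not been benchmarked against the original.

```python
import re

import pandas as pd

# Assumed format of data-cell ids in the translated style
_CELL_ID = re.compile(r"row(\d+)_col(\d+)")


def use_display_values_fast(df: pd.DataFrame, styles: dict) -> pd.DataFrame:
    """Hypothetical variant: fill an array first, build the frame once."""
    # Start from the string form of every cell, as a writable object array
    values = df.astype(str).to_numpy(dtype=object, copy=True)
    for row in styles.get("body", []):
        for cell in row:
            match = _CELL_ID.match(cell.get("id", ""))
            if match:  # skip index/header cells, whose ids do not match
                r, c = int(match.group(1)), int(match.group(2))
                values[r, c] = str(cell["display_value"])
    return pd.DataFrame(values, index=df.index, columns=df.columns)


# Made-up data and a hand-written `styles` mapping in the assumed layout
demo_df = pd.DataFrame({"a": [1000, 2000], "b": [1.23456, 2.5]})
demo_styles = {
    "body": [
        [
            {"id": "level0_row0", "display_value": "0"},
            {"id": "row0_col0", "display_value": "1000"},
            {"id": "row0_col1", "display_value": "1.235"},
        ],
        [
            {"id": "level0_row1", "display_value": "1"},
            {"id": "row1_col0", "display_value": "2000"},
            {"id": "row1_col1", "display_value": "2.500"},
        ],
    ]
}

demo_out = use_display_values_fast(demo_df, demo_styles)
```

Unlike the df.astype(str) override, this keeps the formatted display values. Swapping it in would still mean patching a private Streamlit function, which may change between releases.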
