Scatter plot is very slow

BSDevo · October 27, 2023, 10:44pm

Hi, is there a way to speed up my scatter plots?
I have 5 of them now and will have around 10 in total, but they are slow.
With a small CSV file (up to 1k rows and 25 columns) it is fast.

    elif st.session_state.type_filter=='Scatter Plot':

        fig = px.scatter(
            filtered_df,
            x="COL1",
            y="COL2",
            color="COL1",
            color_continuous_scale="reds"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

With 22k rows my timer reports 0.7421250219573975 seconds to load, but it actually takes 2 to 4 seconds: I can still see the chart from the previous click (session state with a radio button) fade away before the scatter plot appears.
My scatter plots are stacked in rows, with no tabs and no columns.
With 200k rows the reported time is around 3.25 seconds, but loading takes another two or sometimes three times that long, and the page sometimes becomes less responsive or unresponsive for a moment until everything is loaded.
The more plots I add, the slower it gets.
Does plotting affect the speed and performance of a Streamlit app (a line chart with 200k rows takes ages to load, even though the code is only a few words), and does the way the code is laid out across lines affect Streamlit performance?
I mean like this:

    elif st.session_state.type_filter=='Scatter Plot':

        fig = px.scatter(
            filtered_df,
            x="COL1",
            y="COL2",
            color="COL1",
            color_continuous_scale="reds"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

VS

    elif st.session_state.type_filter=='Scatter Plot':
        fig = px.scatter(filtered_df, x="COL1", y="COL2", color="COL1", color_continuous_scale="reds")
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

Thank You.

Charly_Wargnier · October 28, 2023, 9:31am

Hello @BSDevo!

Thank you for your question.

I can’t see the full code, but if you’re performing data transformations before plotting, you should use the st.cache_data decorator. This ensures that such operations are not repeated unnecessarily and should speed up the display of your scatter plots.

Let me know how it goes. I’m happy to review the code too if you like.

Best,
Charly

BSDevo · October 28, 2023, 1:20pm

I do have some data transformation, but it is inside @st.cache.
The one thing that is not in @st.cache is the filtering.

The filtering result is stored in filtered_df, and my scatter plots, graphs, map, etc. change dynamically based on this filtered_df.
I also have 10 radio buttons mapped to st.session_state, and 8 of them are tied to filtered_df.
This is my scatter plot code:


    elif st.session_state.type_filter=='Scatter Plot':

        fig = px.scatter(
            filtered_df,
            x="RPM",
            y="Weight",
            color="Price",
            color_continuous_scale="gnbu"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

and one more example:

    elif st.session_state.type_filter=='Filtered Time Series':
        st.subheader("Filtered Weekly Earnings")
        
        linechartwrate = pd.DataFrame(filtered_df.groupby(filtered_df["week_year"].dt.strftime("%Y : %U"))["Rate"].sum()).reset_index()
        laiko_juosta_savaite = px.bar(linechartwrate, x = "week_year", y="Rate", labels={"Rate": "Amount"}, height=600, width = 1000, template="gridon")
        st.plotly_chart(laiko_juosta_savaite, use_container_width=True)
        csv = linechartwrate.to_csv(index=False).encode('utf-8')
        st.download_button("Download Time Series Weekly Data", data=csv, file_name="TimeSeriesWeekly.csv", mime="text/csv", help='Click here to download the file as a CSV file')
        with st.expander("View Time Series Weekly Data"):
            st.write(linechartwrate.T.style.background_gradient(cmap="Blues"))

        # Time Series Monthly
        st.subheader("Filtered Monthly Earnings")
        df["month_year"] = df["PuDate"].dt.to_period("M")

        linechartrate = pd.DataFrame(filtered_df.groupby(filtered_df["month_year"].dt.strftime("%Y : %m"))["Rate"].sum()).reset_index()
        laiko_juosta_menuo = px.bar(linechartrate, x = "month_year", y="Rate", labels={"Rate": "Amount"}, height=600, width = 1000, template="gridon")
        st.plotly_chart(laiko_juosta_menuo, use_container_width=True)
        csv = linechartrate.to_csv(index=False).encode('utf-8')
        st.download_button("Download Time Series Monthly Data", data=csv, file_name="TimeSeriesMonthly.csv", mime="text/csv", help='Click here to download the file as a CSV file')
        with st.expander("View Time Series Monthly Data"):
            st.write(linechartrate.T.style.background_gradient(cmap="Blues"))

I hope this will help.

Charly_Wargnier · October 28, 2023, 2:09pm

I see. I would definitely cache the data filtering function.

The new st.cache_data decorator is used for caching functions that return data, such as dataframes, text, or computations with basic types.

You could try something along these lines:

    import streamlit as st
    import pandas as pd
    import plotly.express as px

    # Cache the data loading function using st.cache_data
    @st.cache_data
    def load_data():
        # Load your data here
        return df

    # Cache the data filtering function using st.cache_data
    @st.cache_data
    def filter_data(df, filter_conditions):
        # Apply your filtering logic here based on the filter_conditions
        filtered_df = df[filter_conditions]
        return filtered_df

    # Load the data (this will be cached)
    data = load_data()

    # Define your filter conditions based on user input or other criteria
    filter_conditions = ...

    # Filter the data (this will be cached)
    filtered_df = filter_data(data, filter_conditions)

    # Now use the filtered_df for your plotting
    if st.session_state.type_filter == 'Scatter Plot':
        fig = px.scatter(
            filtered_df,
            x="RPM",
            y="Weight",
            color="Price",
            color_continuous_scale="gnbu"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)
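
As a rough illustration of what the filtering function might look like in practice, here is a minimal sketch that takes plain widget values as arguments instead of a prebuilt condition. The parameter names `weight_range` and `min_miles` and the sample data are hypothetical; only the `Weight` and `Miles` column names come from this thread. In the app the function would carry the `@st.cache_data` decorator, which is left off here so the snippet runs with pandas alone.

```python
import pandas as pd


def filter_data(df, weight_range, min_miles):
    """Keep rows with Weight inside weight_range (inclusive) and Miles >= min_miles.

    weight_range and min_miles are hypothetical stand-ins for widget values.
    """
    low, high = weight_range
    mask = df["Weight"].between(low, high) & (df["Miles"] >= min_miles)
    return df[mask]


# Made-up sample data, for illustration only.
sample = pd.DataFrame({"Weight": [500, 1500, 2500], "Miles": [50, 300, 700]})
filtered_df = filter_data(sample, weight_range=(1000, 3000), min_miles=100)
```

Since st.cache_data keys its cache on the function arguments, passing small values like these should mean that returning to a previously used filter combination can reuse the stored result rather than filtering again.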

I hope this helps - let me know how it goes.

Best wishes,
Charly

ashepelin · November 2, 2023, 8:00am

Some ideas:

Pandas can be much faster with a DatetimeIndex when using .loc[] and .query():

    df.loc[startDate:endDate].query(" Weight<=svoris[1] & Miles>=mylios[0] ")

Several controls can be grouped in a form (st.form) with a single button that runs everything at once.

The fastest querying in pandas is on ready-made bool-type columns:

    df.query(" colname1 and colname3 ")
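
The two pandas ideas above could be sketched roughly as follows. The sample data, the `PuDate` index choice, and the `is_light` / `is_long` column names are assumptions made for illustration; `svoris` and `mylios` are taken to be (min, max) tuples from sliders. Note that inside a query string, local Python variables are referenced with an `@` prefix.

```python
import pandas as pd

# Made-up sample data; column names follow the thread, values are invented.
df = pd.DataFrame(
    {
        "PuDate": pd.to_datetime(
            ["2023-01-05", "2023-02-10", "2023-03-15", "2023-04-20"]
        ),
        "Weight": [1000, 2500, 4000, 1500],
        "Miles": [120, 800, 450, 60],
    }
)

# A sorted DatetimeIndex lets .loc slice by date range directly.
df = df.set_index("PuDate").sort_index()

# Assumed stand-ins for slider values: (min, max) tuples.
svoris = (0, 3000)
mylios = (100, 1000)
max_weight = svoris[1]
min_miles = mylios[0]

# Date slicing via the index, then a query that references locals with "@".
in_range = df.loc["2023-01-01":"2023-03-31"].query(
    "Weight <= @max_weight and Miles >= @min_miles"
)

# Precomputed boolean columns can be combined by name alone.
df["is_light"] = df["Weight"] <= max_weight
df["is_long"] = df["Miles"] >= min_miles
light_and_long = df.query("is_light and is_long")
```

Whether this is actually faster depends on the data size and on how often the boolean columns need to be recomputed, so it is worth timing on the real dataset.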

lavint · February 25, 2024, 11:44pm

Can you please help me understand the point of using st.cache_data when the data is filtered with drop-down options? Whenever the user selects a new value in the drop-down, the whole script is rerun. I may have misunderstood, but it seems to me that caching the dynamic data unnecessarily increases memory usage and slows down performance.

system · August 23, 2024, 11:45pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
