Scatter plot is very slow

Hi, is there a way to speed up my scatter plots?
I have 5 of them now and will have around 10 in total, but it's slow.
If I use a small CSV file (up to 1k rows and 25 columns) it's fast:

    elif st.session_state.type_filter=='Scatter Plot':

        fig = px.scatter(
            filtered_df,
            x="COL1",
            y="COL2",
            color="COL1",
            color_continuous_scale="reds"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

With 22k rows it reports 0.7421250219573975 as the load time, but it really takes almost 2 to 4 seconds: I can still see the chart from the previous click (session state with a radio button) fading away before the scatter plot appears.
My scatter plots are laid out in rows, with no tabs and no columns.
With 200k rows it reports around 3.25, but it takes another two to three times that long to load, and the page sometimes becomes less responsive or unresponsive for a moment until everything is loaded.
And the more plots I add, the slower it gets.
Does plotting affect the speed and performance of a Streamlit app (a line chart with 200k rows takes ages to load, even though the code is only a few words), and does writing the code on one line versus several lines affect Streamlit performance?
I mean like this:

    elif st.session_state.type_filter=='Scatter Plot':

        fig = px.scatter(
            filtered_df,
            x="COL1",
            y="COL2",
            color="COL1",
            color_continuous_scale="reds"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

VS

    elif st.session_state.type_filter=='Scatter Plot':
        fig = px.scatter(filtered_df, x="COL1", y="COL2", color="COL1", color_continuous_scale="reds")
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

Thank You.

Hello @BSDevo!

Thank you for your question.

I can’t see the full code, but if you’re performing data transformations before plotting, you should use the st.cache_data decorator. This ensures that such operations are not repeated unnecessarily and should speed up the display of your scatter plots.

Let me know how it goes. I’m happy to review the code too if you like.

Best,
Charly

I do have some data transformation, but it's inside @st.cache.
The one thing that is not in @st.cache is the filtering.

The filtered result is stored in filtered_df, and my scatter plots, graphs, map, etc. change dynamically based on this filtered_df.
I also have 10 radio buttons mapped to st.session_state, and 8 of them are tied to filtered_df.
This is my scatter plot code:


    elif st.session_state.type_filter=='Scatter Plot':

        fig = px.scatter(
            filtered_df,
            x="RPM",
            y="Weight",
            color="Price",
            color_continuous_scale="gnbu"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)

and one more example:

    elif st.session_state.type_filter=='Filtered Time Series':
        st.subheader("Filtered Weekly Earnings")
        
        linechartwrate = pd.DataFrame(filtered_df.groupby(filtered_df["week_year"].dt.strftime("%Y : %U"))["Rate"].sum()).reset_index()
        laiko_juosta_savaite = px.bar(linechartwrate, x = "week_year", y="Rate", labels={"Rate": "Amount"}, height=600, width = 1000, template="gridon")
        st.plotly_chart(laiko_juosta_savaite, use_container_width=True)
        csv = linechartwrate.to_csv(index=False).encode('utf-8')
        st.download_button("Download Time Series Weekly Data", data=csv, file_name="TimeSeriesWeekly.csv", mime="text/csv", help='Click here to download the file as a CSV file')
        with st.expander("View Time Series Weekly Data"):
            st.write(linechartwrate.T.style.background_gradient(cmap="Blues"))

        # Time Series Monthly
        st.subheader("Filtered Monthly Earnings")
        df["month_year"] = df["PuDate"].dt.to_period("M")

        linechartrate = pd.DataFrame(filtered_df.groupby(filtered_df["month_year"].dt.strftime("%Y : %m"))["Rate"].sum()).reset_index()
        laiko_juosta_menuo = px.bar(linechartrate, x = "month_year", y="Rate", labels={"Rate": "Amount"}, height=600, width = 1000, template="gridon")
        st.plotly_chart(laiko_juosta_menuo, use_container_width=True)
        csv = linechartrate.to_csv(index=False).encode('utf-8')
        st.download_button("Download Time Series Monthly Data", data=csv, file_name="TimeSeriesMonthly.csv", mime="text/csv", help='Click here to download the file as a CSV file')
        with st.expander("View Time Series Monthly Data"):
            st.write(linechartrate.T.style.background_gradient(cmap="Blues"))

I hope this will help.

I see. I would definitely cache the data filtering function.

The new st.cache_data decorator is used for caching functions that return data, such as dataframes, text, or computations with basic types.

You could try something along these lines:

    import streamlit as st
    import pandas as pd
    import plotly.express as px

    # Cache the data loading function using st.cache_data
    @st.cache_data
    def load_data():
        # Load your data here; the file name is only a placeholder
        df = pd.read_csv("your_data.csv")
        return df

    # Cache the data filtering function using st.cache_data
    @st.cache_data
    def filter_data(df, filter_conditions):
        # Apply your filtering logic here based on the filter_conditions
        # (for example, a boolean mask built from the widget values)
        filtered_df = df[filter_conditions]
        return filtered_df

    # Load the data (this will be cached)
    data = load_data()

    # Define your filter conditions based on user input or other criteria
    filter_conditions = ...

    # Filter the data (this will be cached)
    filtered_df = filter_data(data, filter_conditions)

    # Now use the filtered_df for your plotting
    if st.session_state.type_filter == 'Scatter Plot':
        fig = px.scatter(
            filtered_df,
            x="RPM",
            y="Weight",
            color="Price",
            color_continuous_scale="gnbu"
        )
        st.plotly_chart(fig, theme="streamlit", use_container_width=True)
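
To illustrate why caching the filter step can still pay off even though the whole script reruns on every widget change: the cache is keyed on the function's arguments, so going back to a filter value that was already selected reuses the stored result instead of recomputing it. Here is a minimal sketch of that behaviour. It uses `functools.lru_cache` from the standard library as a stand-in for `st.cache_data` so that it runs without Streamlit; `ROWS`, `filter_rows` and `call_count` are hypothetical names made up for the illustration.

```python
from functools import lru_cache

# Hypothetical toy data standing in for the real dataframe.
ROWS = (
    {"RPM": 1.5, "Weight": 100, "Price": 10},
    {"RPM": 2.0, "Weight": 250, "Price": 20},
    {"RPM": 2.5, "Weight": 400, "Price": 30},
)

call_count = 0  # counts how often the filter body actually runs


@lru_cache(maxsize=32)  # stand-in for @st.cache_data: results keyed by arguments
def filter_rows(max_weight):
    global call_count
    call_count += 1
    return tuple(row for row in ROWS if row["Weight"] <= max_weight)


# Simulate three script reruns triggered by a widget.
first = filter_rows(250)   # new value: the body runs
second = filter_rows(400)  # new value: the body runs
third = filter_rows(250)   # value seen before: served from the cache
```

After the three calls, `call_count` is 2 because the third call was a cache hit. The stand-in is not identical to Streamlit: `st.cache_data` hands back a copy of the cached value on each call, and its `max_entries` and `ttl` parameters can be used to bound how much memory the cache keeps.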

I hope this helps - let me know how it goes.

Best wishes,
Charly

Some ideas:

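  1. Pandas can be much faster when you slice a DatetimeIndex with .loc[] and then filter with .query() (local variables need the @ prefix inside the query string):
    df.loc[startDate:endDate].query(" Weight <= @svoris[1] & Miles >= @mylios[0] ")

  2. Several controls can be grouped in a form (st.form) with a single submit button, so the script reruns once instead of after every change.

  3. The fastest querying in Pandas is with ready-made bool-type columns:
    df.query(" colname1 and colname3 ")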
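
Ideas 1 and 3 could be sketched roughly as follows. This is only an illustration under assumptions: the sample data, the `PuDate` index, the `is_light` / `is_long` column names and the slider-style tuples `svoris` and `mylios` are hypothetical and not taken from the original app.

```python
import pandas as pd

# Hypothetical sample data; column names follow the thread.
df = pd.DataFrame(
    {
        "Weight": [100, 250, 400, 150],
        "Miles": [50, 300, 120, 700],
    },
    index=pd.to_datetime(["2023-01-05", "2023-02-10", "2023-03-15", "2023-04-20"]),
)
df.index.name = "PuDate"

start_date, end_date = "2023-01-01", "2023-03-31"
svoris = (0, 300)     # (min, max) weight, e.g. from a range slider
mylios = (100, 1000)  # (min, max) miles, e.g. from a range slider
max_weight = svoris[1]
min_miles = mylios[0]

# Idea 1: slice the sorted DatetimeIndex by label, then filter with query().
# Local Python variables are referenced with the @ prefix inside query().
in_range = df.loc[start_date:end_date]
result = in_range.query("Weight <= @max_weight and Miles >= @min_miles")

# Idea 3: precompute boolean columns once, then combine them in query().
df["is_light"] = df["Weight"] <= max_weight
df["is_long"] = df["Miles"] >= min_miles
flagged = df.query("is_light and is_long")
```

Whether this is actually faster than plain boolean masks depends on the data size and types, so it is worth timing both variants on the real dataframe.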

Can you please help me understand what the point of using st.cache_data is when the data is filtered with drop-down options? Whenever the user selects a new value in the drop-down, the whole script is rerun. I may have misunderstood, but it seems to me that caching dynamic data unnecessarily increases memory usage and slows down performance.