Plotly chart performance with datetime x-axis

I noticed that Plotly charts were slow to render even with small amounts of data, so I decided to dig into it. Specifically, I found that charts with datetimes on the x-axis were slow.

I started building a minimal reproducible example and found that the issue disappeared once I saved my data to CSV files and read it back into pandas. It turned out that if the x-axis column of the pandas DataFrame has the object/string dtype when the figure is generated, Streamlit renders the chart faster, apparently without any loss of functionality in the plots.

In my tests, rendering is more than 10x faster with string-typed datetimes.

My question is whether this is expected, or whether I am doing something weird in the first place. Furthermore, is there any reason why I might not want to always do this when making Plotly charts? In any case, I wanted to share this potential performance increase.

I include a small example that reports both the total time and the time for rendering alone. Generating the figure can easily be cached, so for some people fast rendering is what matters most.

I'm running my app locally with Streamlit version 1.38.0 and Python version 3.12.4.

small example code:

from time import time

import pandas as pd
import plotly.graph_objects as go
import streamlit as st


def st_fig_show(fig):
    st.plotly_chart(fig, use_container_width=True)

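def gen_fig_sample(convert_to_string_dates):
    """Build a two-trace line figure, optionally casting the datetime x-values to strings."""
    # Hourly timestamps from 2020-01-01 to 2024-06-01 (about 38,700 rows per trace)
    data1 = pd.DataFrame({'date': pd.date_range(start='2020-01-01', end='2024-06-01', freq='h'), 'value': 1})
    data2 = pd.DataFrame({'date': pd.date_range(start='2020-01-01', end='2024-06-01', freq='h'), 'value': 2})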
    if convert_to_string_dates:
        data1.date = data1.date.astype(str)
        data2.date = data2.date.astype(str)
    fig = go.Figure()

    fig.add_trace(
        go.Scatter(
            x=data1.date, y=data1.value,
            mode='lines',
            name='data_value'
        )
    )
    fig.add_trace(
        go.Scatter(
            x=data2.date, y=data2.value,
            mode='lines',
            name='data_value2'
        )
    )
    return fig


ts = time()
fig_sample_string = gen_fig_sample(convert_to_string_dates=True)
st.write("generating figure using string-datetimes: %2.4f seconds" % (time() - ts))

ts1 = time()
st_fig_show(fig_sample_string)
st.write("rendering using string-datetimes: %2.4f seconds" % (time() - ts1))
st.write("total for generating and rendering string-datetimes: %2.4f seconds" % (time() - ts))

ts2 = time()
fig_sample_datetime = gen_fig_sample(convert_to_string_dates=False)
st.write("generating figure using datetimes: %2.4f seconds" % (time() - ts2))

ts3 = time()
st_fig_show(fig_sample_datetime)
st.write("rendering using datetimes: %2.4f seconds" % (time() - ts3))
st.write("total for generating and rendering datetimes: %2.4f seconds" % (time() - ts2))

That is interesting. I do not have an answer for the reason why that happens, but I was able to replicate your findings so I thought it would be good to report. Indeed st.plotly_chart takes 10x longer to render a figure with datetimes.

Code
import functools
import time

import streamlit as st
import pandas as pd
import plotly.express as px
from plotly.graph_objects import Figure


def timer(func):
    """Decorator to time a function execution.
    See: https://realpython.com/python-timer/#creating-a-python-timer-decorator
    """

    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        tic = time.perf_counter()
        value = func(*args, **kwargs)
        toc = time.perf_counter()
        elapsed_time = toc - tic
        st.write(f"Elapsed time: **{elapsed_time:0.4f} seconds**")
        return value

    return wrapper_timer


@st.cache_resource
def generate_dataframe(convert_to_string_dates: bool) -> pd.DataFrame:
    """Generate a dataframe with datetimes and values. The dataframe is cached
    to avoid re-generating it every time the app is run.
    """

    df = pd.DataFrame(
        {"date": pd.date_range(start="2015-01-01", end="2024-06-01", freq="h")}
    )

    df["value"] = df["date"].apply(lambda x: x.month)

    if convert_to_string_dates:
        df["date"] = df["date"].astype(str)

    return df


@st.cache_resource
def generate_figure(df: pd.DataFrame) -> Figure:
    """Generate a figure with a line chart using the dataframe. The figure is
    cached to avoid re-generating it every time the app is run.
    """
    fig = px.line(df, x="date", y="value")
    return fig


@timer
def st_fig_show(fig: Figure) -> None:
    """Render a plotly figure using streamlit. It will print the elapsed time
    of the function execution, which is only the time it takes Streamlit to
    render the figure.
    """
    st.plotly_chart(fig, use_container_width=True)


def main():
    cols = st.columns(2)

    with cols[0]:
        "## Render figure using datetimes"
        df_datetimes = generate_dataframe(convert_to_string_dates=False)
        dtypes_str = "\n"
        for label, dtype in df_datetimes.dtypes.items():
            dtypes_str += f"- {label}: `{dtype}`\n"
        f"**Data types** {dtypes_str}"
        f"Size of dataframe: {df_datetimes.memory_usage().sum() / 1024:.2f} KB"

        figure = generate_figure(df_datetimes)
        st_fig_show(figure)

    with cols[1]:
        "## Render figure using string-datetimes"
        df_strings = generate_dataframe(convert_to_string_dates=True)
        dtypes_str = "\n"
        for label, dtype in df_strings.dtypes.items():
            dtypes_str += f"- {label}: `{dtype}`\n"
        f"**Data types** {dtypes_str}"
        f"Size of dataframe: {df_strings.memory_usage().sum() / 1024:.2f} KB"

        figure = generate_figure(df_strings)
        st_fig_show(figure)

    if st.button("Rerun `st.plotly_chart`", use_container_width=True):
        st.rerun()


if __name__ == "__main__":
    main()


I would guess that Streamlit makes a copy of the data contained in the Plotly Figure, transforms the datetime data into strings (to make valid JSON), adds other custom bits to that JSON, and then passes that to Plotly.
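The datetime-to-string step in that guess can be illustrated with the standard library alone. This is only a sketch of the general mechanism, not Streamlit's or Plotly's actual encoder: `iso_default` and `time_dumps` are hypothetical helpers, and the assumption is that a JSON encoder falls back to a Python-level hook for every datetime it meets. The profile below is consistent with that, since it shows 82537 `isoformat` calls, which is exactly the number of hourly timestamps between 2015-01-01 and 2024-06-01.

```python
import json
import time
from datetime import datetime, timedelta


def iso_default(obj):
    """Fallback that json.dumps calls once for every object it cannot serialize natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def time_dumps(values) -> float:
    """Return the seconds json.dumps needs to serialize the given list."""
    tic = time.perf_counter()
    json.dumps(values, default=iso_default)
    return time.perf_counter() - tic


# Same x-values twice: once as datetime objects, once pre-converted to strings.
start = datetime(2020, 1, 1)
as_datetimes = [start + timedelta(hours=i) for i in range(80_000)]
as_strings = [d.isoformat() for d in as_datetimes]

if __name__ == "__main__":
    print(f"datetimes: {time_dumps(as_datetimes):0.4f} seconds")
    print(f"strings:   {time_dumps(as_strings):0.4f} seconds")
```

Both inputs produce the same JSON text. The only difference is whether the conversion happens per element inside the encoder or once, up front, before the data is handed over.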

------- df_datetimes -------
         2978924 function calls (2813219 primitive calls) in 1.480 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.507    0.253    0.709    0.355 {method '__deepcopy__' of 'numpy.ndarray' objects}
 165549/0    0.213    0.000    0.000          copy.py:118(deepcopy)
    82539    0.127    0.000    0.440    0.000 utils.py:85(default)
    82537    0.079    0.000    0.079    0.000 {method 'isoformat' of 'datetime.datetime' objects}
   578731    0.070    0.000    0.070    0.000 {method 'get' of 'dict' objects}
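For anyone who wants to collect a similar profile, here is a minimal sketch using the standard `cProfile` and `pstats` modules. `profile_call` and `workload` are hypothetical names; `workload` is only a stand-in so the snippet runs on its own, and in the app the profiled call would presumably be `st.plotly_chart` with the figure as its argument.

```python
import copy
import cProfile
import io
import pstats


def profile_call(func, *args, top: int = 5, **kwargs) -> str:
    """Run func under cProfile and return the top rows sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the sort key that pstats reports as "Ordered by: internal time"
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


def workload():
    """Hypothetical stand-in for the call being measured."""
    copy.deepcopy([[i, str(i)] for i in range(20_000)])


report = profile_call(workload)

if __name__ == "__main__":
    print(report)
```

Sorting by `tottime` puts the functions that spend the most time in their own body at the top, which is how the `__deepcopy__` and `isoformat` rows surface in the output above.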

PS:

I tested converting the Plotly Figure to HTML and rendering it with streamlit.components.v1.html. The performance difference remains, so perhaps the issue lies in Plotly rather than in Streamlit.

Thanks for the reply and for taking the time to look into this. I couldn't quite figure out how to test whether Plotly alone was causing this, so thanks a lot.

While this may be caused by Plotly, I probably do not care as much about performance when working with Plotly in other contexts as I do when creating a dashboard, where the user can really feel the responsiveness. So even though this may not be (and probably is not) a bug, I hope this post helps others optimize their dashboards.

One further note: I was working on my dashboard in PyCharm using the debugger, and there the issue becomes much worse (the string-typed version is more than 100x faster, compared with the 10x found here). I know this is not as widely interesting, but I wanted to mention it because it may trick others (as it did me) into thinking that their app is slower than it really is when deployed.

You should probably open an issue here: Issues · plotly/plotly.py · GitHub