Is there a way to speed up rendering time with Altair?

I’m using pandas, altair, and streamlit in my app. I’m reading in csv files that can get pretty large (100,000ish lines) using pandas.read_csv(…) and then I’m graphing the data using streamlit.altair_chart() with the altair chart that I create using the pandas dataframe. I’ve done some profiling that says the function takes about 0.01 seconds to complete, but when I open the app the page is super frozen and it takes a good 20 seconds for the graph to finally show up and the whole page is still really glitchy. The data is constantly changing, so I update the graph with new data every second.

Does anyone have any tips for how to graph large amounts of data that is constantly updating? Would using a different graphing library like plotly be any faster?

I’m running locally, python version == 3.9.18, streamlit version == 1.30.0

Thanks!

-Alyssa

Hello Alyssa

Here are some ideas, but I’m sure there are experts who can give you an effective solution.

1.- Use of Dask:
Consider using Dask to work with large data sets more efficiently.

2.- Use of DuckDB:

      import duckdb

      # Load the DataFrame into an in-memory DuckDB database
      con = duckdb.connect(database=':memory:', read_only=False)
      con.register('df_altair', df_altair)

      # SQL query in DuckDB to obtain df_filter
      query_filter = "SELECT * FROM df_altair"
      df_filter = con.execute(query_filter).fetchdf()

3.- Optimize the CSV File Reading:
Use the usecols parameter of pandas.read_csv() to read only the necessary columns.
Use the nrows parameter to read only a limited number of rows at the beginning.

data = pd.read_csv('your_file.csv', usecols=['column1', 'column2'], nrows=1000)

Óscar.


Hi @AlCote

@Oscar1 is spot-on with the third suggestion of reading in specific columns. Oftentimes this can significantly reduce the memory footprint of the resulting DataFrame, since a plot typically needs only 2-3 columns out of the tens or hundreds available.

You can also check out these blogs on the topic of building performant apps and optimizing apps:

Hope this helps!

Hi! Thanks for the advice. My data already contains only the necessary columns (2 and 3 columns, respectively). It really seems like the browser-side rendering is the issue. Do you know of any ways to speed that up?

Thank you for your advice! I’m definitely going to try Dask, although I believe my issue lies more in browser rendering time than in data loading time, but we’ll see!

Hi @AlCote

What chart type are you rendering? The slowdown is most likely due to rendering a large volume of individual data points, as in a scatter plot. I’d recommend trying out various data visualization libraries to see which one renders the fastest.

Performing pivots or restructuring the data prior to chart generation may also improve speed. I think I saw some examples from Jake VanderPlas somewhere (can’t remember where at the moment); his book may provide some examples (Python Data Science Handbook).
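One common form of restructuring is downsampling/aggregating before handing the data to Altair (which by default refuses DataFrames larger than 5,000 rows with a `MaxRowsError`). A hypothetical sketch using a synthetic time series as a stand-in:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: a 100,000-point time series.
df = pd.DataFrame({
    't': pd.date_range('2024-01-01', periods=100_000, freq='s'),
    'value': np.random.randn(100_000).cumsum(),
})

# Downsample to one averaged point per minute before charting:
# ~1,700 rows render far faster in the browser than 100,000,
# and a line chart of the result looks nearly identical.
small = df.set_index('t').resample('1min').mean().reset_index()
```

The browser then only has to draw the downsampled frame, which directly addresses rendering time rather than data loading time.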

Exporting the chart as JSON and then rendering it via vega-lite might be another approach to try (Saving Altair Charts — Altair 4.2.2 documentation).