I’m using pandas, altair, and streamlit in my app. I’m reading in csv files that can get pretty large (100,000ish lines) using pandas.read_csv(…) and then I’m graphing the data using streamlit.altair_chart() with the altair chart that I create using the pandas dataframe. I’ve done some profiling that says the function takes about 0.01 seconds to complete, but when I open the app the page is super frozen and it takes a good 20 seconds for the graph to finally show up and the whole page is still really glitchy. The data is constantly changing, so I update the graph with new data every second.
Does anyone have any tips for how to graph large amounts of data that is constantly updating? Would using a different graphing library like plotly be any faster?
I’m running locally, python version == 3.9.18, streamlit version == 1.30.0
Here are some ideas, but I’m sure there are experts who can give you an effective solution.
1.- Use of Dask:
Consider using Dask to work with large data sets more efficiently.
2.- Use of DuckDB:
import duckdb
# Open an in-memory DuckDB database
con = duckdb.connect(database=':memory:', read_only=False)
# Register the pandas DataFrame so it can be queried by name
con.register('df_altair', df_altair)
# SQL query in DuckDB to obtain df_filter
query_filter = """
SELECT * FROM df_altair
"""
df_filter = con.execute(query_filter).fetchdf()
3.- Optimize the CSV File Reading:
Use the usecols parameter of pandas.read_csv() to read only the necessary columns.
Use the nrows parameter to read only a limited number of rows at the beginning.
data = pd.read_csv('your_file.csv', usecols=['column1', 'column2'], nrows=1000)
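To make the third tip concrete, here is a small sketch that compares the memory footprint of a full read against a read limited with usecols and nrows. The CSV content and the column names (time, value, label) are made up for illustration and are not from the original app.

```python
import io

import pandas as pd

# Hypothetical CSV held in memory: 1,000 rows, 3 columns (names are assumptions)
csv_text = "time,value,label\n" + "\n".join(
    f"{i},{i * 0.5},sensor_{i % 3}" for i in range(1000)
)

# Read everything
full = pd.read_csv(io.StringIO(csv_text))

# Read only the two columns needed for the plot, and only the first 100 rows
slim = pd.read_csv(io.StringIO(csv_text), usecols=["time", "value"], nrows=100)

# memory_usage(deep=True) also counts the contents of string columns
full_bytes = full.memory_usage(deep=True).sum()
slim_bytes = slim.memory_usage(deep=True).sum()
print(f"full: {full_bytes} bytes, slim: {slim_bytes} bytes")
```

Note that nrows keeps the first rows of the file, so for data that is appended to over time it may not return the most recent readings.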
@Oscar1 is spot-on with the third piece of advice about reading in specific columns. This can significantly reduce the memory footprint of the resulting DataFrame, since a plot typically needs only 2-3 columns out of the tens or hundreds that a file may contain.
Additionally, you can also check out these blogs on the topic of building performant apps or optimizing apps:
Hi! Thanks for the advice. My data already contains only the necessary columns (2 and 3 columns, respectively). It really seems like the rendering in the browser is the issue. Do you know of any ways to speed that up?
Thank you for your advice! I’m definitely going to try Dask, although I believe my issue lies more in the browser rendering time than in the data collection time, but we’ll see!
What chart type are you rendering? It is most likely slow because the browser has to render a large volume of individual data points, as in a scatter plot. I’d recommend trying out various data visualization libraries to see which one renders the fastest.
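If the number of plotted points is the bottleneck, one option is to shrink the DataFrame before it reaches the chart. The sketch below is only one possible approach, not something tested against the original app: the downsample helper and its max_points parameter are hypothetical names, and it averages consecutive rows into equal-size buckets.

```python
import numpy as np
import pandas as pd


def downsample(df: pd.DataFrame, max_points: int = 2000) -> pd.DataFrame:
    """Return df reduced to at most max_points rows by averaging
    consecutive rows in (roughly) equal-size buckets."""
    if len(df) <= max_points:
        return df
    # Map row position i to a bucket number in 0 .. max_points - 1
    bucket = np.arange(len(df)) * max_points // len(df)
    # Average the numeric columns within each bucket
    return df.groupby(bucket).mean(numeric_only=True).reset_index(drop=True)


# Hypothetical data: 100,000 rows, similar in size to the CSV described above
df = pd.DataFrame({"t": np.arange(100_000), "y": np.arange(100_000) * 2.0})
small = downsample(df, max_points=2000)
print(len(df), "->", len(small))
```

The reduced frame can then be passed to the Altair chart that is handed to streamlit.altair_chart(), so the browser only has to draw a few thousand marks per update. Averaging smooths out short spikes, so keeping a per-bucket min and max may be preferable if those matter.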
Performing pivots or restructuring the data prior to chart generation may improve the speed. I think I saw some examples from Jake VanderPlas somewhere (I can’t remember where at the moment). Jake’s book, the Python Data Science Handbook, may also provide some examples.
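As a rough illustration of restructuring before charting, the sketch below aggregates raw readings into one row per second with pandas, so the chart receives summary rows rather than every sample. The timestamp and value column names, the sampling rate, and the one-second bin are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one reading every 10 ms for 100 seconds
raw = pd.DataFrame(
    {
        "timestamp": pd.date_range(
            "2024-01-01", periods=10_000, freq=pd.Timedelta(milliseconds=10)
        ),
        "value": np.sin(np.linspace(0, 20, 10_000)),
    }
)

# Aggregate to one row per second, keeping min, mean and max of each bin
per_second = (
    raw.set_index("timestamp")["value"]
    .resample(pd.Timedelta(seconds=1))
    .agg(["min", "mean", "max"])
    .reset_index()
)
print(len(raw), "rows ->", len(per_second), "rows")
```

Doing the aggregation in pandas rather than inside the chart specification means less data is sent to the browser on each refresh.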