Context:
Working on 20 to 30 multi page streamlit dashboard. Hence speed is becoming critical.
Currently running app locally. Cannot share the link of the dashboard.
import duckdb
import pandas as pd
import polars as pl
import time
conn = duckdb.connect()
# dummy.csv refers to a file that I created with 100 million rows for testing.
3 gb dataset.
query = \
"""
CREATE TABLE dummy as (
select * from 'dummy.csv'
)
"""
# Creates table
conn.execute(query)
# Part 1: Testing with Duck DB show()
start_time = time.time()
query = \
"""
select *
from dummy
"""
df = conn.sql(query).show()
df
print("--- %s seconds ---" % (time.time() - start_time))
Took my machine 0.013511896133422852 seconds. (But I cannot feed this dataframe to my streamlit application)
So currently I am converting it to either pandas or polars, which have been slower than duckdb and I am not liking the workflow.
Hello @napster_8l, and welcome to the Streamlit Community!
Two things come to mind to improve performance: first, you might consider optimizing your DuckDB queries, which includes using proper indexing, avoiding “select *” when not necessary, and pre-filtering data as much as possible.
You could also use DuckDB’s capability to directly query CSV files without loading them into a table if the data doesn’t need to persist.
Thank you for noticing the intricate details like those.
I get that I shouldn’t be doing the
select * and writing efficient queries, etc. But it was just done testing out where exactly am i being inefficient. And I found out that its when I am converting the aggregated results into pandas dataframe and in my use case there are simply alot of such lines in code where i was just converting to pandas dataframes.
I just came here to learn IF there are any efficient ways of display dataframes without having to convert to polars or arrow table or pandas dataframe.
Thanks for stopping by! We use cookies to help us understand how you interact with our website.
By clicking “Accept all”, you consent to our use of cookies. For more information, please see our privacy policy.
Cookie settings
Strictly necessary cookies
These cookies are necessary for the website to function and cannot be switched off. They are usually only set in response to actions made by you which amount to a request for services, such as setting your privacy preferences, logging in or filling in forms.
Performance cookies
These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us understand how visitors move around the site and which pages are most frequently visited.
Functional cookies
These cookies are used to record your choices and settings, maintain your preferences over time and recognize you when you return to our website. These cookies help us to personalize our content for you and remember your preferences.
Targeting cookies
These cookies may be deployed to our site by our advertising partners to build a profile of your interest and provide you with content that is relevant to you, including showing you relevant ads on other websites.