Improving Data Load Performance

Negmat_Mullodzhanov · November 23, 2019, 1:26am

Hi,

I am a 3 day old streamlit user

I have written python code in the past that loops through a folder and loads files into the database (MSSQL), using pandas dataframes in the process.

When I run this code outside of streamlit, it takes 5-10 minutes to load all the files (4 files, 100 MB each). When I run this code via streamlit (press a button to execute a function), it takes 30+ minutes.

I’m curious to know why this is the case (what part of streamlit architecture might makes this process take much longer on large files) and whether there is a way to speed things up.

Please advise.

Thank you in advance!

Negmat_Mullodzhanov · November 25, 2019, 8:59pm

Hi,

I wanted to share the code I use to connect to the database. Perhaps it will shed some light on the issue. I recall that before adding @event.listens_for code the data load through python was very slow as well, but improved significantly after adding it.

Via streamlit it is slow again. Perhaps streamlit is not executing this part of code?

Please advise

Thank you!

import sqlalchemy
import pyodbc
import urllib.parse
from sqlalchemy import event, create_engine


# Database connection

# Adding MARS_Connection=yes is important as it allows for multiple active result
# sets.  Otherwise, we could not execute multiple qa stored procedures
connection_string = (
    r'Driver={SQL Server Native Client 11.0};'
    r'Server=Replace_ServerName;'
    r'Database=Replace_Database_Name;'
    r'Trusted_Connection=yes;'
    r'MARS_Connection=yes;'
)

params= urllib.parse.quote(connection_string)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# create_engine() is once per particular database URL, held globally for the lifetime of a single application process.
# The Engine is not synonymous to the DBAPI connect function, which represents just one connection resource -
# the Engine is most efficient when created just once at the module level of an application, not per-object or per-function cal

# The code below makes the data load much much faster
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

connection = engine.raw_connection()

nthmost · November 27, 2019, 1:23am

Hi @Negmat_Mullodzhanov, thanks for your question!

I’m not totally sure what’s going on here, but if I were you I’d place a statement like print("executing callback with executemany=", executemany) inside the receive_before_cursor_execute function as a debug to see (1) if the callback is being executed and (2) if the executemany boolean is getting set to True as you expect.

Can you try that out and see?

Topic		Replies	Views
SQL Server Insert using SQLAlchemy in Python Using Streamlit discussion	1	349	November 4, 2024
Data loading is very slow Using Streamlit	3	1658	May 13, 2022
Optimizing Streamlit Application Using Streamlit cache , windows , session-state , pandas , plotly , streamlit-elements , discussion	4	350	November 14, 2024
Performance issue when loading module in Streamlit application Using Streamlit pandas , debugging	1	28	April 16, 2025
How to improve Streamlit app loading speed Show the Community!	1	1783	April 2, 2024

Improving Data Load Performance

Related topics

Hello there 👋🏻

Cookie settings

Strictly necessary cookies

Performance cookies

Functional cookies

Targeting cookies