Improving Data Load Performance

Hi,

I am a 3 day old streamlit user :slight_smile:

I have written python code in the past that loops through a folder and loads files into the database (MSSQL), using pandas dataframes in the process.

When I run this code outside of streamlit, it takes 5-10 minutes to load all the files (4 files, 100 MB each). When I run this code via streamlit (press a button to execute a function), it takes 30+ minutes.

I’m curious to know why this is the case (what part of streamlit architecture might makes this process take much longer on large files) and whether there is a way to speed things up.

Please advise.

Thank you in advance!

2 Likes

Hi,

I wanted to share the code I use to connect to the database. Perhaps it will shed some light on the issue. I recall that before adding @event.listens_for code the data load through python was very slow as well, but improved significantly after adding it.

Via streamlit it is slow again. Perhaps streamlit is not executing this part of code?

Please advise

Thank you!

import sqlalchemy
import pyodbc
import urllib.parse
from sqlalchemy import event, create_engine


# Database connection

# Adding MARS_Connection=yes is important as it allows for multiple active result
# sets.  Otherwise, we could not execute multiple qa stored procedures
connection_string = (
    r'Driver={SQL Server Native Client 11.0};'
    r'Server=Replace_ServerName;'
    r'Database=Replace_Database_Name;'
    r'Trusted_Connection=yes;'
    r'MARS_Connection=yes;'
)

params= urllib.parse.quote(connection_string)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# create_engine() is once per particular database URL, held globally for the lifetime of a single application process.
# The Engine is not synonymous to the DBAPI connect function, which represents just one connection resource -
# the Engine is most efficient when created just once at the module level of an application, not per-object or per-function cal

# The code below makes the data load much much faster
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

connection = engine.raw_connection()
2 Likes

Hi @Negmat_Mullodzhanov, thanks for your question!

I’m not totally sure what’s going on here, but if I were you I’d place a statement like print("executing callback with executemany=", executemany) inside the receive_before_cursor_execute function as a debug to see (1) if the callback is being executed and (2) if the executemany boolean is getting set to True as you expect.

Can you try that out and see?