I have written python code in the past that loops through a folder and loads files into the database (MSSQL), using pandas dataframes in the process.
When I run this code outside of streamlit, it takes 5-10 minutes to load all the files (4 files, 100 MB each). When I run this code via streamlit (press a button to execute a function), it takes 30+ minutes.
I’m curious to know why this is the case (what part of streamlit architecture might makes this process take much longer on large files) and whether there is a way to speed things up.
I wanted to share the code I use to connect to the database. Perhaps it will shed some light on the issue. I recall that before adding @event.listens_for code the data load through python was very slow as well, but improved significantly after adding it.
Via streamlit it is slow again. Perhaps streamlit is not executing this part of code?
Please advise
Thank you!
import sqlalchemy
import pyodbc
import urllib.parse
from sqlalchemy import event, create_engine
# Database connection
# Adding MARS_Connection=yes is important as it allows for multiple active result
# sets. Otherwise, we could not execute multiple qa stored procedures
connection_string = (
r'Driver={SQL Server Native Client 11.0};'
r'Server=Replace_ServerName;'
r'Database=Replace_Database_Name;'
r'Trusted_Connection=yes;'
r'MARS_Connection=yes;'
)
params= urllib.parse.quote(connection_string)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# create_engine() is once per particular database URL, held globally for the lifetime of a single application process.
# The Engine is not synonymous to the DBAPI connect function, which represents just one connection resource -
# the Engine is most efficient when created just once at the module level of an application, not per-object or per-function cal
# The code below makes the data load much much faster
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
if executemany:
cursor.fast_executemany = True
cursor.commit()
connection = engine.raw_connection()
I’m not totally sure what’s going on here, but if I were you I’d place a statement like print("executing callback with executemany=", executemany) inside the receive_before_cursor_execute function as a debug to see (1) if the callback is being executed and (2) if the executemany boolean is getting set to True as you expect.
Thanks for stopping by! We use cookies to help us understand how you interact with our website.
By clicking “Accept all”, you consent to our use of cookies. For more information, please see our privacy policy.
Cookie settings
Strictly necessary cookies
These cookies are necessary for the website to function and cannot be switched off. They are usually only set in response to actions made by you which amount to a request for services, such as setting your privacy preferences, logging in or filling in forms.
Performance cookies
These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us understand how visitors move around the site and which pages are most frequently visited.
Functional cookies
These cookies are used to record your choices and settings, maintain your preferences over time and recognize you when you return to our website. These cookies help us to personalize our content for you and remember your preferences.
Targeting cookies
These cookies may be deployed to our site by our advertising partners to build a profile of your interest and provide you with content that is relevant to you, including showing you relevant ads on other websites.