Running ETL Jobs using Streamlit

Hello Streamlit Community,

I wanted to share one use case of how I’ve got Streamlit to help me in my day to day job.
I have a number of jobs created on the ETL Server (MSSQL) and these jobs need to be executed manually on demand. While I like developing solutions, I do not like to manually run processes, which is why I was looking for a simple way to make this “manual job execution” self-serve and hand it off to a non-programmer.

Enter Streamlit. Using < 100 lines of python code I created a simple app which allows the user to run jobs across multiple clients and view the progress of any job.

If anyone has a similar need and would like more details/code, please let me know.

Thanks!

5 Likes

Would be great to see a screenshot of how you implemented this, cheers!

Hi Negmat
This is interesting, one of my colleagues has a need for a similar application and a few more details would be a great starting point for him.
Kevin

Hi Rob and Kevin,

Would it help if I provided full code (just copy/paste into here)?

Thanks!

1 Like

Hi Kevin,

Here are the design details: We develop ETL processes in SSIS. SSIS Projects are deployed to SSISDB Database on the ETL Server. SQL Server Agent jobs are created to run a master SSIS package (within the SSIS Project) which in turn runs all the other SSIS packages. As the jobs run, in addition to doing data processing, there is also audit information written to the database.

I use streamlit to essentially run these jobs and view their progress. I have a configuration file in excel which contains information on clients and corresponding jobs (along with SQL Server connection information). Configuration information feeds the drop downs so a user can choose which jobs to start or stop. We have a view sitting on top of the audit tables and this is what is used to view job progress (essentially showing which SSIS packages have run, in progress, or errorred out).

I am happy to provide more details or code.

Thank you!

1 Like

I would love to see the code! :eyes:

Hi Sergio,

Below is the code:


import streamlit as st

import pandas as pd

import sqlalchemy

import urllib.parse

general_folder = “C:\\Web Apps”

config_file_name = “Client Data Processing Configuration.xlsx”

config_file_location = general_folder + ‘\’ + config_file_name

Load config file into the dataframe

xls = pd.ExcelFile(config_file_location)

config_df = pd.read_excel(xls, “Jobs”)

st.title(“Self Serve Data Processing”)

Identify a client

client_list = list(set(config_df[“Client Name”]))

client = st.selectbox(“Client”, client_list)

Identify the client job to run

config_df_client = config_df[config_df[“Client Name”] == client]

data_processing_jobs_for_client = list(config_df_client[“Data Processing Job”])

data_processing_job = st.selectbox(“Data processing Job”, data_processing_jobs_for_client)

Identify client and etl database server and database

client_db_server = list(config_df_client[“Database Server”])[0]

client_database = list(config_df_client[“Database Name”])[0]

etl_db_server = list(config_df_client[“ETL Server”])[0]

etl_database = “msdb”

def get_connection(db_server, database):

# Database connection

connection_string = (

    r'Driver={SQL Server Native Client 11.0};'

    r'Server=' + db_server + ';'

    r'Database=' + database + ';'

    r'Trusted_Connection=yes;'

    )

params = urllib.parse.quote(connection_string)

engine = sqlalchemy.create_engine(

    "mssql+pyodbc:///?odbc_connect=%s" % params

    )

connection = engine.raw_connection()

return connection

client_connection = get_connection(client_db_server, client_database)

etl_connection = get_connection(etl_db_server, etl_database)

if st.button(“Start Job”):

etl_connection.execute(

    "SET NOCOUNT ON; EXEC sp_start_job N'"+data_processing_job

    + "'; SET NOCOUNT OFF"

)

st.success("The job has started successfully")

if st.button(“Stop Job”):

etl_connection.execute(

    "SET NOCOUNT ON; EXEC sp_stop_job N'"+data_processing_job

    + "'; SET NOCOUNT OFF"

)

st.success("The job has stopped successfully")

if st.button(“View Job Progress”):

sql = "SELECT * FROM [audit].[PackageStatusForLatestBatch] ORDER BY [Package Start Time]"

audit_view = pd.read_sql_query(sql, client_connection)

st.dataframe(audit_view)

Things to note:

  1. PackageStatusForLatestBatch is a view built on top of the audit tables which shows SSIS package run progress. Those audit tables are populated by the SSIS packages. If you’d like to know how that works, it is a separate discussion on SSIS Management Framework. Alternatively, you can replace with your own way of identifying/showing progress.

  2. There is a configuration excel file (“Client Data Processing Configuration.xlsx”), which contains information on clients and jobs per client as well as database server information. This excel file has 1 tab (Jobs), and within this tab the columns are:

Hope this helps.

Any questions, let me know.

Thanks!

3 Likes

Thanks for sharing! this is great