Creating a Streamlit app that refreshes every day

Hi there,

I am doing a classic Twitter project with the main goal of classifying hate-speech tweets.

I am trying to build a Streamlit app that showcases the results of a trained model on freshly collected tweets.

So I have everything in a single notebook that pulls relevant tweets from a MySQL database, makes predictions, and builds a matplotlib figure with all the stats.

I can run this manually each time and get the graph displayed in my local Streamlit app, but how would I automate the Jupyter notebook to run and update the Streamlit app daily at a certain time?

I am thinking of a very long sequence of local cron jobs, but there must be an easier way to do this.

Also, once I manage to run the script daily against the local Streamlit app, how do I automate the deployment so that the Heroku app also gets updated on a daily basis?

Thanks,
Matteo

The long sequence of cron commands would also involve using nbconvert to transform the Jupyter notebook into a .py file each time. For the time being, I am doing this manually.

I wouldn't mind using cron to convert the .ipynb to a .py each time; the main question remains: how do I make Streamlit run an updated version of the .py file every 24 hours?

Subsequently, how do I automate the communication with Heroku every time the Streamlit script updates?

Streamlit will notice any changes in the script it is running and use the current version without you having to change anything else.

How you update the script file is a different (and non-streamlit specific) question.
It depends a lot on what you are doing, but the usual approach should work fine: convert the Jupyter notebook to a script, check that it works standalone, run it via cron to update whatever it's updating, and then feed it into (or replace) the Streamlit script file.
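The steps above can be sketched as a small wrapper script that cron runs once a day. Everything here is an assumption to make it concrete: the notebook name, the cron schedule, and the paths are all hypothetical.

```python
"""refresh.py -- nightly job: regenerate the script from the notebook and run it.

Hypothetical crontab entry (every day at 06:00):
    0 6 * * * /usr/bin/python3 /home/matteo/refresh.py
"""
import subprocess
import sys

NOTEBOOK = "classify_tweets.ipynb"  # hypothetical notebook name


def build_nbconvert_cmd(notebook: str) -> list[str]:
    """Command that writes a .py script with the same name next to the notebook."""
    return ["jupyter", "nbconvert", "--to", "script", notebook]


def main() -> None:
    # 1. Convert the notebook to classify_tweets.py.
    subprocess.run(build_nbconvert_cmd(NOTEBOOK), check=True)
    # 2. Run the generated script: it pulls tweets, makes predictions, and
    #    overwrites whatever output the Streamlit app reads.
    subprocess.run([sys.executable, NOTEBOOK.replace(".ipynb", ".py")], check=True)

# cron invokes this file directly, so the real script ends with:
#     if __name__ == "__main__": main()
```

Since Streamlit picks up changes to the script (and the files it reads) on its own, nothing else is needed on the app side.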

I’m not sure that I’m clear about your workflow though, so shout out if I’m making bad assumptions.

Another way of achieving this is to split it.

Have something do the heavy pre-calculations offline (if they only take a few seconds, do them straight in the app). Push the precalculated data to S3 or your storage service/location of choice. You can turn a notebook into a script if you want.

In your Streamlit app, read this data in, caching the result with a time to live (TTL) of a few hours or some similarly long timescale.

Now your app is just reading a file and running, and it rereads that file only a few times a day no matter what traffic you see.

TL;DR: build your model with a cron job and push it to storage somewhere. Read it in inside a cached function in your Streamlit app.

You could use Apache Airflow (a data engineer's suggestion, so it might be overkill!)

Well, I found Prefect to fit my needs pretty well for engineering small data pipelines in place of successive cron calls; it's a bit less demanding than Airflow.

Also check out Papermill if you really don't want to convert your notebooks to .py files and instead want to parametrize them across successive cron runs.

@andfanilo Going off topic here, but I am very interested to hear about your use case/workflow involving Streamlit and Prefect. I'm also a big fan of Jupyter, so I'm interested in Papermill.

If you want to create a data engineering topic in Random, I bet you'd get some good discussion (including from me 🙂).

I created the thread Data engineering best practices