Data engineering best practices

Starting this thread to discuss data engineering best practices incorporating Streamlit. @andfanilo has experience using Prefect, and @randyzwitch has some ideas on best practices.

It’s definitely an interesting question, and I’ll be interested to see it evolve: what are some good ways to split responsibility between Streamlit as a presentation layer and the heavier lifting done in data engineering tools?


Just thinking out loud here…

I wonder if you could save cached computations away to disk so the client can use the same already-completed computations/model as the researcher (e.g., a trained model).
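A minimal sketch of that idea, using plain `pickle` and a file check rather than any Streamlit-specific caching (the cache path and function names here are hypothetical):

```python
import pickle
from pathlib import Path

CACHE = Path("cache/expensive_result.pkl")  # hypothetical cache location

def expensive_computation():
    # stand-in for a long training or data-crunching step
    return sum(i * i for i in range(1000))

def get_result():
    """Return the cached result if it exists on disk, else compute and save it."""
    if CACHE.exists():
        return pickle.loads(CACHE.read_bytes())
    result = expensive_computation()
    CACHE.parent.mkdir(parents=True, exist_ok=True)
    CACHE.write_bytes(pickle.dumps(result))
    return result
```

The first call pays the full cost; every later call (including from a client app pointed at the same directory or a shared artifact store) just reads the file.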

Otherwise, the researcher needs a Streamlit app to perform modeling computations Jupyter-notebook style, and then needs to take their model and build a separate client application to demonstrate it on the less powerful machines other stakeholders may be using.

Maybe application state could be used to let the researcher work in the app in a development capacity, where the app runs heavy computation locally while writing models to the disk/git directory, so that the client mode of the app can run on a less powerful computer or a low-power web server. After all, getting a prediction from a trained model is often 1000x+ less computationally intense than training the model.
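One way to sketch that development/client split, assuming a simple mode flag, a toy "model", and `pickle` for the on-disk artifact (all names hypothetical):

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("models/trained.pkl")  # hypothetical; would live in the repo

def train_model(data):
    # heavy step: only run in development mode on a powerful machine
    mean = sum(data) / len(data)
    return {"mean": mean}  # toy stand-in for a real trained model

def predict(model, x):
    # cheap step: fine on a low-power client or web server
    return x - model["mean"]

def run(mode, data=None):
    if mode == "development":
        model = train_model(data)
        MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
        MODEL_PATH.write_bytes(pickle.dumps(model))  # commit/ship this file
    else:  # client mode: load the already-trained model from disk/git
        model = pickle.loads(MODEL_PATH.read_bytes())
    return model
```

The researcher runs `run("development", data)` once on a capable machine; the client app only ever calls `run("client")` and `predict`, which never touches the training code path.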

You could always rebuild the model from the development tab/screen, but otherwise the model is already prepared, reducing latency for anyone who wants to toy with it.

I’ve seen some people use pickle to make models available to users.