Working with Jupyter Notebooks

My workflow often involves analyzing data and creating plots from within a Jupyter notebook (via matplotlib, plotly, seaborn, pandas). Most of the time, that involves a fig.show() command to render the figure I created within the notebook.

What is the suggested workflow to get from a Jupyter notebook to a working streamlit script? Right now I need to:

  1. Change all fig.show() calls to st.show(fig) in the notebook.
  2. Export the Notebook to a .py file
  3. Strip out any Jupyter magic commands
  4. Run streamlit on the exported .py file

Is it possible to use streamlit natively from within a notebook so that I don’t have to go through that process when moving from a notebook? Or is there an alternate suggested workflow for rapid/interactive development and analysis?


Hi @Tom

If you prefer to use Jupyter when first analyzing data, then I think the steps you described above are indeed the right ones. I’d flip the order of (1) and (2), though: export first, then write a script that processes the converted .py file with regexes to replace fig.show with st.show and remove %magic commands. Let me know if you need help putting together a script like that; it sounds like a fun project.
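
For what it's worth, a minimal sketch of such a post-processing script could look like the following. It assumes the notebook has already been exported to a .py file, that figures are shown with a bare `name.show()` call, and that magics appear either as raw `%`/`!` lines or as the `get_ipython()` lines that nbconvert's script export writes. The helper name `notebook_py_to_streamlit` and its `display_call` parameter are made up for illustration, and `st.show` is simply the call named in this thread; substitute whatever display call your Streamlit version provides.

```python
import re

# Matches calls such as `fig.show()` and captures the variable name.
SHOW_CALL = re.compile(r"\b([A-Za-z_]\w*)\.show\(\s*\)")
# Matches raw magics / shell escapes (`%matplotlib inline`, `!pip install x`)
# and the `get_ipython()...` lines an exported script contains for them.
MAGIC_LINE = re.compile(r"^\s*(%|!|get_ipython\(\))")


def notebook_py_to_streamlit(source: str, display_call: str = "st.show") -> str:
    """Rewrite an exported notebook script for Streamlit (hypothetical helper)."""
    kept = []
    for line in source.splitlines():
        if MAGIC_LINE.match(line):
            continue  # drop Jupyter magics and shell escapes
        # Turn `fig.show()` into `<display_call>(fig)`.
        kept.append(SHOW_CALL.sub(rf"{display_call}(\1)", line))
    body = "\n".join(kept)
    if "import streamlit" not in body:
        body = "import streamlit as st\n" + body
    return body + "\n"
```

To use it, read the exported file, pass its text through the function, and write the result to a new .py file that you then run with streamlit. The regexes are deliberately naive: any `x.show()` is rewritten (including `plt.show()`), and lines inside multi-line strings are not treated specially, so check the output before relying on it.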

That said, I suggest you try doing your initial analysis in Streamlit too :stuck_out_tongue_winking_eye: . It will require a bit more care in how you structure your code, but in my experience it works quite well!


Hi, my colleagues and I at holoviz.org recently spent some time analyzing how streamlit compares to the Jupyter-based workflows that we otherwise use and suggest for our users. As we understand it, streamlit focuses on a Python file leading to a single app or dashboard, and some of the tools we provide should work well with streamlit in that context (HoloViews, hvPlot, Datashader). But if you do want to switch seamlessly between the single-file, single-app case and a Jupyter notebook (many code cells, each with its own output and description), you might be better off using Panel (panel.pyviz.org), or maybe Voila. Panel allows exactly the same code to work in a Jupyter notebook as in a separately deployed app. This support eliminates the friction and pain associated with moving between the notebook context (ideally suited to capturing and telling a readable story with your data and code) and the dashboard context (for sharing a runnable app). streamlit focuses on the latter, but if both are important, Panel seems more appropriate to us!


Hi @jbednar

Here are my two cents and experiences.

  • The Streamlit approach is not a single-file approach only.
    • You can do a single-file approach, and for getting started that is perfect and simple. But as your app grows you can refactor into any structure you like: folders, files, modules, packages. You just import whatever you need in your main app.py file.
    • So it’s just like Panel or Voila. They are not more suitable in that sense.
    • Hot reloading on deeply nested modules is not currently working, though. See issue 366.
  • The Streamlit workflow is much, much more efficient than a notebook-based one.
    • You work in an efficient editor.
    • The iteration cycle of change-run-evaluate is so fast and automated in Streamlit.
    • You get (a) quality code file(s) and an app as the end result.
    • My colleagues say, after having tried Streamlit, that they will never use a notebook again. It’s so much easier, more efficient, and more productive.
    • Streamlit solves all the problems of notebooks pointed out by Joel Grus in I don’t like notebooks.
    • The only reasons for staying in the notebook are the larger community, the larger library of widgets, and the export-to-PDF functionality.
      • If you have other arguments for staying in the notebook, please let me know, as I cannot see them.
  • The product produced by Streamlit is much nicer than a notebook.
    • You can control the output of your code, markdown and results.
    • You don’t have those clunky code cells.
    • You don’t spend tons of time googling and trying out how to use nbformatter.

I have not used Panel or Voila. I will try them out soon. But Streamlit is not only for dashboards and data science. Streamlit is actually a very, very efficient way of building highly performant, reactive apps in Python using a very simple, intuitive and declarative API.

You can see an example of a multifile app structured into models, services, components/ widgets etc. here
https://github.com/marcskovmadsen/awesome-streamlit. The main app.py file is here https://github.com/MarcSkovMadsen/awesome-streamlit/blob/master/app.py. It produces https://awesome-streamlit.org

  • Just to be clear, I didn’t mean to claim that streamlit is restricted to single files; all Python dashboarding tools can use code structured into arbitrarily many files via imports. I’m just looking for a concise way to refer to the fundamental distinction between streamlit and notebooks, i.e. whether for a given source file, there are multiple distinctly separate code cells with their own associated output (the notebook approach) or whether that source file leads to a single, coherent output (the streamlit approach). That shouldn’t be a controversial idea; it’s just trying to summarize how streamlit differs from notebooks.

  • I also completely agree that editing code in a real editor is much, much more natural and efficient than editing in a notebook cell; that’s why my code cells get pasted back and forth in Emacs all the time, and why we developed NEI (which is still alpha code, and only for Emacs) to avoid that. Web browsers are lousy editors! I also agree with most of Joel Grus’s other complaints about notebooks and how people abuse them. Personally, I hate notebooks in the same way I hate the destructive power of water, planes, and automobiles, which are all also terrible, dangerous, and often deadly things I’d never want to live without!

  • I can even accept a claim that iterating in streamlit can more quickly lead to a finished, polished product; I don’t know if it’s true in general, but it could be!

So the point I am trying to make here is simply that notebooks do offer something tangible, real, and important that streamlit does not. It’s up to each person to decide whether what notebooks offer is important to them, and if so, whether what streamlit offers is important enough to justify the friction and pain that people will have if they switch between notebooks and streamlit frequently.

Simply put, notebooks (inherently and by design) offer the ability to capture and communicate to a human that this bit of code (not the whole file or collection of modules) produces this output, with human-readable text attached that can concisely and precisely explain what is going on in that one bit of code. Notebooks are thus designed to capture and convey a code-based narrative, a story, which has a linear flow and is composed of small, human-digestible steps that relate text, code, and output.

A dashboard, whether made by Panel or streamlit or voila or shiny, does not do that. A dashboard offers a coherent, integrated set of functionality not overtly tied to bits of code, where the user approaches it as a unified app that lets them do certain things. Notebooks tell stories, while dashboards provide functionality. A dashboard can be abused to tell a story, but it’s awkward – the linear flow is either lost during development or forced on the finished product, and it takes a lot of effort to convey how bits of the dashboard relate to bits of code. A notebook can be abused to be a dashboard, but it’s awkward – the code and the linear flow have to be hidden, and it takes a lot of effort to make a notebook cell appear to be a unified app.

So the question people need to face is one of what is important to them. Is it important to capture a process of understanding data or models incrementally, preserving each bit and explaining it to others or yourself for posterity? If so, notebooks should be important to you. Or are you ok with losing all the individual steps that lead up to a particular dashboard-like artifact, “efficiently and productively” forgetting how you got that code? If so, streamlit is a great approach. From what I can see, if all you care about is the final artifact, streamlit will be useful. If you only care about capturing a step-by-step process or telling a story, then a notebook is the only way to go.

I think most people are in between those extremes. For my own work and for that of our clients, we believe that both activities are important, and that it’s important to lower the friction between them. At any moment, I can be exploring some data freely, capturing a series of reproducible steps for my own later understanding and use, telling a story to others, creating a dashboard for a particular audience, and going back and forth between all those things. That’s precisely what Panel is designed to encourage, i.e. to make it simple and lossless to switch between those activities and contexts, so that it costs very little to make a dashboard out of a notebook, make a notebook out of a dashboard, explore interactively, write a blog post, document a thought process for posterity, and so on. (See examples.pyviz.org for lots of examples of each of these activities.) Streamlit is what you get if you want to optimize (but not capture or explain) the process of getting to a final artifact. Panel is what you get if you think all those activities are important. It’s very nice for users to have these alternatives now!


One Jupyter notebook workflow that I definitely wouldn’t replace with streamlit is exploratory data analysis. IMO, Jupyter notebook still provides the best environment for ad-hoc exploratory data analysis. I also use the jupytext notebook extension, which lets me edit a synced .py version of my notebooks with my favorite editor or IDE. Otherwise, streamlit is great for making a finished product in the form of a dashboard.
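
To give an idea of what that synced .py file can look like, here is a small sketch assuming jupytext's percent format, where `# %%` comments mark cell boundaries and `# %% [markdown]` marks a text cell. The data and variable names are made up for illustration; the point is only that the file is ordinary Python that any editor or IDE can open and run.

```python
# %% [markdown]
# # Quick look at response times
# Markdown cells are stored as comments, so the file stays valid Python.

# %%
import statistics

response_ms = [120, 135, 128, 410, 131, 125]  # made-up sample data

# %%
# Each `# %%` marker starts a new code cell when opened as a notebook.
median_ms = statistics.median(response_ms)
mean_ms = statistics.mean(response_ms)
print(f"median={median_ms} ms, mean={mean_ms:.1f} ms")

# %%
# Flag values more than twice the median as outliers.
outliers = [x for x in response_ms if x > 2 * median_ms]
print("outliers:", outliers)
```

Because the cell markers are plain comments, the same file runs unchanged with the regular Python interpreter.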


Thanks @jbednar

I really like the discussion you have opened here :+1: and I am learning a lot from it. Thanks.

I can see that I’m also coming from another background :-).

  • I very seldom want to show people intermediate code steps. I can see that some of my colleagues do. But those code cells with 20 lines of pandas or matplotlib are not really that useful to me. That’s just my opinion. I would rather give a link to the source code and then show selected parts via st.echo() if needed.
  • I’ve just started experimenting with Streamlit, but I actually very much like the rerun-the-whole-file hot reload feature with caching for data exploration and data engineering. With notebooks I get confused all the time about which cells have run in which order, and rerunning everything takes forever. But maybe that’s just me.
  • And with notebooks I constantly have the hassle of managing kernels, Jupyter extensions etc., where I spend a lot of time on Google and Stack Overflow.
  • And I want to deploy my things. That’s a part of my “definition of done”.
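
As a rough illustration of why rerunning the whole file with caching avoids the cell-order confusion, here is a standard-library analogy. It is not Streamlit code: `functools.lru_cache` stands in for Streamlit's caching, and `run_script` is a made-up function standing in for one top-to-bottom rerun of the script.

```python
import functools
import time

load_calls = 0  # counts how often the slow step really executes


@functools.lru_cache(maxsize=None)
def load_data(n_rows: int) -> tuple:
    """Pretend-expensive load; memoized on its arguments."""
    global load_calls
    load_calls += 1
    time.sleep(0.1)  # stands in for a slow query or file read
    return tuple(range(n_rows))


def run_script(threshold: int) -> int:
    """One top-to-bottom 'rerun' of the whole script."""
    data = load_data(1000)  # served from the cache after the first run
    return sum(1 for x in data if x >= threshold)


first = run_script(threshold=500)   # first run pays for the load
second = run_script(threshold=900)  # changed parameter, cached load reused
print(first, second, load_calls)
```

Every rerun executes the same statements in the same order, so there is no hidden state from cells run out of sequence, and only the uncached steps cost time.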

On another note, I’ve started a comparison to Voila that you might be interested in. See How does Streamlit compare to Voila?


Just an update.

I’ve also been looking into Panel at awesome-panel.org. It’s pretty powerful and can do a lot of advanced stuff.

The reactive API in particular is pretty nice for building more advanced components and apps.

And I can now see that, for some use cases, it is actually a superpower to be able to either develop in or deploy to a notebook.
