Make apps faster by moving heavy computation to a separate process

Hi all!

I just made a GitHub Gist today showing how to use separate processes to do heavy computations in your Streamlit app, so I thought I might as well explain it here. If you just want the TLDR, you can check out the Gist here or just scroll below.

Why this is useful

Streamlit reruns your script every time you interact with your app. This is great because it lets you just write code without having to first think about how to architect your app. For example, with Streamlit you don’t start your app by designing your models, views, controllers, callbacks, routes, etc. You just code.

But if you’re doing heavy computations that hog your CPU core for seconds at a time (think hundreds of millions of random multiplications) this will block concurrent users from using that CPU core. As a result, your app will be slower for those concurrent users.

What can you do then?


Just make sure your main script is fast! :wink:

If you’re doing something that takes time to initialize or compute, use Streamlit’s caching primitives as a first resort. That means your old friends, st.cache_data and st.cache_resource.

But if you need to perform heavy computations that vary with each run, each user, etc, and which are unlikely to ever produce the same result, then you should run them on a separate process! This way you can either leave it to your OS’s scheduler to send that process to a CPU core that can serve it quickly, or you could manually send it to a different machine altogether.
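To see why caching stops helping once the inputs change on every run, here is a small sketch. It uses the standard library's functools.lru_cache as a stand-in for st.cache_data (an assumption made for illustration: both reuse a stored result only when the function is called again with the same arguments). The names expensive and cache_hits_for are hypothetical.

```python
import functools
import random


@functools.lru_cache(maxsize=None)
def expensive(seed):
    # Stand-in for a slow, deterministic computation keyed by its argument.
    rng = random.Random(seed)
    return sum(rng.random() * rng.random() for _ in range(10_000))


def cache_hits_for(seeds):
    """Run `expensive` over `seeds` from a cold cache and count cache hits."""
    expensive.cache_clear()
    for seed in seeds:
        expensive(seed)
    return expensive.cache_info().hits
```

Calling cache_hits_for([1, 1, 1, 1, 1]) reports 4 hits, because only the first call does any work. Calling cache_hits_for(range(5)) reports 0 hits: every call recomputes, which is the situation where moving the work to another process starts to pay off.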

How to run computations on a different process

There are different ways to do this.

Method #1: Use a separate HTTP server

You can write a Python server using your favorite backend framework (FastAPI, etc) and make HTTP calls to that server from your Streamlit app. Then your app would look like:

import streamlit as st
import requests

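st.write("# Hello world!")
st.write("Starting a long computation on another process...")

# Assumes the server accepts POST requests; use requests.get if yours expects GET.
SERVER_URL = 'https://localhost:1234/make_heavy_computation'
r = requests.post(SERVER_URL, data={'param1': 'value1', 'param2': 'value2'})
result = r.json()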

st.write("The result of the heavy computation is", result["value"])

Very clean!

And the nice thing is that this server can live on any machine. You just need to deploy it there.

But writing and deploying a separate server is pretty annoying! What if I don’t want to deal with routes, serialization, etc.? Then check out below…
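For reference, here is roughly what the server side of Method #1 could look like. This is only a sketch: it swaps in the standard library's http.server instead of FastAPI so that it runs with no extra installs, it serves plain HTTP (so the client URL would need to start with http:// rather than https://), and the body of make_heavy_computation is a made-up placeholder. It assumes the client sends form-encoded fields in a POST body, which is what requests does when you pass data=.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs


def make_heavy_computation(param1, param2):
    # Placeholder for the real work; hypothetical logic for illustration.
    return {"value": f"{param1}-{param2}"}


class HeavyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/make_heavy_computation":
            self.send_error(404)
            return
        # Read and decode the form-encoded request body.
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("utf-8"))
        result = make_heavy_computation(
            fields.get("param1", [""])[0],
            fields.get("param2", [""])[0],
        )
        # Send the result back as JSON so the client can call r.json().
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging to keep the example quiet.


def make_server(host="127.0.0.1", port=1234):
    return ThreadingHTTPServer((host, port), HeavyHandler)
```

Running make_server().serve_forever() in its own Python process starts the server. A real framework gives you validation, async workers, and so on, so treat this as the bare minimum that shows where the routing and serialization work ends up.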

Method #2: Use a ProcessPoolExecutor

Python comes with a very nifty library that can pull whatever code you give it into a separate process. It handles serialization for you, and the instantiation just looks like a function call:

import streamlit as st
import concurrent.futures
import mymodule

# Your st calls must go inside this IF block.
if __name__ == '__main__':
    st.write("Starting a long computation on another process...")

    # Pick max number of concurrent processes. Depends on how heavy your computation is, and how
    # powerful your machine is.
    MAX_WORKERS = 50

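    # Cache the executor so reruns reuse one pool instead of creating a new one.
    @st.cache_resource
    def get_executor():
        return concurrent.futures.ProcessPoolExecutor(max_workers=MAX_WORKERS)

    future = get_executor().submit(
      # Function to run on a separate process.
      mymodule.some_heavy_computation,
      # Arguments to pass to the function above.
      a=100, b=200, c=300)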

    # Wait for result.
    result = future.result()

    st.write("The result of the heavy computation is", result)
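The MAX_WORKERS value above is something you have to pick for your own machine. One possible heuristic, and it is only that, is to cap the pool at the number of CPU cores minus one, so the Streamlit server itself keeps a core. The helper name pick_max_workers and the reserve parameter are made up for this sketch.

```python
import os


def pick_max_workers(requested=50, reserve=1):
    """Cap the requested pool size at the available cores minus a reserve."""
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    cores = os.cpu_count() or 1
    return max(1, min(requested, cores - reserve))
```

You could then write MAX_WORKERS = pick_max_workers(). If your computation spends part of its time waiting on I/O, a larger pool than this may still make sense.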

Method #3: Use a library like Ray

Ray lets you do the same as ProcessPoolExecutor, but the processes could run on the current machine or on a separate cluster of machines (a Ray cluster). This makes it very scalable!

Here’s how that looks:

import streamlit as st
import ray
import mymodule

# Your st calls must go inside this IF block.
if __name__ == '__main__':
    st.write("Starting a long computation on another process...")

    # Initialize Ray once and reuse it across reruns.
    @st.cache_resource
    def initialize_ray():
        # You can configure this init call to point to a separate machine too.
        ray.init()

    initialize_ray()

    # Call the heavy function in a subprocess.
    future = mymodule.some_heavy_computation.remote(a=100, b=200, c=300)

    # Wait for result.
    result = ray.get(future)

    st.write("The result of the heavy computation is", result)

Then in mymodule.py you do this:

import ray

# The decorator is what makes some_heavy_computation.remote() available.
@ray.remote
def some_heavy_computation(a, b, c):
    # Do something crazy here
    return a * b * c

Note that for both ProcessPoolExecutor and Ray you can also define the mymodule.some_heavy_computation() function directly inside your main Streamlit script. But if you do that, remember to put it outside the if __name__ == '__main__' block.
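Here is a sketch of that single-file layout for the ProcessPoolExecutor case, with the Streamlit calls left out so it runs as a plain script. The loop inside some_heavy_computation is an arbitrary stand-in for real work, and run_heavy plus its timeout parameter are names introduced just for this example.

```python
import concurrent.futures


def some_heavy_computation(a, b, c):
    # Defined at module level, outside the __main__ block, so worker
    # processes can look it up by name when the job is unpickled.
    total = 0
    for i in range(a * 1000):
        total += (i * b) % c
    return total


def run_heavy(executor, a, b, c, timeout=30):
    """Submit the job and wait up to `timeout` seconds for its result."""
    future = executor.submit(some_heavy_computation, a=a, b=b, c=c)
    return future.result(timeout=timeout)


if __name__ == "__main__":
    # Only the parent process runs this block; workers just import the module.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        print(run_heavy(executor, a=100, b=200, c=300))
```

Passing a timeout to future.result() is optional, but it keeps a stuck worker from hanging the rerun forever: the call raises a TimeoutError instead, which you can catch and report to the user.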

Wrapping up

And that’s it! Surprisingly enough, methods 2 and 3 take fewer than 10 Python statements to implement, and are super flexible. With them, you’re limited only by Python’s own performance and the hardware you run on, not by how your app is structured.

Let us know how this works for you, or if you have any improvements to the code/ideas above.

Also: do you think it would be worth implementing some Streamlit primitives to make Methods 1, 2, or 3 even simpler? Any thoughts on what that API would look like? Our default stance is to leave these kinds of things to other Python libraries – unless we can do something magically better than them. What would that look like?


Thanks for this nice overview!

I’m wondering how all this would work in the Streamlit Community Cloud? For example, is Method #2 working there at all, and if yes, does it bring any benefits? :thinking:

In my experience over the past several years, the easiest way is to have a GitHub Actions workflow run the heavy computation in the backend of the repo.

It doesn’t matter whether you need to call a couple of endpoints or run complex machine learning models.

The Streamlit script then only has to load the data you are going to visualize, which is the output of the backend script.
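A minimal sketch of that split, assuming the backend job and the app share a results file in the repo. The file format (JSON), the function names, and the a * b * c computation are all placeholders chosen for illustration.

```python
import json
from pathlib import Path


def write_results(path, a, b, c):
    # Runs in the scheduled backend job, not in the Streamlit script.
    result = {"inputs": [a, b, c], "value": a * b * c}
    Path(path).write_text(json.dumps(result))


def load_results(path):
    # Runs in the Streamlit script; cheap, because the work is already done.
    return json.loads(Path(path).read_text())
```

The trade-off is freshness: the app only ever shows what the last backend run produced, so this fits data that changes on a schedule rather than per user.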
