Make apps faster by moving heavy computation to a separate process

Hi all!

I just made a GitHub Gist today showing how to use separate processes to do heavy computations in your Streamlit app, so I thought I might as well explain it here. If you just want the TLDR, you can check out the Gist here or just scroll below.

Why this is useful

Streamlit reruns your script every time you interact with your app. This is great because it lets you just write code without having to first think about how to architect your app. For example, with Streamlit you don’t start your app by designing your models, views, controllers, callbacks, routes, etc. You just code.

But if you’re doing heavy computations that hog your CPU core for seconds at a time (think hundreds of millions of random multiplications) this will block concurrent users from using that CPU core. As a result, your app will be slower for those concurrent users.

What can you do then?

Answer

Just make sure your main script is fast! :wink:

If you’re doing something that takes time to initialize or compute, use Streamlit’s caching primitives as a first resort. That means your old friends, st.cache_data and st.cache_resource.
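As a quick reminder, caching a slow function is just a decorator away. Here's a small sketch (the dataset URL is only a placeholder):

import streamlit as st
import pandas as pd

@st.cache_data  # reruns with the same URL reuse the cached result
def load_data(url):
    return pd.read_csv(url)

df = load_data("https://example.com/data.csv")  # placeholder URL
st.dataframe(df)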

But if you need to perform heavy computations that vary with each run, each user, etc., and which are unlikely to ever produce the same result, then you should run them on a separate process! This way you can either leave it to your OS’s scheduler to send that process to a CPU core that can serve it quickly, or you can manually send it to a different machine altogether.


How to run computations on a different process


There are different ways to do this.

Method #1: Use a separate HTTP server

You can write a Python server using your favorite backend framework (FastAPI, etc.) and make HTTP calls to that server from your Streamlit app. Then your app would look like this:

import streamlit as st
import requests

"""
# Hello world!

Starting a long computation on another process...
"""

SERVER_URL = 'https://localhost:1234/make_heavy_computation'
r = requests.post(SERVER_URL, data={'param1': 'value1', 'param2': 'value2'})
result = r.json()

st.write("The result of the heavy computation is", result["value"])

Very clean!

And the nice thing is that this server can live on any machine. You just need to deploy it there.
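In case it helps, here’s a rough sketch of what the server side could look like. Everything here (file name, framework choice, return payload) is just an assumption matching the client example above:

# server.py -- a minimal sketch, assuming FastAPI + uvicorn.
# Run with: uvicorn server:app --port 1234 (form parsing also needs the python-multipart package).
from fastapi import FastAPI, Form

app = FastAPI()

@app.post("/make_heavy_computation")
def make_heavy_computation(param1: str = Form(...), param2: str = Form(...)):
    # Stand-in for your real heavy computation.
    value = f"{param1}:{param2}"
    return {"value": value}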

But writing and deploying a separate server is pretty annoying! What if I don’t want to deal with routes, serialization, etc.? Then check out below…


Method #2: Use a ProcessPoolExecutor

Python comes with a very nifty standard-library module, concurrent.futures, that can run whatever function you give it in a separate process. It handles serialization for you, and calling it looks just like a regular function call:

import streamlit as st
import concurrent.futures
import mymodule

# Your st calls must go inside this IF block.
if __name__ == '__main__':
    st.write("Starting a long computation on another process...")

    # Pick max number of concurrent processes. Depends on how heavy your computation is, and how
    # powerful your machine is.
    MAX_WORKERS = 50

    @st.cache_resource
    def get_executor():
        return concurrent.futures.ProcessPoolExecutor(max_workers=MAX_WORKERS)

    future = get_executor().submit(
      # Function to run on a separate process.
      mymodule.some_heavy_computation,

      # Arguments to pass to the function above.
      a=100, b=200, c=300)

    # Wait for result.
    result = future.result()

    st.write("The result of the heavy computation is", result)

Method #3: Use a library like Ray

Ray lets you do the same as ProcessPoolExecutor, but the processes could run on the current machine or on a separate cluster of machines (a Ray cluster). This makes it very scalable!

Here’s how that looks:

import streamlit as st
import ray
import mymodule

# Your st calls must go inside this IF block.
if __name__ == '__main__':
    st.write("Starting a long computation on another process...")

    @st.cache_resource
    def initialize_ray():
        # You can configure this init call to point to a separate machine too.
        ray.init()

    initialize_ray()

    # Call the heavy function in a subprocess.
    future = mymodule.some_heavy_computation.remote(a=100, b=200, c=300)

    # Wait for result.
    result = ray.get(future)

    st.write("The result of the heavy computation is", result)

Then in mymodule.py you do this:

import ray

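# num_cpus tells Ray how many CPUs to reserve for each call of this task;
# lower it if your machine or cluster has fewer cores available.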
@ray.remote(num_cpus=50)
def some_heavy_computation(a, b, c):
    # Do something crazy here
    return a * b * c

Note that for both ProcessPoolExecutor and Ray you can also define the mymodule.some_heavy_computation() function directly inside your main Streamlit script. But if you do that, remember to put it outside the if __name__ == '__main__' block.
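For example, here’s a minimal sketch of Method #2 with the function defined in the app script itself (the function body is just a placeholder):

# streamlit_app.py
import streamlit as st
import concurrent.futures

# At module level -- outside the IF block -- so worker processes can import it.
def some_heavy_computation(a, b, c):
    return a * b * c

if __name__ == '__main__':
    @st.cache_resource
    def get_executor():
        # Pick a worker count that fits your machine.
        return concurrent.futures.ProcessPoolExecutor(max_workers=4)

    future = get_executor().submit(some_heavy_computation, a=100, b=200, c=300)
    st.write("The result of the heavy computation is", future.result())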


Wrapping up

And that’s it! Surprisingly enough, methods 2 and 3 take less than 10 Python statements to implement, and they’re super flexible. With them, you’re limited only by Python’s own performance, so you’ll get performance on par with any other Python framework.

Let us know how this works for you, or if you have any improvements to the code/ideas above.

Also: do you think it would be worth implementing some Streamlit primitives to make Methods 1, 2, or 3 even simpler? Any thoughts on what that API would look like? Our default stance is to leave these kinds of things to other Python libraries – unless we can do something magically better than them. What would that look like?


Thanks for this nice overview!

I’m wondering how all this would work on Streamlit Community Cloud. For example, does Method #2 work there at all, and if so, does it bring any benefits? :thinking:

In my experience over the past several years, the easiest way is to have a GitHub Actions workflow run the heavy computation in the backend of the repo.

It doesn’t matter if you need to call a couple of endpoints or run complex machine learning models.

Then the Streamlit script just loads the data you’re going to visualize, which is the result of the backend script.


Yes, method 2 should work on Cloud!

Yes, this is similar to method #1, and it’s a great solution too.

Do any of these options work in a WASM-based compute environment like PyCafe (based on Pyodide, I think)?

Thank you, @thiago, for addressing this topic and sharing these different approaches.

Yes, it would be awesome to implement Streamlit options to support offloading heavy computation.

For example, my common use case is to display tabular data retrieved from an API. It can take several seconds for this data to load and that delays rendering of Streamlit components.

To resolve this issue, I created a custom datatable component based on the Datatables.net JavaScript library (https://datatables.net/). This library supports loading data from Ajax sources.

You can see this component in action here:

Configuring the Ajax data source for this component is trivially simple:

options["ajax"] = {
        "url": url,
        "dataSrc": "products",
        "type": "GET"
    }

The result is that the “heavy computation” of generating the data is offloaded to another server, while the Streamlit components load immediately.

Besides datatables, other components that load lots of data could benefit from using remote sources. For example, consider how the React-Select input supports loading options from remote sources (React-Select). This is useful when there might be hundreds of options that take a couple of seconds to retrieve (e.g., lists of companies, assets, etc.).

I would love to see Streamlit components like st.datatable and st.selectbox with options to use remote data sources.

Just FYI, I managed to run it nicely on Streamlit Cloud using Method #3 (Ray), but I had to add a ray.shutdown() after the result = ray.get(future) line. Without it, the computation runs successfully, but about a minute later the whole Streamlit app crashes with an error: “Failed to connect to GCS within 60 seconds. GCS may have been killed. It’s either GCS is terminated by ray stop or is killed unexpectedly.”

I assume it’s somehow being terminated in a bad way, but shutting Ray down manually after the calculation works fine for me.
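In other words, the only change to the Method #3 snippet is one extra call right after fetching the result:

    # Wait for result.
    result = ray.get(future)

    # Shut Ray down explicitly; without this, the app crashed about a minute later with the GCS error above.
    ray.shutdown()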