I was reading this thread: "Is Streamlit running on a single-threaded development server by default or not?" and found that a separate thread is spun up for every user. I have a few questions about this:
I’ve always heard that CPython’s threading is of limited use because of the GIL, and that in practice this means spawned threads cannot truly run in parallel. Is that the case in Streamlit, or is the GIL bypassed somehow?
Let’s say that I am using large language models in my app and cache them with st.cache_resource, and suppose a model takes up 1 GB of memory. What is the impact of spawning different threads for different users? Is the model copied for every user, so that with ten users we are now using 10 GB of RAM? Or do they all reference the same model (and thus computation is slow and not distributed at all)?
Bonus question: same as above but with GPUs. If a model is running on a GPU, is it copied N times for N users?
I’d love to know, and also whether we can control the last part one way or another: either how resources get parallelized, or how we can scale Streamlit plus large ML models on a local server.
CPython threads cannot take advantage of multiple cores but processes can. I don’t think Streamlit uses processes (I might be wrong) but you can use them in your app.
That’s an easy one:
Cached objects are shared across all users, sessions, and reruns. They must be thread-safe because they can be accessed from multiple threads concurrently.
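As a minimal sketch of that sharing (pure Python, with a hypothetical placeholder dict standing in for a real model; in a Streamlit app the loader would be wrapped in @st.cache_resource, which handles the caching for you):

```python
import threading

# Hypothetical stand-in for an expensive model load. st.cache_resource
# creates the object once and hands the SAME object to every session/thread,
# so any mutation of it must be guarded, e.g. with a lock.
_model = None
_model_lock = threading.Lock()

def get_model():
    """Return the single shared 'model', creating it lazily and thread-safely."""
    global _model
    with _model_lock:
        if _model is None:
            _model = {"weights": [0.1, 0.2], "loads": 0}  # placeholder object
        _model["loads"] += 1  # shared mutable state: mutate only under the lock
    return _model

# Simulate several user sessions hitting the cache concurrently.
threads = [threading.Thread(target=get_model) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(get_model() is get_model())  # True: every caller sees one instance
```

Every thread gets a reference to the same object, which is exactly why the docs require thread safety.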
But I don’t see how that relates to slowness. The code running the model might spawn processes to distribute the computation or even release the GIL and spawn threads that run simultaneously in several cores (some libraries like numpy can do that). So it depends.
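For the process route, here is a stdlib-only sketch (the `cpu_bound` function is a made-up example of pure-Python CPU work, not anything Streamlit-specific):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Pure-Python CPU work. In a thread this would serialize on the GIL,
    # but each worker process has its own interpreter and its own GIL,
    # so the four calls below can genuinely run on separate cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cpu_bound, [100_000] * 4))
    print(len(results))  # 4
```

The same pattern works inside a Streamlit app: the cached object stays in the main process, while heavy computation is farmed out to workers.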
See above: Streamlit by itself won’t make copies of the cached object. Your code (or library code that your code calls) may or may not make such copies, but that is orthogonal to using Streamlit.
Thanks for the reply!
With respect to your answer to the second question: if Streamlit indeed spawns new threads for new sessions, and users share the cached objects, then parallelism cannot take place. First because of the GIL, and second because they are all sharing the same ML model to run input through. So the different user sessions will all use just the single, cached, shared instance of the ML model, and hence it is “slow”. Am I mistaken?
Parallelism is still possible in the two ways I mentioned: spawning new processes and calling library code that releases the GIL. Any Python application can do this, and Streamlit applications are no exception.
So the different user sessions will all make use of just the single, cached and shared, instance of the ML model
Applications have access to that shared instance and can do anything with it, including making copies, if that is what worries you.
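For instance, a session that needs its own mutable version of the shared object can deep-copy it, at the cost of duplicating its memory footprint (a toy dict stands in for the cached model here):

```python
import copy

shared_model = {"weights": [1.0, 2.0, 3.0]}  # the single cached instance

# One session takes a private, fully independent copy and mutates it.
private_model = copy.deepcopy(shared_model)
private_model["weights"][0] = 99.0

print(shared_model["weights"][0])   # 1.0 -- the shared instance is untouched
```

This is the trade-off from the earlier question: N deep copies cost N times the memory, whereas sharing the single instance costs nothing extra but requires thread-safe access.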