Scalability concerns with large user base

I have created a chatbot application that is intended to serve multiple users, and I am concerned about scalability. I was able to launch the application successfully, but the user base keeps growing, and that worries me even with horizontal scaling enabled on the Kubernetes cluster. Because of cost restrictions I can only increase the number of replicas to a certain extent, and I am not sure whether there is a way to run the app in a multithreaded fashion. If anyone has experience scaling Streamlit apps, please let me know. Thanks in advance!


Streamlit already runs in a multithreaded fashion.


OK, I thought it ran in a single-threaded fashion because it uses Tornado under the hood, which, as I understand it, is a single-threaded web server. Can you please share some docs about this? Also, is there a way to increase the number of workers? I'm not sure if I am missing something.


There are no docs; it is stated in a post on the forum.

Yes, the web server is single-threaded, but I don’t think that will be your bottleneck.
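Roughly, Streamlit serves all sessions from one Tornado event loop but runs each session's script in a separate thread, so several users' reruns can overlap even though the web server itself is single-threaded. The stdlib analogy below is a sketch under that assumption, not Streamlit's actual source: `run_script` and `dispatch` are hypothetical names, and `time.sleep` stands in for I/O such as waiting on an LLM API.

```python
import threading
import time


def run_script(session_id: str, results: dict) -> None:
    # Hypothetical stand-in for one user's script rerun that mostly waits on I/O.
    time.sleep(0.2)
    # Record which thread handled this session.
    results[session_id] = threading.current_thread().name


def dispatch(session_ids: list[str]) -> tuple[dict, float]:
    """Hypothetical dispatcher: start one worker thread per session, then wait for all."""
    results: dict = {}
    threads = [
        threading.Thread(target=run_script, args=(sid, results), name=f"script-{sid}")
        for sid in session_ids
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = dispatch(["alice", "bob", "carol"])
    print(results)
    # The three 0.2 s waits overlap, so this is roughly 0.2 s rather than 0.6 s.
    print(f"elapsed: {elapsed:.2f}s")
```

The overlap only helps while threads are waiting on I/O. Because of CPython's GIL, CPU-bound Python code in one session's thread still competes with every other session in the same process, which is one reason to move heavy computation into a separate process.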


I read that post. So it sounds like each thread is a web server of its own for each user, and I should still be able to serve multiple users concurrently as long as I have enough resources to spin up a thread for each user, right? Also, with each thread being separate, I am assuming there wouldn't be any memory overlap and that resource contention would be handled similarly to gunicorn processes, right?


I think there’s only one web server.

Yes, but that is for your application code, not the web server.

I don’t understand this. Threads share memory, and in Python they can certainly share Python objects.

I don’t know gunicorn, but processes and threads are very different beasts.
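To make the thread point concrete: threads live in one process and see the same Python objects, so a module-level container is shared by every thread that touches it, whereas separate processes (such as gunicorn's worker processes) each get their own memory. A minimal stdlib sketch; `conversation` and `handle_user` are illustrative names only.

```python
import threading

# Illustrative module-level list: there is exactly ONE of these, shared by every thread.
conversation: list[str] = []


def handle_user(user: str) -> None:
    # Illustrative handler: nothing isolates users, every thread appends to the same list.
    conversation.append(f"{user}: hello")


threads = [threading.Thread(target=handle_user, args=(u,)) for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both users' messages end up in the same list.
print(sorted(conversation))  # ['alice: hello', 'bob: hello']
```

This kind of sharing only happens when the data is put somewhere global; per-user chat context kept in per-session storage is not mixed this way.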


Re: “I think there’s only one web server.”

I probably did not word that correctly. I meant that each thread is a separate flow of execution and can run concurrently for each user.


I am really concerned about memory overlap. Because this is a chatbot application, multiple users could be having conversations at any given time, and I am worried that each user's context in memory may overlap with someone else's. I am not sure whether I should switch to a different solution.


I think that’s really more of a general Python question than a Streamlit question, but I don’t think you need to be concerned about different users ending up with each other’s context.

Here’s a post that may be relevant: “Make apps faster by moving heavy computation to a separate process”.
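The general idea, sketched under our own assumptions rather than taken from that post: CPU-heavy Python code running in a session's thread holds the GIL and slows every other session in the same process, so it can be handed to a worker process instead. This uses `concurrent.futures`; `heavy_computation` and `run_offloaded` are hypothetical names, and the executor class is a parameter so the same logic can also be run with threads.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def heavy_computation(n: int) -> int:
    # Hypothetical CPU-bound work: sum of squares of 0..n-1.
    return sum(i * i for i in range(n))


def run_offloaded(
    inputs: list[int], executor_cls: type[Executor] = ProcessPoolExecutor
) -> list[int]:
    """Hypothetical helper: run heavy_computation for each input in a worker pool."""
    with executor_cls(max_workers=2) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(heavy_computation, inputs))


if __name__ == "__main__":
    # The __main__ guard is required for process pools on spawn-based platforms.
    print(run_offloaded([10, 1_000_000]))                      # worker processes
    print(run_offloaded([10, 1_000_000], ThreadPoolExecutor))  # same results, threads
```

Only the process-pool version lets the CPU work run outside the Streamlit process's GIL; the thread version is mainly useful for exercising the same logic cheaply.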


Of course you can keep separate contexts too.
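Keeping separate contexts can be as simple as keying each conversation by a session identifier, which is roughly what Streamlit's session state does per browser session. A hypothetical stdlib illustration (`SessionStore` and `chat` are not Streamlit APIs): the lock guards the shared dictionary, and each session only ever reads and writes its own history.

```python
import threading
from collections import defaultdict


class SessionStore:
    """Hypothetical per-session chat history, keyed by a session id."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._histories: dict[str, list[str]] = defaultdict(list)

    def add(self, session_id: str, message: str) -> None:
        # The lock protects the shared dict while a new key may be created.
        with self._lock:
            self._histories[session_id].append(message)

    def history(self, session_id: str) -> list[str]:
        with self._lock:
            # Return a copy so callers cannot mutate the stored list.
            return list(self._histories.get(session_id, []))


store = SessionStore()


def chat(session_id: str, n: int) -> None:
    # Hypothetical user activity: n messages in one session.
    for i in range(n):
        store.add(session_id, f"msg {i}")


threads = [threading.Thread(target=chat, args=(sid, 100)) for sid in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(store.history("alice")), len(store.history("bob")))  # 100 100
```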


The memory question relates to session state. Each “user” is really a browser session: every time the web address of your Streamlit app is opened, that visit gets its own “session,” and inside it live that session's “session state variables.”

Session state variables are what you use instead of dynamic global variables. If you were to change a global variable in your Python code (for example, a module-level variable in a module you import), the change would apply to all users the moment one user triggered it.

Instead, you can save that memory to session state variables, access them across all your modules, and not worry about other users’ memory being affected.

Create a session state variable like this:

import streamlit as st
st.session_state.apple_string = "apple"

You can then access this variable and change it anytime.

st.session_state.apple_string += " and banana"
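Because Streamlit reruns the whole script on every interaction, an unconditional assignment like the one above resets the value on each rerun; the usual pattern is to initialise a key only if it is missing (`if "apple_string" not in st.session_state:`). The sketch below simulates that with a plain dict standing in for `st.session_state`, an assumption made so it runs without Streamlit; `run_script` is a hypothetical name.

```python
def run_script(session_state: dict) -> str:
    """Hypothetical: one simulated rerun of a Streamlit script for a single session.

    A plain dict stands in for st.session_state (assumption, so this runs without Streamlit).
    """
    # Initialise only on this session's first run; later reruns keep the value.
    if "apple_string" not in session_state:
        session_state["apple_string"] = "apple"
    # Simulated user interaction that modifies the value on every rerun.
    session_state["apple_string"] += " and banana"
    return session_state["apple_string"]


alice_state: dict = {}  # each session gets its own state object
bob_state: dict = {}

run_script(alice_state)
run_script(alice_state)  # Alice interacts twice
run_script(bob_state)    # Bob interacts once

print(alice_state["apple_string"])  # apple and banana and banana
print(bob_state["apple_string"])    # apple and banana
```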

First of all, congratulations on having an application with a continuously growing user base. This is a remarkable achievement for an application developed with Streamlit. Although I don't know the full details of your situation, here are some suggestions for your reference:

  1. Avoid keeping excessively large objects in Session State. If suitable external storage is available, consider moving that data there.
  2. Serving the static files of Streamlit and related components from a CDN may help. In the complex scenarios I have encountered, this approach has been quite beneficial. (However, it requires a certain amount of engineering effort. You can refer to: New Component: Streamlit-CDN allows you to deploy multiple instances in K8S and host all Streamlit dependencies on your own CDN)
  3. If a page contains other heavy processing logic, consider splitting that logic into separate pages. (New Component: Streamlit-router allows you to create truly production-level multi-page applications)
  4. You can use the method introduced here to analyze the entire page, which may help identify potential system bottlenecks. (New Component: streamlit-embeded allows easy embedding of an HTML snippet and automatically adjusts the content height)
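For point 1, one way to keep Session State small is to store only a conversation id there and keep the message history in external storage. A minimal sketch using the standard-library `sqlite3` module with an in-memory database; the schema and function names are hypothetical, and a production deployment would more likely use a shared database or cache reachable from all replicas.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per chat message.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " conversation_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )


def append_message(conn: sqlite3.Connection, conversation_id: str, role: str, content: str) -> None:
    # Parameterised query: values are never spliced into the SQL string.
    conn.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, conversation_id: str) -> list[tuple[str, str]]:
    # Messages come back in insertion order for the given conversation only.
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
init_db(conn)
# In a Streamlit app, only the small id (e.g. "conv-1") would live in st.session_state.
append_message(conn, "conv-1", "user", "Hi")
append_message(conn, "conv-1", "assistant", "Hello! How can I help?")
append_message(conn, "conv-2", "user", "Unrelated chat")
print(load_history(conn, "conv-1"))
```

By default a `sqlite3` connection may only be used from the thread that created it (`check_same_thread=True`), so in a multithreaded app each script run would typically open its own connection.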

These suggestions may help in general, but they may be less effective for applications like chatbots. For those, “Always Async” may be a more effective approach, for example always using the AsyncOpenAI client instead of synchronous calls such as requests.
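The point of “Always Async” is that a coroutine waiting on a network response lets other work proceed instead of blocking a thread. A self-contained sketch with `asyncio`: `fake_completion` and `answer_all` are hypothetical names, `fake_completion` stands in for a real API call (with the `openai` package the equivalent would be awaiting `AsyncOpenAI().chat.completions.create(...)`), and `asyncio.sleep` simulates network latency.

```python
import asyncio
import time


async def fake_completion(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; the await yields control
    # to the event loop while "waiting on the network".
    await asyncio.sleep(0.2)
    return f"answer to {prompt!r}"


async def answer_all(prompts: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))


start = time.perf_counter()
answers = asyncio.run(answer_all(["q1", "q2", "q3"]))
elapsed = time.perf_counter() - start
print(answers)
print(f"elapsed: {elapsed:.2f}s")  # about 0.2 s, since the three waits overlap
```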

However, these are not the most crucial points. Most importantly, if you already have a stable user base, it may be worth considering rebuilding the application using other frameworks such as Vercel’s AI SDK with React. After all, in more complex application scenarios, Streamlit’s advantages will rapidly diminish. (This is also the reason why I transitioned from extensively using Streamlit to Dash, and then further to Next.js.)


I have to agree with @mapix_i. I would recommend rebuilding in a Python framework designed to support more scalable enterprise apps. Other options to consider would be Flask or Django.

I admit I haven’t gone down this path myself, having only built PoCs and Streamlit apps for a small user base. However, I’ve always told my team that Streamlit’s sweet spot is analytics apps and PoCs with a small user base.

I just feel there are too many constraints at the moment (I hope this changes as Streamlit evolves) when trying to build a multi-user, multithreaded app with authentication, session management, and complex user-interface requirements.

If anyone can prove me wrong on the above points, I'm happy to listen 😉