Troubleshooting performance issues with multiple concurrent users

Hello all -

First off, thank you to this amazing community. I’ve been using Streamlit for an internal dashboard for about 9 months and have been blown away by how easy it is to use, as well as by the helpfulness of the forums. I’ve been able to solve numerous problems thanks to posts in this community.

I’m writing to get help with a problem that I’ve seen in a number of past posts: performance with concurrent users.

First, some technical details:

  • I’m running Streamlit on a DigitalOcean Droplet. I can’t share a link since it is an internal-only website.
  • The droplet’s specs: 8 GB Memory / 4 AMD vCPUs / 25 GB Disk
  • I’m using Streamlit 1.39
  • I’m running Python 3.11 in Docker. This is my Docker image base: FROM python:3.11-slim

Second, the issue:
Like many others before me, I seem to have written an app that works great with 1-2 people using it, but when 10-15 people use it, it becomes semi-unusable: sidebar page clicks take 10-30 seconds to acknowledge, pages can take 30+ seconds to load, etc… Normally all of this happens in under a second (or maybe a few seconds the first time a page is loaded, before caching takes place).

What seems odd to me is that my server doesn’t seem to be under severe load. I am including screenshots from last Friday, when I gave a presentation at about 1:45 p.m.: lots of people suddenly logged in and Streamlit became non-functional. You’ll see that there is a clear spike in everything (disk usage, CPU, memory, etc.), but everything seems to be within the bounds of reason.

Do folks have suggestions as to how I can improve performance?

  • Will upping the number of cores on my droplet help?
  • Is there something I can do better with caching? I use st.cache_data for any data that requires some processing based on user requests.
  • Is it helpful that I use cache_data on my functions that read data from parquet into a dataframe? I have a number of largish datasets in parquet files, and I thought it would improve performance to read them once and then keep everything in memory.
  • What can I do to profile the slowness when there are multiple concurrent users? I a) don’t know how to simulate this and b) wouldn’t quite know where to look for bottlenecks.
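
To make the simulation question concrete, here is a minimal sketch of the kind of test I can imagine running. It does not open real browser sessions or talk to Streamlit at all. It only calls a plain Python function from several threads at once and compares latencies, on the assumption (my understanding, not something I have verified) that Streamlit runs each user's script in its own thread inside one Python process. The names time_concurrent and fake_page_work are hypothetical; fake_page_work is a stand-in for whatever processing one of my pages does.

```python
import statistics
import threading
import time


def time_concurrent(fn, n_users):
    """Call fn() from n_users threads at once; return each call's latency in seconds."""
    latencies = [0.0] * n_users
    start_gate = threading.Barrier(n_users)  # hold every thread until all are ready

    def worker(i):
        start_gate.wait()  # all threads are released together
        t0 = time.perf_counter()
        fn()
        latencies[i] = time.perf_counter() - t0

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


def fake_page_work():
    # Hypothetical stand-in for one page's CPU-bound processing.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    # Compare a single simulated user against fifteen simultaneous ones.
    for n in (1, 15):
        lat = time_concurrent(fake_page_work, n)
        print(f"{n:>2} users: median {statistics.median(lat):.3f}s, max {max(lat):.3f}s")
```

If the per-call latency grows roughly in proportion to the number of threads, that would suggest the work is contending for a single core rather than spreading across the droplet's four vCPUs. Swapping fake_page_work for one of my real data functions, and wrapping a single call in the standard library's cProfile, seems like a possible next step.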

Thank you for any support you can give!