Frustrated: Streamlit frontend disconnects from apps with long-running processes

I have a Streamlit app (built in Docker, deployed to Heroku) that formulates an optimization (a linear program) and then sends the schedule to a Gurobi server. The process takes a while, so for a 20-year optimization at the hourly level I chunk it into years (8760 data points per year, x20 years).

Basically, once the inputs are set by the user and saved as state variables à la Joel Grus’ Game State hacks, the process is:

  1. Formulate inputs and constraints to optimization for 2021 →
  2. send to gurobi →
  3. get back optimal solution from gurobi as a dataframe →
  4. display on Streamlit app “Optimal Solution for 2021 Complete! Moving on to 2022…”
  5. Repeat for next iteration, for 20 iterations.
  6. Concatenate all 20 dataframes from the process and display some summary results and some download and “Send to database” buttons for the results.
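
For readers following along, the loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `solve_year` and `run_years` are hypothetical names, the solver call is replaced by a stub, and plain lists of dicts stand in for the dataframes so the sketch runs without Streamlit, pandas, or Gurobi.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, float]  # one hourly record; stands in for a dataframe row


def solve_year(year: int) -> List[Row]:
    """Hypothetical stub for steps 1-3: formulate one year, solve it, return rows."""
    return [{"year": year, "hour": h, "value": 0.0} for h in range(8760)]


def run_years(
    start: int,
    n_years: int,
    solve: Callable[[int], List[Row]] = solve_year,
) -> Iterator[Tuple[str, List[Row]]]:
    """After each year, yield a progress message and all rows collected so far."""
    rows: List[Row] = []
    for year in range(start, start + n_years):
        rows.extend(solve(year))  # step 6 done incrementally: accumulate results
        yield f"Optimal Solution for {year} Complete!", rows


messages: List[str] = []
results: List[Row] = []
for message, results in run_years(2021, 20):
    messages.append(message)  # in the app, this is where step 4 would display it
```

Yielding after every year keeps the partial results available at each step, so whatever displays the progress message could also save the rows collected so far.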

Each iteration takes about a minute. However, I can only make it to about the 10th iteration (sometimes less, sometimes more) of that before the streamlit frontend “disconnects” from the running loop process of the app and resets all of my state variables, etc, like I had just reloaded the page. The background app process still continues (it’s still looping through the iterations - it doesn’t know streamlit got disconnected), but there is no way to reconnect to the running process again. Without streamlit, this process is fine to run locally with scripts, but with streamlit the frontend is so unreliable as to be unusable for this app. Nothing more frustrating than to get nearly done with a process that took 20 minutes and then it resets itself, forcing the user to start the process over.

I’m really not sure what to try at this point, other than build something much more robust like a flask app that sends these jobs to a job container, which then sends back the result when done. Running the optimization on the same computer as the frontend seems like a fool’s errand.
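
The "send the job elsewhere and collect the result later" idea can be sketched in a framework-agnostic way. The example below is a minimal in-process stand-in, not a Flask app or a real job container: `submit_job`, `job_status`, `job_result`, and `slow_optimization` are hypothetical names, and a thread pool from the standard library plays the role of the worker. The point is only that the job is identified by an ID the frontend can hold on to and ask about again later.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Assumption: a single background worker and an in-memory registry.
# A real deployment would keep both outside the frontend process.
_executor = ThreadPoolExecutor(max_workers=1)
_jobs: Dict[str, Future] = {}


def submit_job(fn: Callable[[], object]) -> str:
    """Start fn in the background and return an ID for looking it up later."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(fn)
    return job_id


def job_status(job_id: str) -> str:
    """Report 'running', 'done', or 'failed' without blocking."""
    future = _jobs[job_id]
    if not future.done():
        return "running"
    return "failed" if future.exception() is not None else "done"


def job_result(job_id: str) -> object:
    """Block until the job finishes and return its result."""
    return _jobs[job_id].result()


def slow_optimization() -> int:
    """Hypothetical stand-in for the 20-year optimization run."""
    return sum(range(1000))


job_id = submit_job(slow_optimization)
result = job_result(job_id)
```

Because the registry here is just a dict in the same process, it would not survive a restart of that process; it only illustrates the shape of the interface a separate job service would offer.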

Anyone have tips to keep a streamlit frontend connected to a long-running process without resetting?

Hi @Kladar

That sounds super frustrating. Sorry about that :slightly_frowning_face:

In my experience those symptoms sound a lot like there’s some unexpected HTTP timeout going on on the host side. The only thing that gives me pause here is that the default HTTP timeout is ~30s, so usually that’s how long your app would take between disconnect-reconnect cycles. But in your case it sounds like these reconnect-disconnect cycles are much longer…

Either way, to help debug it would be great to get a couple of things:

  1. If you look at your server logs, do you see any errors?
  2. On the browser-side, if you open your browser’s dev tools, do you see anything either on the JS console or in the Network tab?

If you don’t know what to look for, or just want a second pair of eyes, feel free to post the logs from (1) and a HAR file from (2) here so I can help debug.


Thanks for the response, sorry about the late reply. I’ve recreated the error and made a screen recording, and I did so running the app locally to avoid the confounding variable of the heroku-based deployment, which I played up too much in the initial post. It actually is more likely to fail running locally, though I’m not sure why. For a 20 year run (~10 minutes), it resets the app about 60% of the time somewhere along in the process (I had to record a couple times to show a failure, as the first couple examples made it through the analysis successfully). Link to recording

I have a HAR file for a later failure (not the one in the video) but the forum won’t let me attach it. Should I convert it to some other file type? We don’t need the Heroku logs since I ran it locally. And now that I think about it, it could be a memory thing: since Python is pretty bad at memory management and Chrome is a RAM hog, perhaps running it several times successfully caused it to fail? Though I stopped the Streamlit app with CTRL+C and started it again each time, and memory usage on my computer seemed pretty static. I’m at a loss.
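
One way to check the memory hypothesis on the Python side is to record traced memory after each iteration with the standard library's `tracemalloc`. This is a sketch under assumptions: `run_iteration` is a hypothetical stand-in for one optimization year, and `tracemalloc` only sees allocations made through Python's allocator, so it would say nothing about memory used by the browser or by native solver code.

```python
import tracemalloc
from typing import List


def run_iteration(i: int) -> List[float]:
    """Hypothetical stand-in for one optimization year returning its results."""
    return [float(x) for x in range(10_000)]


tracemalloc.start()
kept: List[List[float]] = []
usage: List[int] = []
for i in range(5):
    kept.append(run_iteration(i))  # results are retained, as in the real loop
    current, peak = tracemalloc.get_traced_memory()  # bytes currently traced, peak
    usage.append(current)
tracemalloc.stop()
```

If the numbers in `usage` grow far faster than the size of the retained results would explain, that would point toward a leak in the loop rather than in the frontend.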

@thiago

Hey @Kladar , thanks for the video!

After watching the video I tried reproducing the issue locally using a toy example, but I’m just not able to :confused:

This is the toy example I tried:

import streamlit as st
import time
import datetime

"""
# Long-running app example
"""

start_time = datetime.datetime.now()

for i in range(20):
  "---"
  "Loop number:", i
  "Elapsed time:", datetime.datetime.now() - start_time

  # Tried with
  # time.sleep(10)
  # ...but can't repro

  # Tried with
  # time.sleep(60)
  # ...but can't repro

At least this rules out a few hypotheses I had about websocket keepalive failures…

To help debug:

  1. Can you put the HAR file in this private Google Drive folder I shared with you?
  2. Can you repro the bug with streamlit run --logger.level=debug script_name.py 2> bug.log and upload the log from bug.log to Google Drive?
  3. Can you try using a Chrome profile that has no Chrome extensions installed?