Streamlit Crawler Timeout Issue: Stops Crawling After 30-45 Minutes

I’m building a web crawler in Python with Streamlit. The crawler is designed to scrape data from a website over an extended period. However, after 30-45 minutes Streamlit stops the crawl and the app reloads back to the main page, so the crawler never gets to finish its task.
Here’s an overview of my setup:

  • I’m using requests and BeautifulSoup for the crawling part.
  • The scraping runs for a long time directly inside the Streamlit app, and no progress updates are shown in real time.

The crawler works fine for a while, but after about 30 to 45 minutes, it stops and redirects me back to the main page, interrupting the crawling process.
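A reload throws away whatever the crawl loop held in memory. One way to avoid starting from scratch each time is to checkpoint progress to disk after every page. Below is a minimal sketch of a breadth-first crawl that does this. `crawl_with_checkpoint` and its `fetch_links` callback are hypothetical names; in the real app `fetch_links` would wrap the existing requests + BeautifulSoup code. The sketch assumes the queue and the set of visited URLs fit comfortably in a JSON file.

```python
import json
from collections import deque
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Set


def crawl_with_checkpoint(
    start_urls: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    checkpoint: Path,
    max_pages: Optional[int] = None,
) -> Set[str]:
    """Breadth-first crawl that saves its queue and visited set after every page.

    fetch_links is a hypothetical callback: given a URL, it returns the links found
    on that page (in the real app it would wrap the requests + BeautifulSoup code).
    """
    if checkpoint.exists():
        # Resume: reload the state written by an earlier, interrupted run.
        state = json.loads(checkpoint.read_text(encoding="utf-8"))
        queue = deque(state["queue"])
        visited = set(state["visited"])
    else:
        queue = deque(start_urls)
        visited = set()

    processed = 0
    while queue and (max_pages is None or processed < max_pages):
        url = queue.popleft()
        if url in visited:
            continue  # the same link can be queued from several pages
        # If fetch_links raises here, the last saved checkpoint still has this
        # URL in its queue, so a resumed run will retry it.
        for link in fetch_links(url):
            if link not in visited:
                queue.append(link)
        visited.add(url)
        processed += 1
        # Write to a temporary file, then rename it over the checkpoint, so an
        # interruption mid-write cannot leave a half-written JSON file behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(
            json.dumps({"queue": list(queue), "visited": sorted(visited)}),
            encoding="utf-8",
        )
        tmp.replace(checkpoint)
    return visited
```

Calling the function again after a reload, with the same checkpoint path, picks up the saved queue instead of the start URLs. Delete the file to start a fresh crawl. Rewriting the whole file after every page is simple but gets slower as the crawl grows, so saving every N pages is an easy tweak.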

Questions:

  1. What might be causing Streamlit to reload the page and stop the crawling process?
  2. Are there any specific time limits, session timeouts, or memory issues that might be causing this behavior?
  3. How can I handle long-running tasks in Streamlit without it timing out or reloading?
  4. What are some best practices for running a long-duration web crawler in Streamlit?
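Regarding questions 3 and 4: Streamlit re-executes the whole script on every interaction, and a browser reload starts a brand-new session. A crawl that runs inline in the script can therefore be cut off when that script run is stopped or the session goes away. A commonly suggested workaround is to run the crawl in a background thread that only updates plain Python state, and have the page merely read that state. The following is a sketch under that assumption, not a tested recipe. `CrawlJob` and `fetch_page` are hypothetical names, and `fetch_page` stands in for the existing requests + BeautifulSoup code.

```python
import threading
from typing import Callable, Iterable, List, Optional


class CrawlJob:
    """Runs a crawl in a background thread and exposes thread-safe progress.

    fetch_page is a hypothetical callback that processes one URL and returns
    whatever data is extracted from it.
    """

    def __init__(self, urls: Iterable[str], fetch_page: Callable[[str], object]):
        self._urls = list(urls)
        self._fetch_page = fetch_page
        self._lock = threading.Lock()
        self._results: List[object] = []
        self._errors: List[str] = []
        self._stop = threading.Event()
        # daemon=True: this thread will not keep the interpreter alive at shutdown
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()  # checked before each URL, so stopping is cooperative

    def _run(self) -> None:
        for url in self._urls:
            if self._stop.is_set():
                break
            try:
                item = self._fetch_page(url)
            except Exception as exc:  # record the failure and keep crawling
                with self._lock:
                    self._errors.append(f"{url}: {exc!r}")
                continue
            with self._lock:
                self._results.append(item)

    def status(self) -> dict:
        with self._lock:
            return {
                "running": self._thread.is_alive(),
                "done": len(self._results) + len(self._errors),
                "total": len(self._urls),
                "errors": list(self._errors),
            }

    def results(self) -> List[object]:
        with self._lock:
            return list(self._results)

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)
```

In the app, the job object has to outlive individual script runs. One option is to create it inside a function decorated with `st.cache_resource`, which returns the same object across reruns and sessions. The page can then display `job.status()`, for example with `st.progress`. Avoid calling `st.*` functions from inside `_run`: the background thread generally has no Streamlit script context, which is why the sketch only updates lists behind a lock. This approach still assumes that the Streamlit server process itself stays up for the whole crawl. If the host restarts or idles the app, running the crawler as a separate script outside Streamlit is the more robust option.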

If anyone has experienced this behavior or knows how to optimize the crawler for extended runs in Streamlit, I’d really appreciate your help!

Is there anything useful in the logs, ideally at debug level? If the web session times out, I think the crawl job might be affected (just a guess).
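To get something useful into the logs, one option is to have the crawler write its own timestamped log file. After a failure, that file shows the last URL the crawler reached and whether the loop exited normally. Streamlit's own server log verbosity can also be raised, for example with `streamlit run app.py --logger.level=debug`, where `app.py` is a placeholder for your script. The helpers below, `make_crawl_logger` and `crawl_logged`, are hypothetical and use only the standard `logging` module. `fetch_page` again stands in for the existing scraping code.

```python
import logging
from typing import Callable, Iterable


def make_crawl_logger(path: str) -> logging.Logger:
    """Hypothetical helper: a logger that writes timestamped lines to a file."""
    logger = logging.getLogger("crawler")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking duplicate handlers when the script reruns
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def crawl_logged(urls: Iterable[str], fetch_page: Callable[[str], object],
                 logger: logging.Logger) -> int:
    """Crawl the given URLs, logging each step; returns the number of successes."""
    ok = 0
    logger.info("crawl started")
    try:
        for url in urls:
            logger.debug("fetching %s", url)
            try:
                fetch_page(url)
                ok += 1
            except Exception:
                # logger.exception logs at ERROR level and includes the traceback
                logger.exception("failed on %s", url)
    finally:
        # Written on normal exit and when an exception unwinds the loop; if this
        # line is missing entirely, the process was most likely killed outright.
        logger.info("crawl exiting after %d successful pages", ok)
    return ok
```

Compare the timestamp of the last `fetching ...` line with the time the page reloaded. That should show whether the crawl died at the same moment as the session, which would support the session-timeout guess.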