Streamlit Crawler Timeout Issue: Stops Crawling After 30-45 Minutes

Sumamah · January 6, 2025, 10:55am

I’m building a web crawler in Python with Streamlit. The crawler is meant to scrape data from a website over an extended period, but after 30 to 45 minutes Streamlit stops the crawl and the app reloads to the main page, so the crawler never finishes its task.
Here’s an overview of my setup:

I’m using requests and BeautifulSoup for the crawling part.
The scrape runs for a long time inside the interactive Streamlit app, without showing updates in real time.

The crawler works fine for a while, but after about 30 to 45 minutes, it stops and redirects me back to the main page, interrupting the crawling process.
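Whatever ends the run, a crawl that saves its progress to disk loses at most a few pages when it is interrupted and can resume on the next run instead of starting over. The sketch that follows is an illustration under assumptions, not the poster's code: `fetch` is a hypothetical callable standing in for the `requests` + BeautifulSoup step, and the JSON checkpoint layout (`done`, `results`) is made up for the example.

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def load_checkpoint(path: str) -> dict:
    """Return saved crawl state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {"done": [], "results": {}}


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: dump to a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def crawl_resumable(urls: Iterable[str], fetch: Callable[[str], str],
                    checkpoint_path: str, save_every: int = 10) -> dict:
    """Crawl `urls`, skipping any URL already recorded in the checkpoint file.

    `fetch` is a hypothetical callable (e.g. a wrapper around requests.get
    plus BeautifulSoup parsing) that returns the data scraped from one URL.
    """
    state = load_checkpoint(checkpoint_path)
    done = set(state["done"])
    since_save = 0
    for url in urls:
        if url in done:
            continue  # finished in an earlier, interrupted run
        state["results"][url] = fetch(url)
        state["done"].append(url)
        done.add(url)
        since_save += 1
        if since_save >= save_every:
            save_checkpoint(checkpoint_path, state)
            since_save = 0
    save_checkpoint(checkpoint_path, state)  # flush pages since the last save
    return state
```

A smaller `save_every` loses less work on interruption at the cost of more disk writes.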

Questions:

What might be causing Streamlit to reload the page and stop the crawling process?
Are there any specific time limits, session timeouts, or memory issues that might be causing this behavior?
How can I handle long-running tasks in Streamlit without it timing out or reloading?
What are some best practices for running a long-duration web crawler in Streamlit?
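On the last two questions, one commonly suggested pattern is to move the crawl out of the script run itself. Streamlit re-executes the whole script on each rerun, so work done inline in the script is tied to that run. A minimal sketch using only the standard library runs the crawl in a background thread owned by a job object; `CrawlJob` and `demo_crawl` are hypothetical names, and `demo_crawl` only simulates pages.

```python
import threading
import time
from typing import Callable, Optional


class CrawlJob:
    """Run a long task in a background thread and expose its progress."""

    def __init__(self, work: Callable[["CrawlJob"], None]):
        self._work = work
        self._lock = threading.Lock()
        self.stop_event = threading.Event()  # set this to ask the crawl to stop
        self.pages_done = 0
        self.error: Optional[Exception] = None
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        # Start only once; repeated calls (e.g. from script reruns) are no-ops.
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()

    def _run(self) -> None:
        try:
            self._work(self)
        except Exception as exc:  # keep the failure so the UI can display it
            self.error = exc

    def report_page(self) -> None:
        with self._lock:
            self.pages_done += 1

    def is_running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    def join(self, timeout: Optional[float] = None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)


def demo_crawl(job: CrawlJob, pages: int = 5) -> None:
    """Stand-in for the real crawler; checks for a stop request between pages."""
    for _ in range(pages):
        if job.stop_event.is_set():
            return
        time.sleep(0.01)  # placeholder for requests.get(...) + BeautifulSoup parsing
        job.report_page()
```

In the app, the job object would be created inside a function decorated with `@st.cache_resource` (Streamlit 1.18 and later), so reruns and other browser sessions get the same object instead of starting a second crawl; the page then calls `start()` and displays `pages_done`. The background thread should not call `st.*` functions itself, since it has no script-run context. This does not survive a restart of the Streamlit server process or an app going to sleep on a hosting platform; for that, running the crawler as a separate process or scheduled job that writes to a file or database, with Streamlit only reading the results, is the more robust setup.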

If anyone has experienced this behavior or knows how to optimize the crawler for extended runs in Streamlit, I’d really appreciate your help!
sumamahahmad701@gmail.com

shawngiese · January 9, 2025, 10:34pm

Is there anything useful in the logs? If the web session times out, I think the crawl job might be affected (just a guess).
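To get that kind of debug information, the crawler can write its own timestamped log to a file, independent of the browser session. As a rough heuristic, if the last line is an ordinary progress entry with no traceback and no "crawl finished" marker, the process was more likely stopped from outside than by an exception in the crawler. The helper names are hypothetical, `fetch` stands in for the real request and parsing code, and only the standard `logging` module is used. Streamlit's own log output can also be made more verbose, for example with `streamlit run app.py --logger.level=debug` (`app.py` being a placeholder).

```python
import logging
import os
from typing import Callable, Sequence


def make_crawl_logger(path: str) -> logging.Logger:
    """Return the 'crawler' logger, writing timestamped lines to `path`."""
    logger = logging.getLogger("crawler")
    logger.setLevel(logging.INFO)
    target = os.path.abspath(path)
    # Streamlit re-executes the script on every rerun; without this check each
    # rerun would attach another handler and every line would be duplicated.
    if not any(isinstance(h, logging.FileHandler) and h.baseFilename == target
               for h in logger.handlers):
        handler = logging.FileHandler(target, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def crawl_with_logging(urls: Sequence[str], fetch: Callable[[str], object],
                       logger: logging.Logger) -> None:
    """Log every page, every failure (with traceback) and a final marker."""
    total = len(urls)
    logger.info("crawl started: %d urls", total)
    for i, url in enumerate(urls, start=1):
        try:
            fetch(url)  # hypothetical stand-in for requests.get + parsing
        except Exception:
            logger.exception("failed %s", url)  # includes the full traceback
            continue
        logger.info("ok %d/%d %s", i, total, url)
    logger.info("crawl finished")
```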
