I’m building a web crawler in Python with a Streamlit front end. The crawler is meant to scrape data from a website over an extended period, but Streamlit stops the process after 30–45 minutes and the app reloads back to the main page, so the crawl never completes.
Here’s an overview of my setup:
- I’m using requests and BeautifulSoup for the crawling part.
- The scrape runs for a long duration inside a single Streamlit script run; the app blocks on it and shows no real-time progress updates.
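To make the setup concrete, here is a minimal sketch of what I’m doing. The seed URLs, the `h2.title` CSS selector, and the crawl delay are placeholders, not my real target site:

```python
# Minimal sketch of the setup described above; URLs, selector, and
# delay are placeholders, not the real site.
import time

import requests
from bs4 import BeautifulSoup


def extract_titles(html: str) -> list[str]:
    # Parse one fetched page; "h2.title" is a hypothetical selector.
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.select("h2.title")]


def crawl(urls: list[str], delay: float = 1.0) -> list[str]:
    # Fetch each page in sequence with a polite delay; over many pages
    # this loop easily runs past the 30-45 minute mark.
    results: list[str] = []
    for url in urls:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        results.extend(extract_titles(resp.text))
        time.sleep(delay)
    return results


def main() -> None:
    # Streamlit is imported lazily here so the parsing helpers above
    # can be exercised without a Streamlit install.
    import streamlit as st

    st.title("Long-running crawler")
    if st.button("Start crawl"):
        # Blocks this script run until the entire crawl finishes.
        data = crawl(["https://example.com/page1"])  # placeholder seeds
        st.write(data)


# In the actual app file, main() runs at the top level via `streamlit run app.py`.
```

The key point is that the whole `crawl()` loop runs inside one Streamlit script execution, with nothing written to the UI until it finishes.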
The crawler works fine at first, but after roughly 30–45 minutes the app reloads to the main page and the in-progress crawl is lost.
Questions:
- What might be causing Streamlit to reload the page and stop the crawling process?
- Are there any specific time limits, session timeouts, or memory issues that might be causing this behavior?
- How can I handle long-running tasks in Streamlit without it timing out or reloading?
- What are some best practices for running a long-duration web crawler in Streamlit?
If anyone has experienced this behavior or knows how to optimize the crawler for extended runs in Streamlit, I’d really appreciate your help!