Hello Streamlit community!
I’m excited to introduce you to my project, Crawlit. It’s a web crawler built on Scrapy, enhanced with a Streamlit user interface to visualize and analyze the results.
Key Features:
- Web Crawler: Uses Scrapy (and Crowl.tech) to navigate and gather data from specified websites.
- Streamlit Interface: Provides interactive visualization and analysis of the collected data, including a distribution of PageRanks.
- CSV Export: Lets you export the crawled data as a CSV file for further processing.
- PageRank: Computes a PageRank score for each crawled page, inspired by the original algorithm and extended with the "reasonable surfer" model, which weights links by how likely a visitor is to click them rather than treating all links equally (see the sketch after this list).
- Visualization with ECharts: Uses ECharts, an open-source charting library, to display the distribution of PageRanks across crawled pages, the distribution of response statuses, links by depth, and other insights.
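To give a feel for the PageRank computation, here is a minimal sketch of a weighted ("reasonable surfer") PageRank using networkx. The URLs and weights below are made up for illustration, and Crawlit's actual implementation may differ:

```python
import networkx as nx

# Hypothetical link graph: each edge carries a "weight" modeling how
# likely a reasonable surfer is to click that link (e.g. a prominent
# navigation link vs. a footer link). All values are illustrative.
G = nx.DiGraph()
G.add_edge("/", "/blog", weight=0.8)             # prominent nav link
G.add_edge("/", "/legal", weight=0.1)            # footer link
G.add_edge("/blog", "/blog/post-1", weight=0.6)
G.add_edge("/blog/post-1", "/", weight=0.5)

# Weighted PageRank: the probability of following each outgoing link
# is proportional to its weight instead of being uniform.
scores = nx.pagerank(G, alpha=0.85, weight="weight")

for url, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{url}: {score:.4f}")
```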
How to Use?
- Clone the repository:
git clone https://github.com/drogbadvc/crawlit.git
- Navigate to the project directory and install the dependencies:
cd crawlit
pip install -r requirements.txt
- Run the project with:
streamlit run graph-streamlit.py
- Open your browser at http://localhost:8501 once Streamlit has launched.
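Under the hood, each chart is a plain ECharts option object rendered inside Streamlit. As a taste of how that works, here is a minimal sketch assuming the streamlit-echarts wrapper (pip install streamlit-echarts); the data and option structure are illustrative, not Crawlit's exact charts:

```python
from streamlit_echarts import st_echarts

# Illustrative data: counts of HTTP response statuses from a crawl.
status_counts = {"200": 420, "301": 37, "404": 12, "500": 3}

option = {
    "title": {"text": "Response status distribution"},
    "xAxis": {"type": "category", "data": list(status_counts.keys())},
    "yAxis": {"type": "value"},
    "series": [{"type": "bar", "data": list(status_counts.values())}],
}

# Render the ECharts option inside the Streamlit app.
st_echarts(options=option, height="400px")
```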
For more details, please refer to the documentation on GitHub.
I’d greatly appreciate your feedback and suggestions to enhance this project. Thank you for your attention, and happy exploring!