Introducing the Crawlit Project: A Web Crawler with Streamlit

Hello Streamlit community!

I’m excited to introduce you to my project, Crawlit. It’s a web crawler built on Scrapy, enhanced with a Streamlit user interface to visualize and analyze the results.

Key Features:

  • Web Crawler: Uses Scrapy (and Crowl.tech) to navigate and gather data from specified websites.
  • Streamlit Interface: Provides interactive visualization and analysis of the collected data, including a distribution of PageRanks.
  • CSV Export: Offers the capability to export the gathered data in CSV format for further processing.
  • PageRank: Computes a PageRank score for each crawled page, drawing inspiration from the original algorithm and incorporating the reasonable surfer concept.
  • Visualization with ECharts: Uses ECharts, an open-source visualization library, to chart the distribution of PageRanks across the crawled pages, the distribution of response statuses, and links by depth, among other insights.
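
To give a feel for the PageRank feature, here is a minimal sketch of a weighted PageRank computed by power iteration. It is not Crawlit's actual code: the function name `weighted_pagerank`, the `(source, target, weight)` link format, and the 0.85 damping factor are all assumptions made for illustration. The per-link weight stands in for the reasonable surfer idea, where a link that is more likely to be clicked passes on a larger share of its page's rank.

```python
from collections import defaultdict


def weighted_pagerank(links, damping=0.85, iterations=50):
    """Return a {page: rank} dict for a list of (source, target, weight) links.

    Hypothetical helper for illustration only. The weight of a link is its
    assumed relative click likelihood (a stand-in for the reasonable surfer).
    """
    # Ignore links with non-positive weight so we never divide by zero.
    links = [(src, dst, w) for src, dst, w in links if w > 0]

    nodes = set()
    out_weight = defaultdict(float)  # total outgoing weight per page
    for src, dst, w in links:
        nodes.update((src, dst))
        out_weight[src] += w

    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages without outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each link passes on rank in proportion to its share of the
        # source page's total outgoing weight.
        for src, dst, w in links:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Toy site: the link from "home" to "blog" is assumed three times as likely
# to be followed as the link from "home" to "about".
ranks = weighted_pagerank([
    ("home", "about", 1.0),
    ("home", "blog", 3.0),
    ("about", "home", 1.0),
    ("blog", "home", 1.0),
])
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

With equal weights on every link this reduces to the classic PageRank iteration; the weights only change how each page's rank is split among its outgoing links.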

How to Use?

  1. Clone the repository: git clone https://github.com/drogbadvc/crawlit.git
  2. Navigate to the project directory and install dependencies: pip install -r requirements.txt
  3. Run the project with: streamlit run graph-streamlit.py
  4. Open your browser at: http://localhost:8501 after launching Streamlit.

For more details, please refer to the documentation on GitHub.

I’d greatly appreciate your feedback and suggestions to enhance this project. Thank you for your attention, and happy exploring!

This is fantastic, @andell!

Is the app deployed anywhere so users can try it?

Best,
Charly

It isn’t deployed online at the moment. It’s no longer up, but I can redeploy it on my server for a demo.

And thank you!

I was in too much of a hurry to reply and forgot to say thank you. :sweat_smile:

That would be great! :hugs:

Let me know when your app is deployed, and we’ll be more than happy to help promote it across our social media channels!

Best,
Charly

Thank you very much :hugs:

I’ve deployed a demo of the app here: Streamlit

You can launch a crawl and see the different results.
However, don’t crawl Amazon, Facebook, or sites with millions of pages; the server won’t hold up. :laughing:

Enjoy!

FYI: I plan to improve and add options to the crawler in the future.

Thanks @andell, looks great!

@Jessica_Smith one for us to promote :slight_smile:

Charly

Thank you very much, Charly,

it’s very kind of you. Feel free to follow the project as it grows. :hugs:

Nice idea. I would love to check this out.
