Web Scraping
Frameworks and tools for crawling websites, headless browsing, and data extraction.
Repositories
scrapy / scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Python
59.6k
gocolly / colly
Elegant Scraper and Crawler Framework for Golang
Go
25.0k
puppeteer / puppeteer
JavaScript API for Chrome and Firefox
TypeScript
93.5k
cheeriojs / cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
TypeScript
30.1k
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Python
59.4k
NanmiCoder / MediaCrawler
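Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.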
Python
43.6k
Y2Z / monolith
⬛️ CLI tool and library for saving complete web pages as a single HTML file
Rust
14.7k