Web Scraping

Frameworks and tools for crawling websites, headless browsing, and data extraction.

Repositories

Puppeteer is a JavaScript library providing a high-level API to control Chrome or Firefox via the Chrome DevTools Protocol or WebDriver BiDi. It runs headless by default and is widely used for web scraping, testing, and automation.

TypeScript
94.3k
18 hours ago

Open-source web crawler optimized for LLMs, converting web content into clean Markdown for AI applications. Features async processing, browser automation, and structured data extraction.

Python
64.7k
14 days ago
scrapy/scrapy

Scrapy is a powerful Python framework for web crawling and scraping, providing a complete toolkit for extracting structured data from websites efficiently and at scale.

Python
61.6k
2 days ago

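Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.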

Python
49.0k
8 days ago
D4Vinci/Scrapling

Scrapling is an adaptive Python web scraping framework that handles everything from single requests to full-scale crawls. Its smart parser automatically relocates elements after website changes, built-in fetchers bypass anti-bot systems like Cloudflare, and the spider framework supports concurrent crawling with pause/resume, proxy rotation, and AI integration via MCP server.

Python
47.3k
2 days ago

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

TypeScript
30.3k
2 hours ago

Elegant Scraper and Crawler Framework for Golang

Go
25.3k
14 days ago

⬛️ CLI tool and library for saving complete web pages as a single HTML file

Rust
15.1k
7 days ago