Web Scraping

Frameworks and tools for crawling websites, headless browsing, and data extraction.

Repositories

Puppeteer is a JavaScript library that provides a high-level API for controlling Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. It runs headless by default and is widely used for web scraping, testing, and automation.

TypeScript · 93.9k stars · updated 2 days ago

Open-source web crawler optimized for LLMs, converting web content into clean Markdown for AI applications. Features async processing, browser automation, and structured data extraction.

Python · 62.3k stars · updated 3 days ago
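To make "converting web content into clean Markdown" concrete, here is a minimal sketch built on Python's standard-library `html.parser`. It is not the crawler's own implementation or API. The names `MarkdownSketch` and `html_to_markdown` are hypothetical, and the sketch handles only `h1`-`h3`, `p`, `a`, and `ul`/`ol`/`li`. Dropping `script`, `style`, and `nav` content is an assumed stand-in for real boilerplate removal.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Turns a small subset of HTML into Markdown (hypothetical sketch, not a full converter)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style", "nav"}  # assumed boilerplate: their content is dropped

    def __init__(self):
        super().__init__()
        self.parts = []       # Markdown fragments in document order
        self.href = None      # href of the <a> currently open, if any
        self.link_text = []   # text collected inside that <a>
        self.skip_depth = 0   # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")  # blank line before the list
        elif tag == "li":
            self.parts.append("\n- ")  # ordered lists are flattened to bullets
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a" and self.href is not None:
            text = "".join(self.link_text).strip()
            self.parts.append(f"[{text}]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse whitespace runs, keep word boundaries
        if self.href is not None:
            self.link_text.append(text)
        else:
            self.parts.append(text)


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    out = []
    for line in lines:
        if line or (out and out[-1]):  # keep at most one blank line in a row
            out.append(line)
    return "\n".join(out).strip()
```

For example, `html_to_markdown('<h1>Docs</h1><p>See <a href="https://example.com">the guide</a>.</p>')` yields a `# Docs` heading followed by a paragraph containing the link `[the guide](https://example.com)`. Production converters also handle tables, code blocks, nested lists, and content-density heuristics, none of which this sketch attempts.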
scrapy/scrapy

Scrapy is a powerful Python framework for web crawling and scraping, providing a complete toolkit for extracting structured data from websites efficiently and at scale.

Python · 60.9k stars · updated 3 days ago
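A framework like Scrapy schedules requests and filters out duplicate URLs for you. The minimal, framework-free Python sketch below shows what that bookkeeping involves: a same-host crawl with a visited set. It is not Scrapy's API. `crawl` and the `get_links` callback are hypothetical names, with `get_links` standing in for downloading and parsing a page. The breadth-first order is an arbitrary choice for the sketch.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl restricted to start_url's host.

    get_links(url) -> list of raw href strings found on that page
    (hypothetical callback; a real crawler would fetch and parse here).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, to avoid duplicates
    queue = deque([start_url])  # crawl frontier
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # resolve relative links and drop #fragments so duplicates compare equal
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host or absolute in seen:
                continue
            seen.add(absolute)
            queue.append(absolute)
    return visited
```

With an in-memory site such as `{"https://example.com/": ["/a", "/b#top", "https://other.org/x"], ...}` passed as `lambda u: site.get(u, [])`, each page on `example.com` is visited once, and the off-site link is ignored.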

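Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A, articles, and comments.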

Python · 46.3k stars · updated 3 days ago
D4Vinci/Scrapling

Scrapling is an adaptive web scraping framework that handles everything from single requests to full-scale crawls. Its parser automatically relocates elements when websites update, while built-in fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box.

Python · 31.7k stars · updated 2 days ago

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

TypeScript · 30.2k stars · updated 13 hours ago

Elegant Scraper and Crawler Framework for Golang

Go · 25.2k stars · updated a month ago

⬛️ CLI tool and library for saving complete web pages as a single HTML file

Rust · 14.9k stars · updated a month ago
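One common way to save a page as a single self-contained HTML file is to embed each external asset as a `data:` URL (RFC 2397). The sketch below illustrates that idea for images only. It is written in Python for consistency with the other examples and is not the Rust tool's code or CLI. `to_data_url` and `inline_images` are hypothetical names, and `fetch` is a hypothetical download callback. The regex handles only double-quoted `src` attributes on `<img>` tags, a simplification that a real tool would replace with a proper HTML parser and would extend to CSS, scripts, and fonts.

```python
import base64
import mimetypes
import re


def to_data_url(payload: bytes, filename: str) -> str:
    """Encode bytes as a base64 data: URL, guessing the MIME type from the name."""
    mime, _encoding = mimetypes.guess_type(filename)
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


# Matches <img ... src="..."> with a double-quoted value (simplified, not a full HTML parser).
SRC_ATTR = re.compile(r'(<img\b[^>]*?\bsrc=)"([^"]+)"', re.IGNORECASE)


def inline_images(html: str, fetch) -> str:
    """Replace each <img src="..."> URL with an inline data: URL.

    fetch(url) -> bytes is a hypothetical callback that downloads the asset.
    """
    def replace(match):
        prefix, url = match.group(1), match.group(2)
        if url.startswith("data:"):
            return match.group(0)  # already inline, leave untouched
        return f'{prefix}"{to_data_url(fetch(url), url)}"'

    return SRC_ATTR.sub(replace, html)
```

Base64 inflates each asset by roughly a third, which is the usual trade-off for getting one portable file with no external requests.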