Web Scraping

Frameworks and tools for crawling websites, headless browsing, and data extraction.

Repositories

puppeteer / puppeteer

A Node.js library that provides a high-level API to control Chrome and Firefox over the DevTools Protocol or WebDriver BiDi. It runs headless by default and is suited to web scraping, automated testing, screenshot capture, PDF generation, and other browser automation workflows.

TypeScript

95.2k

2 hours ago

unclecode / crawl4ai

Open-source web crawler optimized for LLMs, converting web content into clean Markdown for AI applications. Features async processing, browser automation, and structured data extraction.

Python

69.3k

4 days ago

D4Vinci / Scrapling

Scrapling is an adaptive Python web scraping framework that handles everything from single requests to full-scale crawls. Its smart parser automatically relocates elements after website changes, built-in fetchers bypass anti-bot systems like Cloudflare, and the spider framework supports concurrent crawling with pause/resume, proxy rotation, and AI integration via MCP server.

Python

65.5k

5 days ago

scrapy / scrapy

Scrapy is a Python framework for web crawling and scraping that provides a complete toolkit for extracting structured data from websites at scale.

Python

62.4k

11 hours ago