
#web-scraping

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python
57990
1 day ago
Mintplex-Labs/anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

JavaScript
48095
4 hours ago
dgtlmoon/changedetection.io

Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monitoring—all for free or enjoy our SaaS plan!

Python
26240
8 hours ago
apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

TypeScript
18781
26 minutes ago
Evil0ctal/Douyin_TikTok_Download_API

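🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.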

Python
13939
5 months ago
alirezamika/autoscraper
Python
6900
2 months ago
D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Python
6453
20 minutes ago

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Python
6181
10 hours ago

Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

Python
5085
10 hours ago

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python
4593
11 days ago
JavaScript
4220
11 hours ago

Python binding for a curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

Python
4052
13 hours ago
snooppr/snoop

Snoop: a reconnaissance tool based on open data (OSINT world)

Python
3431
1 day ago