crawling

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python scraping crawling framework crawler Hacktoberfest web-scraping web-scraping-python

Python

58425

11075

1 day ago

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

Go scraper framework crawler scraping crawling spider

24706

1833

4 days ago

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript

19685

1018

10 hours ago

codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

Python news crawler crawling scraper news-aggregator

HTML

14800

2130

2 months ago

D4Vinci / Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation artificial-intelligence ai-scraping data data-extraction mcp mcp-server

Python

7418

417

3 days ago

lorien / awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

web-scraping captcha-recaptcha crawling crawling-python scraping scraping-framework scraping-python scraping-tool webscraping crawler spider

Makefile

7357

825

9 months ago

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation beautifulsoup crawler crawling headless headless-chrome pip Playwright Python scraper scraping web-crawler web-crawling web-scraping Hacktoberfest

Python

6757

482

1 day ago