Web crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
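As a rough illustration of what "systematically browses" means, here is a minimal sketch of a breadth-first crawl loop. It is not taken from any project listed here. To stay self-contained it walks a hypothetical in-memory link graph (`TOY_WEB`) instead of making HTTP requests, and the names `crawl`, `get_links`, and `max_pages` are assumptions made for this example.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first from `start`, each URL at most once.

    `get_links` is any callable that returns the outgoing links of a URL.
    A real crawler would fetch and parse the page there; this sketch does not.
    """
    seen = {start}          # URLs already queued, so nothing is visited twice
    queue = deque([start])  # frontier of URLs still to visit
    order = []              # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "web" used in place of real network access.
TOY_WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": ["https://example.com/c"],
    "https://example.com/c": [],
}


def toy_links(url):
    """Return the links of a page in the toy graph (empty if unknown)."""
    return TOY_WEB.get(url, [])


if __name__ == "__main__":
    for page in crawl("https://example.com/", toy_links):
        print(page)
```

The `seen` set is what keeps the loop from revisiting pages that link back to each other, and `max_pages` bounds the crawl. Production crawlers add politeness delays, robots.txt handling, and error handling on top of this loop.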

firecrawl / firecrawl

The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

AI crawler Markdown scraper html-to-markdown LLM scraping web-crawler ai-scraping webscraping web-scraping web-data web-data-extraction ai-agents data-extraction ai-crawler ai-search web-scraper web-search

TypeScript

61333

4962

3 hours ago

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python scraping crawling framework crawler Hacktoberfest web-scraping web-scraping-python

Python

58425

11075

1 day ago

NaiboWang / EasySpider

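A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual application for browser automation testing, data collection, and crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for Web applications.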

code-free crawler GUI layman spider parameters Web input-parameters frontend HTML batch-processing batch-script visual visualization visualprogramming scraper data-collection rpa Robotics

JavaScript

42744

5244

1 month ago

iawia002 / lux

👾 Fast and simple video download library and CLI tool written in Go

downloader Go crawler scraper Video bilibili YouTube youku iqiyi tumblr qq download

30486

3208

20 days ago

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

Go scraper framework crawler scraping crawling spider

24706

1833

4 days ago

jhao104 / proxy_pool

Python ProxyPool for web spider

crawler proxy spider Redis HTTP

Python

22839

5335

8 months ago

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

scraping scraping-python automated-scraper LLM AI web-crawler web-scraping ai-scraping crawler html-to-markdown Markdown rag

Python

21409

1832

6 hours ago

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript

19685

1018

10 hours ago

binux / pyspider

A Powerful Spider(Web Crawler) System in Python.

Python crawler

Python

16898

3681

1 year ago

codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

Python news crawler crawling scraper news-aggregator

HTML

14800

2130

2 months ago

Evil0ctal / Douyin_TikTok_Download_API

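🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.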

Python

14464

2134

3 days ago

shengqiangzhang / examples-of-web-crawlers

Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

crawler spider taobao tmall Example Python Selenium pyquery stock fund multithreading WeChat

HTML

14418

3838

3 months ago

projectdiscovery / katana

A next-generation crawling and spidering framework.

crawler web-spider gocrawler spider-framework CLI headless Hacktoberfest

14261

791

3 days ago

s0md3v / Photon

Incredibly fast crawler designed for OSINT.

crawler spider Python OSINT information-gathering

Python

12252

1636

6 months ago

crawlab-team / crawlab

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

webcrawler scrapy crawlab spiders-management Go scrapyd-ui spider crawler webspider web-crawler Docker platform crawling-tasks

11990

1871

6 days ago

code4craft / webmagic

A scalable web crawler framework for Java.

crawler Java scraping framework

Java

11643

4164

1 month ago

ssssssss-team / spider-flow

A next-generation crawler platform that defines crawler workflows graphically, so you can build a crawler without writing code.

spider crawler jsoup xpath web-spider webspider webcrawler web-crawler spider-flow

Java

10969

2126

2 years ago

injetlee / Python

Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, and remote power-on.

Python crawler WeChat excel

Python

10309

4270

2 years ago

guyueyingmu / avbook

AV movie management system with crawlers for avmoo, javbus, and javlibrary; an online AV video library and AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

javbus avmoo javlibrary spider crawler Laravel scraper adult magnet-link magnet database adult-video guzzlehttp

PHP

9821

2027

1 year ago

TeamWiseFlow / wiseflow

Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.

crawler information-gathering LLM scraper

Python

7819

1382

3 days ago