scraper

The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

AI crawler Markdown scraper html-to-markdown LLM scraping web-crawler ai-scraping webscraping web-scraping web-data web-data-extraction ai-agents data-extraction ai-crawler ai-search web-scraper web-search

TypeScript

61337

4962

4 hours ago

huginn / huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

automation notifications scraper webscraping feedgenerator RSS agent monitoring feed twitter-streaming huginn X (Twitter)

Ruby

47566

4116

12 hours ago

NaiboWang / EasySpider

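A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual browser-automation testing, data-collection, and crawler tool that lets you design and run crawler tasks graphically without writing code. Alias: ServiceWrapper, an intelligent service-wrapping system for web applications.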

code-free crawler GUI layman spider parameters Web input-parameters frontend HTML batch-processing batch-script visual visualization visualprogramming scraper data-collection rpa Robotics

JavaScript

42744

5244

1 month ago

iawia002 / lux

👾 Fast and simple video download library and CLI tool written in Go

downloader Go crawler scraper Video bilibili YouTube youku iqiyi tumblr qq download

30486

3208

20 days ago

cheeriojs / cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

cheerio jQuery htmlparser2 Document Object Model (DOM) htmlparser selector scraper Parser HTML Hacktoberfest

TypeScript

29802

1675

2 days ago

feder-cr / Jobs_Applier_AI_Agent_AIHawk

AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it lets users apply to multiple jobs in a tailored way.

automation Bot ChatGPT gpt job jobsearch jobseeker opeai Python resume scraper scraping application-resume Selenium Chrome human-resources jobs agent AI

Python

28875

4382

4 months ago

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

Go scraper framework crawler scraping crawling spider

24706

1833

4 days ago

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript

19685

1018

10 hours ago

codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

Python news crawler crawling scraper news-aggregator

HTML

14800

2130

2 months ago

Evil0ctal / Douyin_TikTok_Download_API

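🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous data-scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, with support for API calls and online batch parsing and downloading.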

Python

14464

2134

3 days ago

getmaxun / maxun

⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡

automation no-code scraper web-automation web-scraper web-scraping API browser browser-automation Playwright self-hosted robotic-process-automation rpa no-code-web-scraper agents data-extraction webscraping Hacktoberfest hacktoberfest-accepted

TypeScript

13668

1108

1 day ago

pwxcoo / chinese-xinhua

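📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.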

data scraper chinese-traditional Python chinese chinese-characters chinese-nlp chinese-language chinese-simplified json-dataset JSON json-data

Python

11352

2635

2 years ago

guyueyingmu / avbook

AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV film library; AV magnet-link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

javbus avmoo javlibrary spider crawler Laravel scraper adult magnet-link magnet database adult-video guzzlehttp

PHP

9821

2027

1 year ago

TeamWiseFlow / wiseflow

Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.

crawler information-gathering LLM scraper

Python

7818

1381

3 days ago

alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

scraping scraper scrape webscraping crawler web-scraping AI Python webautomation automation machine-learning

Python

6984

711

4 months ago

BruceDone / awesome-crawler

A collection of awesome web crawlers and spiders in different languages

web-crawler crawler web-scraper spider scraper Awesome Lists

6958

732

1 year ago

arc298 / instagram-scraper

Scrapes an Instagram user's photos and videos

Instagram instagram-scraper instagram-user-photos Python scraper instagram-client instagram-api

Python

6955

1376

3 years ago

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation beautifulsoup crawler crawling headless headless-chrome pip Playwright Python scraper scraping web-crawler web-crawling web-scraping Hacktoberfest

Python

6757

482

1 day ago