ai-scraping

The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

ai crawler Markdown scraper html-to-markdown llm scraping web-crawler ai-scraping webscraping web-scraping web-data web-data-extraction ai-agents data-extraction ai-crawler ai-search web-scraper web-search

TypeScript

61356

4965

3 hours ago

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

scraping scraping-python automated-scraper llm ai web-crawler web-scraping ai-scraping crawler html-to-markdown Markdown rag

Python

21409

1832

12 hours ago

D4Vinci / Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation ai ai-scraping data data-extraction mcp mcp-server

Python

7418

417

2 hours ago

any4ai / AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

aitools crawl scrape webscraper ai-scraping data html-to-markdown rag scraping

TypeScript

2327

229

5 days ago

itsOwen / CyberScraper-2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

ai-scraping llm openai scraper webscraping gemini-api web-scraper

Python

1860

167

2 months ago

raznem / parsera

Lightweight library for scraping web-sites with LLMs

data-extraction llm scraping Python Open Source webscraping ai ai-scraping Playwright

Python

1224

1 month ago

firecrawl / firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

ai ai-scraping data Example html-to-markdown llm Markdown rag web-crawler templates

Jupyter Notebook

561

178

4 months ago

oxylabs / oxylabs-ai-studio-py

Oxylabs AI Studio python SDK

ai-crawler ai-scraping ai-search ai-tools web-scraping web-scraping-python

Python

213

6 days ago

oxylabs / ai-crawler-py

Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural language prompt.

ai ai-agents ai-crawler web-crawler ai-scraping

148

11 days ago

ArchiveBox / abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

Chrome crawling cURL downloader headless Playwright Puppeteer scraping wget youtube-dl yt-dlp cli-tool cli http-client ai-scraping

JavaScript

1 month ago

WeebDataHoarder / go-away

[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.

ai-scraping http-proxy security mirror

1 month ago

kaymen99 / ai-web-scraper

AI web scraper built with Crawl4AI for extracting structured leads data from websites.

ai-agents ai-scraping crawl4ai llm scraper web-scraper web-scraping

Python

8 months ago

spider-rs / web-crawling-guides

How-to guides on web crawling or scraping

agents ai-agents ai-scraping crawler html-to-markdown scraper web-scraping

5 months ago

spider-rs / spider-clients

Python, Javascript, and Rust libraries for the Spider Cloud API.

ai ai-agents ai-scraping crawler html-to-markdown scraper spider web-scraping Supabase

Python

1 month ago

Chakszzz / NB-Scraper

All Scrapers Resource Available Here! Give Us Stars🌟

ai-scraping facebook-scraper scraper Open Source youtube-downloader ytdl

TypeScript

21 days ago

L1shed / Turbo

Fastest and cheapest distributed residential proxy network.

ai-scraping web-scraping payment-gateway iaas collaborate

TypeScript

21 days ago

kaymen99 / google-maps-lead-generator

Extract Google Maps business leads and enrich contact details using AI & web scraping

ai-agents ai-scraping Google Maps google-maps-api web-scraping

Python

3 months ago

GitRectify / scrapegraph-ai

ScrapeGraphAI is a Python-based web-scraping framework that pairs large-language-model reasoning with a graph-style pipeline engine to turn websites (or local XML/HTML/JSON/Markdown files) into structured data with just a handful of lines of code.

ai ai-scraping automated-scraper crawler html-to-markdown llm Markdown rag scraping scraping-python web-crawler web-scraping

Python

4 months ago

drisskhattabi6 / AI-Scraper

AI Scraper: scrape and extract data from websites in any format (CSV, JSON, HTML...) using Selenium or Crawl4AI, Ollama or the SambaNova API, and Streamlit for a chatbot UI

ai-scraping crawl4ai crawler crawling ollama ollama-api openrouter openrouter-api scraper scraping Selenium selenium-python Streamlit streamlit-webapp

Python

4 months ago

nathabonfim59 / md-fetch

A CLI tool and REST API that converts web content to clean Markdown, bypassing anti-scraping measures using headless browsers. Perfect for AI/LLM applications

ai-scraping Go scraper

8 months ago