#ai-scraping

D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Python
7418
2 hours ago

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

TypeScript
2327
5 days ago

A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

Python
1860
2 months ago

Python
1224
1 month ago

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

Jupyter Notebook
561
4 months ago

Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural language prompt.

148
11 days ago

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

JavaScript
85
1 month ago

[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.

Go
81
1 month ago

AI web scraper built with Crawl4AI for extracting structured lead data from websites.

Python
49
8 months ago

Python, JavaScript, and Rust libraries for the Spider Cloud API.

Python
19
1 month ago

All Scrapers Resource Available Here! Give Us Stars🌟

TypeScript
15
21 days ago

Fastest and cheapest distributed residential proxy network.

TypeScript
10
21 days ago

Extract Google Maps business leads and enrich contact details using AI and web scraping.

Python
5
3 months ago

ScrapeGraphAI is a Python-based web-scraping framework that pairs large-language-model reasoning with a graph-style pipeline engine to turn websites (or local XML/HTML/JSON/Markdown files) into structured data with just a handful of lines of code.

Python
4
4 months ago

AI Scraper: scrape and extract data from websites in any format (CSV, JSON, HTML...) using Selenium or Crawl4AI, the Ollama or SambaNova API, and a Streamlit chatbot UI.

Python
4
4 months ago

A CLI tool and REST API that converts web content to clean Markdown, bypassing anti-scraping measures using headless browsers. Perfect for AI/LLM applications.

Go
4
8 months ago