robots-txt

Polite, slim and concurrent web crawler.

Go
2051
4 years ago

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

Go
791
4 years ago

🤖 The largest directory for AI-ready documentation and tools implementing the proposed llms.txt standard

TypeScript
502
2 months ago

Tame the robots crawling and indexing your Nuxt site.

TypeScript
489
10 hours ago

The robots.txt exclusion protocol implementation for the Go language

Go
275
3 years ago

A simple but powerful web crawler library for .NET

C#
253
2 years ago

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

PHP
247
8 days ago

A set of reusable Java components that implement functionality common to any web crawler

Java
245
16 hours ago

Opt-out tool for checking copyright reservations in a way that even machines can understand.

Python
194
2 years ago

Open-source, Python-based SEO web crawler

Python
180
2 years ago

NodeJS robots.txt parser with support for wildcard (*) matching.

JavaScript
157
10 months ago

Known tags and settings suggested to opt out of having your content used for AI training.

HTML
154
1 year ago

Makes it easy to add robots.txt, sitemap and web app manifest during build to your Astro app.

TypeScript
124
2 years ago

grobotstxt is a native Go port of Google's robots.txt parser and matcher library.

Go
111
3 years ago

Gatsby plugin that automatically creates robots.txt for your site

JavaScript
106
2 years ago

🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs

Python
92
6 days ago

Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allows) legitimate user agents. Useful for all websites.

87
6 months ago