
apify

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

TypeScript
17484
1 day ago

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Python
5533
2 days ago

Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.

TypeScript
137
10 hours ago
TypeScript
136
10 hours ago

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.

Python
130
2 days ago

House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.

JavaScript
121
2 years ago

This repo is part of a blog series on web scraping projects that explores techniques for crawling data, from simple websites to websites using advanced protection.

Python
82
2 years ago

Amazon crawler - this configuration extracts items for the keywords that you specify in the input, and it automatically extracts all pages for each keyword. You can specify multiple keywords in the input for one run.

JavaScript
76
4 years ago

Apify API client for Python

Python
60
3 days ago

Scrape Tripadvisor restaurants, hotels, and places.

JavaScript
50
3 years ago

A fairly interesting crawler platform built on Apify, Node, and React

JavaScript
41
5 years ago

Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.

TypeScript
30
1 year ago

Apify actor to scrape YouTube search results. You can set the maximum number of videos to scrape per page as well as the date from which to start scraping.

JavaScript
25
3 years ago

Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!

TypeScript
25
5 months ago

You can use this act to monitor any page's content and get a notification when content changes.

JavaScript
20
3 years ago

Web application for recording, managing, and editing intelligent RPA workflows using Playwright technology

TypeScript
17
3 years ago

No more dealing with Google API. Simple Node.js program to automate access to Google Sheets.

JavaScript
16
3 years ago

Apify actor for scraping events from Ticketmaster based on their categories

JavaScript
15
2 years ago

Automate price monitoring on the most popular solution for building online stores and selling products online. Crawl arbitrary Shopify-powered online stores and extract a list of all products in a structured form, including product title, price, description, etc.

JavaScript
13
2 years ago