article-extractor

Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

web-scraping text-extraction natural-language-processing text-mining crawler text-preprocessing article-extractor readability scraping html-to-markdown corpus-tools rss-feed news-aggregator rag large-language-models

Python

4763

318

23 days ago

extractus / article-extractor

Extracts the main article from a given URL with Node.js

Node.js article-parser readability article article-extractor crawler extract scraper

JavaScript

1748

152

1 month ago

scotteh / php-goose

Readability / HTML content / article extractor and web scraping library written in PHP

article article-extractor PHP readability scraper Composer

PHP

462

118

2 years ago

Strumenta / SmartReader

SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla

readability article-extractor C#

173

22 days ago

hipstermojo / paperoni

An article extractor in Rust

Rust readability article-extractor

Rust

133

4 years ago

artiomn / markdown_articles_tool

Parses a Markdown article, downloads its images, and replaces the image URLs with local paths

Markdown markdown-converter Image markdown-parser downloader markdown-to-html markdown-to-pdf HTML pdf article article-extractor articles image-manipulation python-library toolset

Python

125

4 months ago

fterh / sneakpeek

Reddit bot to preview and post hyperlinks as comments

Reddit article-extractor preview

Python

102

3 years ago

web64 / nlpserver

NLP Web Service

natural-language-processing API language-detection entity-extraction article-extractor sentiment-analysis

Python

3 years ago

inaridiy / webforai

The best HTML-to-Markdown library: ESM-native, with useful utilities that are simple, lightweight, and high quality.

article-extractor extractor readability scraping text-mining html-to-markdown

TypeScript

6 months ago

web64 / laravel-nlp

Laravel wrapper for common NLP tasks

laravel-package natural-language-processing language-detection article-extractor entity-extraction sentiment-analysis

PHP

5 years ago

myifeng / article-parser

Extracts an article or news item from a URL or HTML, parses the title and content, and outputs the result in Markdown format.

article-parser news Python beautifulsoup article article-extractor extract extractor

Python

1 year ago

lightfeed / extractor

Using LLMs and AI browser automation to robustly extract web data

ai-agents article-extractor crawler data-engineering data-pipeline etl html-parser html-to-markdown large-language-models natural-language-processing rag rss-feed web-data-extraction webscraping Markdown google-gemini openai

TypeScript

5 days ago

clarivate / wos-excel-converter

This is a small and easy-to-use desktop application that allows exporting Web of Science API Expanded and InCites API data in Excel/CSV/JSON/XML with a configurable and flexible data export structure.

article-extractor converter excel CSV csv-export

Vue

7 months ago

johnbumgarner / newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

article-extractor data-science datascience data-extraction text-mining news news-aggregator Python web-scraping webscraping data-mining

3 years ago

Creator-SN / IKFB

Involution King Fun Book (IKFB, Chinese: 快卷, 卷王快乐本) is an integrated management system for papers and literature. Powered by Electron.

article-extractor notebook electron-vue Fluent Design System pdf-viewer

Vue

3 years ago

KotlinSpringBoot / saber

[Spring Boot in Practice] Quickly build your own technical article blog in 10 minutes

spider Kotlin Spring Boot article-extractor blog

Kotlin

7 years ago

woojubb / html-article-extractor

A web page content extractor

article-extractor extractor extraction crawler crawling

JavaScript

1 year ago

pgh268400 / Dcinside_Explorer_Python

A client-side post search tool for DCinside.

Python article-extractor

Python

2 years ago

lord-alfred / dnlp

📚 A collection of useful Natural Language Processing tools: text language detection, splitting text into sentences, and extracting the main content from an HTML document

fasttext nltk language-detection language-recognition article-extractor readability text-processing natural-language-processing nlp-parsing

Python

3 years ago

0xAmmar / Medium-Miner

A Medium scraper that you need.

article-extractor Bug Bounty bugbounty-tool bugbounty-tools bugbounty-writeups bugbountytricks Hacking hacking-tool Medium python-script Reconnaissance reconnaissance scraper

Python

2 years ago