article-extractor

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python
4147
1 month ago
extractus/article-extractor

To extract main article from given URL with Node.js

JavaScript
1690
2 months ago

Readability / Html Content / Article Extractor & Web Scraping library written in PHP

PHP
460
2 years ago

SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla

C#
166
3 months ago

An article extractor in Rust

Rust
134
3 years ago

Reddit bot to preview and post hyperlinks as comments

Python
102
2 years ago

The best HTML to Markdown library: ESM-native, with useful utilities; simple, lightweight, and epic quality.

TypeScript
64
15 days ago

Extract article or news by url or html, parse the title and content, output in markdown format.

Python
49
8 months ago

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

33
2 years ago

This is a small and easy-to-use desktop application that allows exporting Web of Science API Expanded and InCites API data in Excel/CSV/JSON/XML with a configurable and flexible data export structure.

Vue
32
1 month ago

Involution King Fun Book (IKFB, Chinese: 快卷, 卷王快乐本) is an integrated management system for papers and literature. Powered by Electron.

Vue
32
3 years ago

[Spring Boot in Practice] Quickly build your own technical article blog in 10 minutes

Kotlin
31
7 years ago

📚 A collection of useful things from Natural Language Processing: detecting the language of a text, splitting text into sentences, extracting the main content from an HTML document

Python
19
2 years ago

A client-side post search tool for DCInside.

Python
18
1 year ago

This program scrapes article content from the web, given either a single URL or a text file containing a set of URLs. It uses the newspaper3k and python-docx libraries and outputs a neatly formatted Word document in '.docx' format containing the article's contents.

Python
16
5 years ago

Extracts the main text from HTML, intended for news-type web pages

Go
16
2 years ago