#html-extractor

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

HTML
204
1 year ago

Automatically extract the main text content (and more) from an HTML document

Kotlin
117
3 years ago

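An optimized version of the general-purpose web page main-text extraction algorithm based on the line-block distribution function, implemented in Python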

Python
60
5 years ago

Extracts the main text from HTML, intended for news-style web pages

Go
16
2 years ago

PHP library that determines which CSS is used by HTML snippets.

PHP
9
5 years ago

Xtract-html is a tool for extracting HTML display code from a website, which you can also use for your website.

Python
5
2 months ago

Xtract-htmlV2 is a tool for getting the HTML code from a website of your choice and is the successor to the previous version.

Python
4
2 months ago

Media Graper is an open-source tool for Linux that extracts all the images, links, and videos from a web page.

Shell
1
2 years ago

A simple extractor based on BeautifulSoup. You can use it to iterate through all the HTML files in the website root directory and get the text, placeholders, and other text.

Python
0
5 years ago