html-extractor

Module for automatic summarization of text documents and HTML pages.

Python lsa textteaser html-page summarizer pagerank-algorithm reduction text-extraction html-extraction html-extractor summarization summary natural-language-processing

Python

3609

534

8 days ago

bookieio / breadability

A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

Python text-mining text-extraction html-extraction html-extractor html-parsing

HTML

204

1 year ago

cdimascio / essence

Automatically extract the main text content (and more) from an HTML document

html-extractor extractor scraper Hacktoberfest

Kotlin

117

3 years ago

zezhix / html-extractor

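An optimized general-purpose algorithm for extracting the main text of web pages, based on the line-block distribution function; implemented in Python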

html-extractor Python

Python

6 years ago

kwaziidev / textractor

Extracts the main text from HTML; intended for news web pages

article-extractor extraction html-extractor extractor Go

2 years ago

JanDC / css-from-html-extractor

PHP library which determines which css is used from html snippets.

CSS php-library html-extractor

PHP

6 years ago

Whomrx666 / Xtract-html

Xtract-html is a tool for extracting HTML display code from a website, which you can also use for your website.

HTML html-extraction html-extractor kali-linux Linux Termux termux-tool

Python

6 months ago

Whomrx666 / Xtract-htmlV2

Xtract-htmlV2 is a tool for getting the HTML code from any website you choose, and is the successor to the previous version.

extract html-extraction html-extractor kali-linux Linux Termux termux-tool

Python

6 months ago

davidmillerpak / Media-Graper

Media Graper is an open-source tool for Linux that extracts all the images, links, and videos from a web page.

scrapper Website hacking-tools html-extractor linux-tools web-hacking

Shell

2 years ago

the-real-yey / Simple-HTML-Extractor-

A simple extractor based on BeautifulSoup. You can use it to iterate through all the HTML files in the website root directory and get the text, placeholders, and other text.

extractor beautifulsoup html-extractor

Python

6 years ago

MorrisGlr / HEART

HTML‐to‐Anki Enhanced Human Explanation & Reasoning Tool (HEART). A Python CLI that leverages the OpenAI API to transform full UWorld vignettes into AI-enhanced Anki flashcards.

active-learning anki-flashcards teaching html-extractor learning-resources openai-api HTML Python

Python

3 months ago