text-mining

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

17428
2 years ago

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python
4593
12 days ago

extract text from any document. no muss. no fuss.

HTML
4260
9 months ago

Library to scrape and clean web pages to create massive datasets.

Python
2194
5 years ago

a curated list of R tutorials for Data Science, NLP and Machine Learning

R
2061
2 years ago

Python package for Korean natural language processing.

Python
1467
2 years ago

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

TeX
1355
4 months ago

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

C++
1196
4 years ago

Text mining using tidy tools ✨📄✨

R
1190
25 days ago

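Crawls historical news text for listed companies (individual stocks) from Sina Finance, National Business Daily (每经网), JRJ (金融界), China Securities Network (中国证券网), and Securities Times (证券时报网), runs text analysis and feature extraction on it, trains classifiers such as SVM and random forest, and finally uses them to classify newly crawled news.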

Python
1185
8 months ago

kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Jupyter Notebook
1175
5 years ago

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Python
1072
3 years ago

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Shell
1068
4 months ago

A collection of notebooks for Natural Language Processing from NLP Town

Jupyter Notebook
1007
1 year ago

A configurable web spider with an easy-to-use web console

Java
998
7 years ago

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

R
867
1 year ago