text-mining

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

17101
1 year ago

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python
4147
1 month ago

extract text from any document. no muss. no fuss.

HTML
4073
5 months ago

Library to scrape and clean web pages to create massive datasets.

Python
2184
4 years ago

a curated list of R tutorials for Data Science, NLP and Machine Learning

R
2037
2 years ago

Python package for Korean natural language processing.

Python
1442
2 years ago

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

TeX
1340
13 days ago

Text mining using tidy tools ✨📄✨

R
1186
1 year ago

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

C++
1184
3 years ago

kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Jupyter Notebook
1169
4 years ago

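Crawls historical news text for listed companies (individual stocks) from Sina Finance, National Business Daily (每经网), JRJ (金融界), China Securities Network (中国证券网), and Securities Times (证券时报网); runs text analysis and extracts feature sets; trains classifiers such as SVM and random forest; and finally classifies and predicts on newly crawled news data.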

Python
1116
4 months ago

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Python
1067
2 years ago

Open-source research tool to search, browse, analyze, and explore large document collections, built on a semantic search engine and an open-source text mining and text analytics platform. Integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations, and locations, metadata management via thesauri and ontologies, and a search user interface and search apps for full-text search, faceted search, and knowledge graphs.

Shell
1019
2 years ago

A configurable web spider with an easy-to-use web console

Java
994
7 years ago

A collection of notebooks for Natural Language Processing from NLP Town

Jupyter Notebook
988
9 months ago

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

R
862
8 months ago