NLP & Text Processing

Natural Language Processing libraries and toolkits.

Repositories

hankcs / HanLP

HanLP is a multilingual NLP toolkit for industrial applications. It provides Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, and more, using both deep learning and statistical models.

Python

36.4k

7 months ago

fxsjy / jieba

Jieba is a Chinese text segmentation library for Python that supports multiple segmentation modes, part-of-speech tagging, keyword extraction, and custom dictionaries. It is suited to NLP, search engine, and text analysis applications.

Python

35.0k

2 years ago

explosion / spaCy

spaCy is an NLP library written in Python and Cython, built for speed and featuring neural network models for tasks such as tokenization, named entity recognition, text classification, and dependency parsing across 70+ languages. It supports transformer models such as BERT, offers a production-ready training system, and makes model deployment straightforward.

Python

33.7k

a month ago

facebookresearch / fastText

fastText is an efficient library for learning word representations and classifying text, developed by Facebook Research. It uses subword information, supports models for multiple languages, and provides pre-trained vectors for 157 languages, which makes it well suited to NLP tasks such as sentiment analysis and language identification.

HTML

26.5k

2 years ago

sebastianruder / NLP-progress

A comprehensive repository tracking state-of-the-art progress across 50+ NLP tasks in multiple languages, featuring benchmark datasets, performance metrics, and research papers for machine learning practitioners.

Python

23.0k

2 years ago
