NLP & Text Processing

Natural Language Processing libraries and toolkits.

Repositories

hankcs/HanLP

HanLP is a multilingual NLP toolkit for industrial applications. It provides Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, and more, using both deep learning and statistical models.

Python
36.3k
6 months ago

Jieba is a Chinese text segmentation library for Python, supporting multiple segmentation modes, part-of-speech tagging, keyword extraction, and custom dictionaries. It is well suited to NLP, search engine, and text analysis applications.

Python
34.9k
2 years ago

spaCy is an advanced NLP library for Python and Cython, featuring state-of-the-art speed and neural network models for tasks like tokenization, named entity recognition, text classification, and dependency parsing across 70+ languages. It supports transformers like BERT, offers a production-ready training system, and enables easy model deployment.

Python
33.5k
a month ago

FastText is an efficient library for learning word representations and text classification, developed by Facebook Research. It uses subword information, supports multilingual models, and provides pre-trained vectors for 157 languages, which makes it a good fit for NLP tasks such as sentiment analysis and language identification.

HTML
26.5k
2 years ago

A comprehensive repository tracking state-of-the-art progress across 50+ NLP tasks in multiple languages, featuring benchmark datasets, performance metrics, and research papers for machine learning practitioners.

Python
23.0k
2 years ago