NLP & Text Processing

Natural Language Processing libraries and toolkits.

Repositories

hankcs/HanLP

HanLP is a multilingual NLP toolkit for industrial applications. It provides Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, and more, using both deep learning and statistical models.

Python
36.2k
4 months ago

fxsjy/jieba

Jieba is a Chinese text segmentation library for Python. It supports multiple segmentation modes, part-of-speech tagging, keyword extraction, and custom dictionaries, and is suited to NLP, search engine, and text analysis applications.

Python
34.8k
2 years ago

explosion/spaCy

spaCy is an NLP library written in Python and Cython. It is built for speed and ships neural network models for tasks such as tokenization, named entity recognition, text classification, and dependency parsing across 70+ languages. It supports transformer models such as BERT and includes a production-ready training system with tooling for packaging and deploying models.

Python
33.4k
16 hours ago

facebookresearch/fastText

fastText is an efficient library for learning word representations and classifying text, developed by Facebook Research. It uses subword information and provides pre-trained word vectors for 157 languages, which makes it suited to NLP tasks such as sentiment analysis and language identification.

HTML
26.5k
2 years ago

sebastianruder/NLP-progress

A repository tracking state-of-the-art progress across 50+ NLP tasks in multiple languages. It lists benchmark datasets, performance metrics, and research papers for machine learning practitioners.

Python
23.0k
2 years ago