Natural Language Processing libraries and toolkits.
NLP & Text Processing
Repositories
HanLP is a multilingual NLP toolkit for industrial applications, providing Chinese word segmentation, POS tagging, named entity recognition, dependency parsing, and more, using both deep learning and statistical models.
Jieba is a Chinese text segmentation library for Python, supporting multiple segmentation modes (including accurate, full, and search-engine modes), part-of-speech tagging, keyword extraction, and custom dictionaries. It is well suited to NLP pipelines, search engine indexing, and text analysis applications.
spaCy is an NLP library written in Python and Cython, featuring state-of-the-art speed and neural network models for tasks like tokenization, named entity recognition, text classification, and dependency parsing across 70+ languages. It supports pretrained transformer models such as BERT and offers a production-ready training system with straightforward model packaging and deployment.
fastText is an efficient library for learning word representations and text classification, developed by Facebook Research. It uses subword (character n-gram) information to build word vectors and provides pre-trained vectors for 157 languages, making it a good fit for NLP tasks like sentiment analysis and language identification.
A comprehensive repository tracking state-of-the-art progress across 50+ NLP tasks in multiple languages, featuring benchmark datasets, performance metrics, and research papers for machine learning practitioners.
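The fastText entry above mentions subword information. As a rough illustration of that idea, the sketch below splits a word into character n-grams after wrapping it in boundary markers, which is the general scheme described for fastText's subword model. It is not fastText's own code: the function name char_ngrams and its parameters are hypothetical, and the real library additionally hashes n-grams into a fixed number of buckets and keeps the whole word as its own token.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, including boundary markers.

    Hypothetical helper for illustration only; not part of the fastText API.
    """
    # Mark the start and end of the word so prefixes and suffixes
    # produce n-grams that differ from word-internal ones.
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


if __name__ == "__main__":
    # Trigrams only, to keep the output short.
    print(char_ngrams("where", min_n=3, max_n=3))
    # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because a word vector is assembled from the vectors of its n-grams, a word never seen in training can still receive a representation from the n-grams it shares with known words.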