word-segmentation

Unsupervised text tokenizer for Neural Network-based text generation.

C++
10800
18 days ago
C++
3927
4 years ago

Unsupervised text tokenizer focused on computational efficiency

C++
965
1 year ago

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).

Python
668
1 year ago

Kiwi (intelligent Korean morphological analyzer)

C++
575
12 days ago
Python
397
10 months ago

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.

Python
343
9 months ago

A PyTorch implementation of the BI-LSTM-CRF model.

Python
250
6 months ago

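MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.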

Python
246
2 months ago

A lightweight, high-performance Chinese word segmentation project.

C++
199
2 years ago