word-segmentation

Unsupervised text tokenizer for Neural Network-based text generation.

C++
11181
1 day ago

C++
3960
4 years ago

Unsupervised text tokenizer focused on computational efficiency

C++
972
1 year ago

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter comprising 330 million English tweets).

Python
670
3 months ago

Kiwi (intelligent Korean morphological analyzer)

C++
611
3 days ago

Python
402
4 months ago

Chinese text classification and sequence labeling toolkit (PyTorch). Supports multi-class and multi-label classification of long and short Chinese texts, text similarity, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.

Python
349
1 year ago

A PyTorch implementation of the BI-LSTM-CRF model.

Python
256
10 months ago

MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.

Python
247
6 months ago

A lightweight, high-performance Chinese word segmentation project.

C++
200
2 years ago