

chinese-word-segmentation

100+ Chinese Word Vectors: hundreds of pre-trained Chinese word vectors

Python
11983
1 year ago

pkuseg: a toolkit for multi-domain Chinese word segmentation

Python
6602
2 years ago
C++
3927
4 years ago

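Jiagu: a deep-learning natural language processing toolkit for knowledge-graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new-word discovery, keyword extraction, text summarization, and text clustering.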

Python
3383
3 years ago

Jcseg is a lightweight NLP framework written in Java. It provides CJK and English segmentation based on the MMSEG algorithm, as well as keyword extraction, key-sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

Java
922
2 years ago

The Jieba Chinese word segmenter implemented in Rust

Rust
808
2 months ago

A high-performance Chinese tokenizer written in ANSI C and based on the MMSEG algorithm, supporting both the GBK and UTF-8 character sets. Its fully modular implementation makes it easy to embed in other programs such as MySQL, PostgreSQL, and PHP.

C
498
1 year ago

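MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.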

Python
246
2 months ago

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

Python
240
6 years ago

A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese word segmentation.

Python
208
3 years ago

MicroTokenizer: a lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. It offers a practical, hands-on approach to NLP concepts, with multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

Python
150
6 months ago

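A manually compiled corpus of medical-industry vocabulary and terminology, usable for training many kinds of NLP models, such as speech recognition and dialogue systems.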

116
5 years ago

Source code for the paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018

Python
90
7 years ago

Source code for an ACL 2017 paper on Chinese word segmentation

Python
89
6 years ago