chinese-language

Python
11303
2 years ago

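MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, intended to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.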

3940
8 days ago

Chinese safety prompts for evaluating and improving the safety of LLMs.

1059
1 year ago

A linting tool for Chinese language.

TypeScript
981
1 month ago

Python
604
6 months ago

🌏 A Simplified Chinese world map in GeoJSON format, including ISO 3166 codes, Chinese short names, and full names of countries (regions).

HTML
288
2 months ago

A framework for cleaning Chinese dialog data

Python
274
4 years ago

Learn, read, write and practice Mandarin by drawing strokes in Anki Desktop, AnkiDroid and AnkiMobile with audio of HSK 2.0 (HSK1-6) and HSK 3.0 (HSK 1-9) characters.

HTML
253
4 days ago

Collection of Rime (中州韻) input method phonetic spelling schemas for non-Mandarin Sinitic languages and dialects and for historical Chinese.

Shell
207
2 days ago

Discovering magic squares in Tang Dynasty poems

C
189
4 years ago

Chinese / Chinese-English dictionaries.

HTML
182
1 year ago

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

Python
167
8 months ago

CJK computer science terms comparison

Python
138
1 year ago

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

Python
132
2 years ago

A webapp to visualize relationships among Chinese characters and to see example sentences that illustrate their use. Also available for Japanese learners.

JavaScript
108
3 days ago

Vietnamese dictionary for e-readers such as Kindle, Kobo, PocketBook, etc.

Python
77
1 month ago