#chinese-language

Python
11157
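1 year ago

MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.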

3825
8 days ago

Chinese safety prompts for evaluating and improving the safety of LLMs.

997
1 year ago

A linting tool for Chinese language.

TypeScript
975
3 months ago

Python
579
2 months ago

A framework for cleaning Chinese dialog data

Python
269
4 years ago

🌏 A Simplified Chinese world map in GeoJSON format, including ISO 3166 codes, Chinese short names, and full names of countries (regions).

HTML
249
5 months ago

Learn, read, write and practice Mandarin by drawing strokes in Anki Desktop, AnkiDroid and AnkiMobile with audio of HSK 2.0 (HSK1-6) and HSK 3.0 (HSK 1-9) characters.

HTML
224
8 months ago

A collection of Rime (中州韻) input method phonetic spelling schemas for non-Mandarin Sinitic languages and dialects and for historical Chinese.

Shell
197
3 hours ago

Discovering magic squares in Tang Dynasty poems

C
188
4 years ago

Chinese and Chinese-English dictionaries.

HTML
168
1 year ago

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

Python
159
4 months ago

CJK computer science terms comparison / 中日韓電腦科學術語對照 / 日中韓のコンピュータ科学の用語対照 / 한·중·일 전산학 용어 대조

Python
136
8 months ago

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

Python
129
2 years ago

A webapp to visualize relationships among Chinese characters and to see example sentences that illustrate their use. Also available for Japanese learners.

JavaScript
89
1 month ago

Vietnamese dictionary for e-readers such as Kindle, Kobo, Pocketbook, etc.

Python
74
3 months ago