#corpus-linguistics

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

Python
731
24 days ago

A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia

541
3 years ago

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

491
10 months ago

A list of Indonesian NLP resources.

285
4 years ago

A web-based engine for creating and annotating textual corpora

PHP
247
2 years ago

Crawler for linguistic corpora

Python
205
1 day ago

Kanji usage frequency data collected from various sources

Astro
145
9 days ago

Data for the quantitative study of (Vedic) Sanskrit

Python
127
7 days ago

Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP

Jupyter Notebook
103
1 year ago

An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.

Go
86
4 years ago

An advanced, extensible web front-end for the Manatee-open corpus search engine

TypeScript
73
13 days ago

Large silver-standard Russian corpus with NER, morphology, and syntax markup

Python
69
2 years ago

A textual corpus database for the digital humanities.

Jupyter Notebook
61
5 years ago

A large, high-quality corpus of Chinese synonyms.

Jupyter Notebook
61
4 years ago

SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/

HTML
57
2 years ago

CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates

51
2 years ago