corpus-linguistics
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
A web-based engine for creating and annotating textual corpora
Data resources for Indonesian NLP
🕷 The pipeline for the OSCAR corpus
Kanji usage frequency data collected from various sources
Data for the quantitative study of (Vedic) Sanskrit
Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
An advanced, extensible web front-end for the Manatee-open corpus search engine
Large silver-standard Russian corpus with NER, morphology, and syntax markup
A textual corpus database for the digital humanities.
A large, high-quality corpus of Chinese synonyms
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates