corpus-tools

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python
4595
12 days ago

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

Python
731
5 hours ago

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Macaulay2
263
2 years ago
Python
170
2 months ago
Python
138
2 years ago

OpusFilter - Parallel corpus processing toolkit

Python
109
14 days ago

An advanced, extensible web front-end for the Manatee-open corpus search engine

TypeScript
73
2 days ago

Utilities for Processing the Switchboard Dialogue Act Corpus

Python
70
5 years ago

An open source reimplementation of Benny Brodda's BETA in Python

Python
62
6 years ago

SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/

HTML
57
2 years ago

A set of workflows for corpus building through OCR, post-correction and normalisation

Python
50
3 years ago

Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.

Python
50
5 months ago

Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework

Python
47
2 years ago

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

PHP
41
2 years ago

Utilities for Processing the Meeting Recorder Dialogue Act Corpus

Python
33
5 years ago