corpus-processing
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Bitextor generates translation memories from multilingual websites
Python scripts preprocessing Penn Treebank and Chinese Treebank
OpusFilter - Parallel corpus processing toolkit
Utilities for Processing the Switchboard Dialogue Act Corpus
A Serverless Text Annotation Tool for Corpus Development
A parser for annotated MuseScore 3 files.
Reading the data from OPIEC - an Open Information Extraction corpus
Utilities for Processing the Meeting Recorder Dialogue Act Corpus
Simple collocation-driven rhyme recognition. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry
Corpus linguistics has never been so easy...
A library of functions enabling complex corpus search in context (KWIC), search aggregation, bag-of-words building & keyphrase extraction.
Hard-Forked from JuliaText/TextAnalysis.jl
ALvisNLP corpus processing engine
Measure the similarity of text corpora for 74 languages
Plotly-Dash NLP project. Measures document similarity using Latent Dirichlet Allocation and principal component analysis, followed by KMeans clustering, with dynamic visual interaction.
Scripts for building a geo-located web corpus using Common Crawl data
A set of corpus-based sampling & analysis M4L devices
Script that sets up and configures an entire CQPweb server installation