
ngram

Four word embedding models implemented in Python, supporting arbitrary context features.

Python
850
6 years ago

A Lite BERT for Self-Supervised Learning of Language Representations

Python
718
5 years ago

A TUI tool to help you type faster and learn new layouts. Includes a free cat.

Rust
664
9 months ago

Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.

JavaScript
240
1 year ago

DataGrand 2019 information extraction competition, rank 9

Python
130
6 years ago

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

C++
128
8 months ago

Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms

C++
104
1 year ago

Python implementation of an N-gram language model with Laplace smoothing and sentence generation.

Python
86
8 years ago

Get n-grams from text

JavaScript
83
3 years ago

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.

Scala
80
3 years ago

Fast n-Gram Tokenization

C
70
2 years ago

Cleaning and quality evaluation of Chinese corpora for large-model pre-training

Java
69
1 year ago

Mirror of SRILM

Roff
57
5 years ago

natural language processing

C++
37
7 years ago

Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-training methods (BERT, etc.)

Python
35
3 years ago

Create n-grams of wordlists based on words, characters, or charsets to use in offline password attacks and data analysis

Python
34
1 year ago

Calculating n-grams with PySpark for Wikipedia text

Jupyter Notebook
29
1 year ago

Spider: web crawler and local wordlist processor to generate frequency-sorted wordlists / n-grams

Go
25
20 days ago