ngram

zhezhaoa / ngram2vec

Four word embedding models implemented in Python, supporting arbitrary context features.

ngram word2vec embedding chinese glove svd word word-embedding

Python

851

172

6 years ago

lonePatient / albert_pytorch

A Lite BERT for Self-Supervised Learning of Language Representations

albert bert PyTorch ngram mask natural-language-processing language-model

Python

720

149

5 years ago

wintermute-cell / ngrrram

A TUI tool to help you type faster and learn new layouts. Includes a free cat.

cat command-line-interface colemak dvorak layout ngram Rust touchtyping tui typing

Rust

666

10 months ago

ranelpadon / ngram-type

Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.

ngram colemak dvorak Vue.js monkeytype

JavaScript

243

1 year ago

lonePatient / daguan_2019_rank9

Datagrand 2019 information extraction competition, rank 9

bert ie information-extraction ner lstm crf span dropout lookahead PyTorch ngram

Python

130

6 years ago

proycon / colibri-core

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

C++ Python natural-language-processing ngrams skipgram ngram corpus Library text-processing computational-linguistics pattern-recognition

C++

129

10 months ago

ChrisMuir / refinr

Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms

openrefine fuzzy-matching ngram approximate-string-matching data-cleaning clustering R rstats

C++

104

2 years ago

joshualoehr / ngram-language-model

Python implementation of an N-gram language model with Laplace smoothing and sentence generation.

ngram perplexity natural-language-processing language-model Python ngrams language-models

Python

8 years ago
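
For readers unfamiliar with the technique named in the entry above, here is a minimal sketch of a bigram model with Laplace (add-one) smoothing. It is not taken from the listed repository; the function names `train_bigram` and `laplace_prob`, the `<s>`/`</s>` boundary markers, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts = Counter()  # how often each token appears as a bigram's first word
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        # Hypothetical boundary markers; real toolkits may use different symbols.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])
        bigram_counts.update(ngrams(tokens, 2))
    return context_counts, bigram_counts, vocab


def laplace_prob(prev, word, context_counts, bigram_counts, vocab_size):
    """P(word | prev) with add-one smoothing: (c(prev, word) + 1) / (c(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)


# Toy corpus, invented for this example.
context_counts, bigram_counts, vocab = train_bigram(["the cat sat", "the dog sat"])
p_seen = laplace_prob("the", "cat", context_counts, bigram_counts, len(vocab))
p_unseen = laplace_prob("cat", "dog", context_counts, bigram_counts, len(vocab))
```

Adding one to every count means an unseen bigram such as ("cat", "dog") still gets a small non-zero probability, while the probabilities for any given context continue to sum to one over the vocabulary.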

words / n-gram

Get n-grams from text

ngram unigram

JavaScript

3 years ago

vickumar1981 / stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.

levenshtein-distance levenshtein ngram jaro jaro-winkler dice-coefficient hamming-distance string-similarity cosine-similarity fuzzy-matching Hacktoberfest

Scala

3 years ago

wrathematics / ngram

Fast n-Gram Tokenization

R ngram text text-mining

2 years ago

jiangnanboy / llm_corpus_quality

Chinese corpus cleaning and quality evaluation for large-model pre-training

Java large-language-model ngram

Java

1 year ago

suggest-go / suggest

Top-k Approximate String Matching.

golang-library ngram fuzzy-search search-engine language-model spellchecker autocomplete

4 years ago

BitSpeech / SRILM

Mirror of SRILM

language-model ngram

Roff

5 years ago

myazi / NLP

natural language processing

ngram crf

C++

7 years ago

JackHCC / Chinese-Tokenization

Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.)

hmm-viterbi-algorithm ngram natural-language-processing tokenization

Python

3 years ago