ngram
Four word embedding models implemented in Python, supporting arbitrary context features.
A Lite BERT for Self-Supervised Learning of Language Representations
Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
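To make the n-gram and skipgram terminology above concrete, here is a minimal pure-Python sketch. It does not use Colibri Core or mirror its API; the function names `ngrams` and `skipgrams` and the `GAP` marker are hypothetical, and only fixed-size gaps are covered.

```python
from typing import Iterator, Sequence, Set, Tuple

GAP = "{*}"  # hypothetical placeholder for a one-token gap


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous window of n tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(
    tokens: Sequence[str], n: int, gap_positions: Set[int]
) -> Iterator[Tuple[str, ...]]:
    """Yield n-token windows with the given positions replaced by GAP.

    Assumption: gaps have a fixed size of one token each and sit at the
    same positions in every window.
    """
    for gram in ngrams(tokens, n):
        yield tuple(
            GAP if i in gap_positions else tok for i, tok in enumerate(gram)
        )


tokens = "to be or not".split()
bigrams = list(ngrams(tokens, 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not')]
patterns = list(skipgrams(tokens, 3, {1}))
# [('to', '{*}', 'or'), ('be', '{*}', 'not')]
```

Counting the yielded tuples, for example with `collections.Counter`, gives a simple in-memory frequency table of patterns.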
Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms
Python implementation of an N-gram language model with Laplace smoothing and sentence generation.
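As an illustration of what such a model involves, the following is a minimal sketch of a bigram model with Laplace (add-one) smoothing and sampling-based sentence generation. It is not the listed repository's code; the class name `BigramModel`, the `<s>` and `</s>` boundary markers, and whitespace tokenization are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


class BigramModel:
    """Bigram language model with Laplace (add-one) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()      # count of each token as a context
        self.bigrams = defaultdict(Counter)  # bigrams[prev][cur] = count
        self.vocab = set()
        for sentence in sentences:
            # Assumption: whitespace tokenization with boundary markers.
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigrams[prev][cur] += 1

    def prob(self, prev, cur):
        """P(cur | prev) = (count(prev, cur) + 1) / (count(prev) + |V|)."""
        vocab_size = len(self.vocab)
        return (self.bigrams[prev][cur] + 1) / (
            self.context_counts[prev] + vocab_size
        )

    def generate(self, max_len=20, rng=None):
        """Sample tokens one at a time until </s> or max_len is reached."""
        rng = rng or random.Random()
        # <s> is excluded from sampling; choices() renormalizes the weights.
        candidates = sorted(self.vocab - {"<s>"})
        prev, out = "<s>", []
        for _ in range(max_len):
            weights = [self.prob(prev, w) for w in candidates]
            cur = rng.choices(candidates, weights=weights)[0]
            if cur == "</s>":
                break
            out.append(cur)
            prev = cur
        return " ".join(out)


model = BigramModel(["the cat sat", "the dog sat"])
p_seen = model.prob("the", "cat")    # (1 + 1) / (2 + 6) = 0.25
p_unseen = model.prob("cat", "dog")  # (0 + 1) / (1 + 6), nonzero after smoothing
sentence = model.generate(rng=random.Random(0))
```

Add-one smoothing gives every unseen bigram a small nonzero probability, so generation can produce word sequences that never occurred in the training sentences.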
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Top-k Approximate String Matching.
Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.).
Create n-grams of wordlists based on words, characters, or charsets to use in offline password attacks and data analysis
Calculating n-grams with PySpark for Wikipedia text
Spider - web crawler and local wordlist processor to generate frequency sorted wordlist / ngrams
Multiprocess, unsupervised Chinese word detection using n-gram combination