bpe
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Unsupervised text tokenizer focused on computational efficiency
Fast and customizable text tokenization library with BPE and SentencePiece support
Explains NLP building blocks in a simple manner.
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e
Fast bare-bones BPE for modern tokenizer training
Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
GPT-3 encoder and decoder tool written in Swift
High performance unsupervised text tokenization for Ruby
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.