bpe
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Unsupervised text tokenizer focused on computational efficiency
Fast and customizable text tokenization library with BPE and SentencePiece support
Explains NLP building blocks in a simple manner.
Train a language model to chat like you using your personal conversations from WhatsApp, Telegram, Signal, or other platforms.
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e
Fast bare-bones BPE for modern tokenizer training
Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
GPT-3 encoder and decoder tool written in Swift
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
High performance unsupervised text tokenization for Ruby