text-processing
⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Intuitive find & replace CLI (sed alternative)
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Python library for creating PEG parsers
Text Classification Algorithms: A Survey
Persian NLP Toolkit
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Program to convert lines of text into a tree structure.
A fast implementation of Aho-Corasick in Rust.
A fast and convenient fuzzy matcher library for Rust
Thai natural language processing in Python
A simple Python module for parsing human names into their individual components
All-in-one text de-duplication
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Natural language detection library for Go
Open Korean Text Processor - An Open-source Korean Text Processor