text-processing

⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨

Shell
10203
1 year ago

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Python
7880
1 year ago
pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python
7835
8 hours ago

Intuitive find & replace CLI (sed alternative)

Rust
6443
4 months ago

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Python
3134
2 years ago
chonkie-ai/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

Python
2872
5 months ago
Python
2375
11 days ago
pemistahl/lingua-go
Go
1271
6 months ago

Program to convert lines of text into a tree structure.

Go
1199
2 years ago

A fast and convenient fuzzy matcher library for Rust

Rust
1194
3 months ago

A fast implementation of Aho-Corasick in Rust.

Rust
1131
1 year ago
Rust
722
2 months ago
Python
707
25 days ago

A simple Python module for parsing human names into their individual components

Python
683
1 year ago

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).

Python
670
3 months ago

Natural language detection library for Go

Go
665
2 years ago

Open Korean Text Processor - An Open-source Korean Text Processor

Scala
641
1 year ago