#chunking

chonkie-ai/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

Python
2872
5 months ago

NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Python
1898
3 years ago
C
1527
2 years ago

An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration

Java
409
8 days ago

Fully neural approach for text chunking

Python
368
4 months ago

Alternative casync implementation

Go
356
1 month ago

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

Python
352
7 days ago

A package for parsing PDFs and analyzing their content using LLMs.

Python
273
1 year ago

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and RAG pattern.

Python
262
4 months ago

A TensorFlow implementation of Neural Sequence Labeling model, which is able to tackle sequence labeling tasks such as POS Tagging, Chunking, NER, Punctuation Restoration and etc.

Python
234
7 years ago

A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.

Python
204
7 months ago

🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows

JavaScript
109
1 month ago

An LLM GUI application; enables you to interact with your files, offering dynamic parameters that can modify response behavior during runtime.

Python
94
2 years ago

Live TS segmenter and HLS manifest creation in Go

Go
94
4 years ago

webpack 2, react hotloader 3, react router v4, code splitting and more

JavaScript
85
8 years ago

📑 Split Laravel jobs into multiple separate job chunks

PHP
84
1 year ago

An asynchronous event-driven HTTP client based on netty.

Java
84
3 years ago

Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines

Rust
83
3 months ago

Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in Javascript. Ships with extensive tests, a fuzz test and a benchmark.

JavaScript
75
5 years ago