document-parser
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Get your documents ready for gen AI
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Knowledge Agents and Management in the Cloud
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Improved file parsing for LLMs
LAYRA is a ready-to-use visual RAG system with a complete web UI built with Next.js and FastAPI, preserving document layout, tables, paragraphs, and graphical elements without any structural fragmentation.
Parse PDFs into markdown using Vision LLMs
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.
Tutorial on how to deskew (straighten) text images
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines.
The invoice, document, and resume parser powered by AI.
An OCR-based document parser to extract information from identity document images
An open-source framework for Retrieval-Augmented Systems (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).
Resume parsing app to extract information using AI
Python client library for Graphlit Platform
Build a RAG preprocessing pipeline