document-processing

enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Python
1193
3 days ago

Generic framework for historical document processing

Python
375
4 years ago

A full-featured Document Management Platform / Document Layer for your application, providing storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. Please 🌟 star to support our work!

Java
126
16 hours ago

A Python framework for multi-modal document understanding with Amazon Bedrock

Python
81
4 days ago

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

Python
73
5 months ago

Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

Python
68
1 month ago

Haskell
62
4 years ago

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

JavaScript
37
16 days ago

Unofficial mirror of git://git.lyx.org/lyx.git (updates daily. not affiliated with lyx.org.)

C++
36
2 years ago

Semantic extraction from conference proceedings.

Python
31
5 years ago

An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

Python
27
8 months ago

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.

Python
22
5 years ago

tokyo, a REST API that, given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and extracts the text 💪.

Clojure
18
5 years ago

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.

JavaScript
15
8 days ago

A module for creating stopword lists for any language, based on a set of documents.

JavaScript
14
7 months ago

ETL for RAG. Transform any source into LLM-ready markdown. Focus on your AI, not integrations.

TypeScript
11
2 months ago