document-intelligence

Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.

Python
2276
4 days ago
enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Python
1373
17 days ago

AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.

Jupyter Notebook
584
8 months ago

ReadingBank: A Benchmark Dataset for Reading Order Detection

108
1 year ago

A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.

Bicep
101
2 days ago

The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.

Bicep
37
7 months ago

This sample demonstrates how to use Document Intelligence's Layout model to convert PDF documents, such as invoices, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.

Jupyter Notebook
31
1 year ago

An experiment that provides the template-training capabilities of Azure AI Document Intelligence Studio for use in a feedback loop.

Python
9
5 months ago

App for extracting structured data from document photos or PDFs via custom templating and commercial LLMs (GPT and Azure Document Intelligence). Developed as a Computer Science thesis at the University of Bologna.

Python
1
1 month ago

Using Azure Document Intelligence and Azure OpenAI services to automatically extract data from invoices.

HTML
1
4 months ago