document-parsing

enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Python
1194
4 days ago

Open-source unstructured data (PDFs, images, audio files) processing platform built for knowledge workers

TypeScript
275
1 month ago

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, Semantic Scholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines.

Python
50
1 month ago

A Unified Toolkit for Deep Learning-Based Table Extraction

Python
34
5 months ago

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

27
2 years ago

Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring insights. Future plans include expanding to investor pitches and other structured documents.

JavaScript
3
5 days ago

Docparser OCR Package for PHP Laravel

PHP
3
2 months ago

Docling4j brings the functionalities of Docling in document understanding to Java® projects

Java
2
20 days ago

Combining OCR for text extraction with LLMs for accurate, efficient document structuring.

Python
1
2 days ago

Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

Python
1
2 months ago

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

Python
1
6 months ago

Parsing documents to one datatype (TypeScript port of Docling) (NOT STARTED!)

1
5 months ago

PhraseSpeaker: Effortlessly dictate specific sections of text files with macOS's text-to-speech. Perfect for navigating and audibly extracting key content from large documents!

Shell
1
1 year ago

Repository for testing and demonstrating the capabilities of Docling for document conversion.

HTML
0
3 months ago