pdf-to-json

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

HTML
12817
8 days ago

Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

Python
696
24 days ago

PDF Verse is a powerful web-based PDF editor with tools for editing, converting, and manipulating PDFs. Merge, compress, add or remove pages, or extract text using OCR technology. Convert PDF to DOC, Excel, PPT, JPG, PNG, text, and many more formats, and vice versa. PDF Verse also has a user-friendly interface and a wide range of features.

JavaScript
247
2 years ago

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

Jupyter Notebook
116
3 years ago

Account statements of the Vietnam Fatherland Front (Mặt Trận Tổ Quốc Việt Nam, MTTQ) on support for people affected by Typhoon Yagi

JavaScript
25
1 year ago

Docling4j brings Docling's document-understanding functionality to Java® projects

Java
17
6 months ago

A quick way to convert files (PDF, DOCX, HTML, PPTX, images) to MD, JSON, or YAML using Docling and Streamlit

Python
13
3 months ago

Node.js library to convert JSON to PDF or vice versa

JavaScript
9
2 years ago

A cute PDF parser that gives position of elements for inspection purposes.

TypeScript
8
10 months ago

This project converts books from PDF to proper JSON objects by separating title and content. You can then easily insert the resulting JSON file into a database.

JavaScript
5
7 years ago

🛠️ ipuresult-cli is a tool for creating JSON files from PDF result files 📚 of GGSIPU results

JavaScript
2
5 days ago