
# layout-analysis

A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.

Python
42184
1 day ago

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python
5579
7 days ago

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

Jupyter Notebook
2537
1 month ago

Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.

Python
863
14 days ago

Resources repository for document layout analysis development with PdfPig.

C#
623
2 years ago

Layout analysis for Chinese and English documents

Python
238
14 days ago

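📝 Extracts content from document images and reproduces it one-to-one in Word or Txt for further use or processing. Planned future support: PDF or image input, with output in JSON, Txt, Word, and Markdown formats.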

Python
202
10 months ago

YOLO models trained on DocLayNet: power your document intelligence with layout analysis

Python
132
17 days ago

Jupyter Notebook
130
2 years ago

An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"

Python
80
2 years ago

A Node.js library based on PaddleOCR

TypeScript
80
3 months ago

A Unified Toolkit for Deep Learning-Based Table Extraction

Python
45
9 months ago

[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)

Python
42
2 years ago

A Large Dataset of Historical Japanese Documents with Complex Layouts

Jupyter Notebook
34
3 years ago

Nordrassil is a keyboard layout that provides an elegant and balanced typing experience by its use of a thumb-alpha, emphasis on middle fingers, deprioritisation of pinkies, and repeat key (or arcane keys).

29
1 year ago