document-layout-analysis
A Unified Toolkit for Deep Learning Based Document Image Analysis
A curated list of resources on the Document Understanding (DU) topic
Document Layout Analysis resources and repos for development with PdfPig.
📚 Process PDFs, Word documents and more with spaCy
Document Layout Analysis
Page to PAGE Layout Analysis Tool
ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing
A Bottom-Up Instance Segmentation Strategy for segmenting document instances using Transformers
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
Tools for extracting figures, tables, text, etc. from a PDF document.
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
BoundaryNet - A Semi-Automatic Layout Annotation Tool
A step-by-step C# implementation of the Docstrum algorithm
Simple docker deployment of document layout analysis using detectron2
Using a MaskRCNN model trained on the PubLayNet dataset with ML.NET in C# / .NET for document layout analysis and page segmentation tasks.
GloSAT Historical Measurement Table Dataset