page-xml

Document Layout Analysis resources and repos for development with PdfPig.

C#
611
2 years ago

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

JavaScript
188
2 months ago

Conversions between various OCR formats

75
2 years ago

An OCR evaluation tool

Python
65
3 days ago

Convert Transkribus PAGE-XML to standard PAGE-XML

Python
12
10 months ago

NLP-helper for OCR-ed pages in PAGE XML format

Python
10
4 months ago

A powerful CLI tool for visualization and encoding of PAGE-XML files

Python
6
4 years ago

Convert AWS Textract JSON to PRImA PAGE XML

Python
6
3 months ago

Data for layout analysis and HTR.

Python
4
4 years ago

Dataset and models for catalogs' layout analysis and HTR

Python
2
4 years ago

Automatically re-order lines, words and glyphs to become textually consistent with their parents.

Python
2
1 year ago

The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

1
10 months ago

The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

0
10 months ago

The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

0
10 months ago

The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.

0
5 days ago

The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

0
10 months ago