
page-xml

Document Layout Analysis resources and repos for development with PdfPig.

C#
623 stars
Updated 2 years ago

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

JavaScript
192 stars
Updated 3 months ago

Conversions between various OCR formats

79 stars
Updated 2 years ago

An OCR evaluation tool

Python
66 stars
Updated 3 months ago

Convert Transkribus PAGE-XML to standard PAGE-XML

Python
12 stars
Updated 1 year ago

NLP-helper for OCR-ed pages in PAGE XML format

Python
10 stars
Updated 8 months ago

A powerful CLI tool for visualization and encoding of PAGE-XML files

Python
6 stars
Updated 4 years ago

Convert AWS Textract JSON to PRImA PAGE XML

Python
6 stars
Updated 7 months ago

Data for layout analysis and HTR.

Python
4 stars
Updated 4 years ago

Dataset and models for layout analysis and HTR of catalogs.

Python
2 stars
Updated 4 years ago

Automatically re-order lines, words and glyphs to become textually consistent with their parents.

Python
2 stars
Updated 2 years ago

The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

1 star
Updated 1 year ago

The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

0 stars
Updated 1 year ago

The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

0 stars
Updated 1 year ago

The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.

0 stars
Updated 4 months ago

The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

0 stars
Updated 1 year ago