hocr

A Gtk/Qt front-end to tesseract-ocr.

C++
1738
10 days ago

Document layout analysis resources for development with PdfPig.

C#
611
2 years ago

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

JavaScript
188
2 months ago

Conversions between various OCR formats

75
2 years ago

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

XSLT
55
9 months ago

Text Overlay plugin for Mirador 3

JavaScript
54
1 month ago

Ergonomic line-by-line transcription of scanned text.

JavaScript
51
4 years ago

Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights

Python
18
4 years ago

✏️ Integration of Tesseract for Python using a shared library

Python
12
9 years ago

A visual hOCR file editor

TypeScript
10
1 year ago

JavaScript
4
2 years ago

A visual editor for .hocr files.

C#
4
2 months ago

Some basic data and text extraction from the New York City Directories

4
8 years ago

The data for guides to breweries across the United States from 1896 to 1918

3
8 years ago

Python parser for hOCR files using lxml

Python
3
5 years ago

A gem that parses positional text from hOCR output and provides convenience methods to find text.

Ruby
3
3 years ago

CLI tool that recognises handwritten text from answer sheets using Tesseract OCR, then uses the extracted text to evaluate marks with NLP.

Jupyter Notebook
3
6 years ago