

hocr

A Gtk/Qt front-end to tesseract-ocr.

C++
1830
1 month ago

Document Layout Analysis resources repos for development with PdfPig.

C#
624
2 years ago

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

JavaScript
216
7 days ago

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

JavaScript
195
4 months ago

Conversions between various OCR formats

79
2 years ago

Text Overlay plugin for Mirador 3

JavaScript
59
3 days ago

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

XSLT
56
10 days ago

Ergonomic line-by-line transcription of scanned text.

JavaScript
53
5 years ago

Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights

Python
18
4 years ago

✏️ Integration of Tesseract for Python using a shared library

Python
12
10 years ago

A visual hOCR file editor

TypeScript
9
2 years ago

JavaScript
4
3 years ago

A visual editor for .hocr files.

C#
4
8 months ago

Some basic data and text extraction from the New York City Directories

4
8 years ago

CLI tool to recognise handwritten text from answer sheets using Tesseract OCR, then evaluate marks from the extracted text using NLP.

Jupyter Notebook
3
7 years ago

The data for guides to breweries across the United States from 1896 to 1918

3
8 years ago

Python parser for hOCR files using lxml

Python
3
5 years ago