hocr

A Gtk/Qt front-end to tesseract-ocr.

C++
1802
1 day ago

Document Layout Analysis resource repositories for development with PdfPig.

C#
623
2 years ago

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

JavaScript
202
2 days ago

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

JavaScript
192
3 months ago

Conversions between various OCR formats

79
2 years ago

Text Overlay plugin for Mirador 3

JavaScript
57
8 days ago

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

XSLT
55
3 months ago

Ergonomic line-by-line transcription of scanned text.

JavaScript
53
5 years ago

Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights

Python
18
4 years ago

✏️ Integration of Tesseract for Python using a shared library

Python
12
9 years ago

A visual hOCR file editor

TypeScript
10
1 year ago

A visual editor for .hocr files.

C#
5
6 months ago

JavaScript
4
3 years ago

Some basic data and text extraction from the New York City Directories

4
8 years ago

CLI tool that recognises handwritten text from answer sheets using Tesseract OCR, then uses the extracted text to evaluate marks with NLP.

Jupyter Notebook
3
7 years ago

The data for guides to breweries across the United States from 1896 to 1918

3
8 years ago

Python parser for hOCR files using lxml

Python
3
5 years ago