hocr

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox pdf pdf-document C# netstandard pdf-extractor pdf-document-processor pdf-files alto-xml hocr layout-analysis document-analysis page-xml pdf-generation

2223

287

3 days ago

manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.

Qt OCR pdf-document C++ tesseract-ocr GTK hocr scanner

C++

1830

206

1 month ago

mittagessen / kraken

OCR engine for all the languages

OCR neural-networks alto-xml hocr handwritten-text-recognition layout-analysis optical-character-recognition page-xml

Python

890

149

10 days ago

BobLd / DocumentLayoutAnalysis

Document Layout Analysis resources and repos for development with PdfPig.

document-layout-analysis layout-analysis table-extraction pdf C# hocr page-xml alto-xml

624

2 years ago

scribeocr / scribeocr

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

OCR proofreading tesseract hocr

JavaScript

216

7 days ago

UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

OCR hocr page-xml validation transformation

JavaScript

195

4 months ago

cneud / ocr-conversion

Conversions between various OCR formats

alto-xml hocr page-xml OCR

2 years ago

dbmdz / mirador-textoverlay

Text Overlay plugin for Mirador 3

OCR optical-character-recognition hocr alto-xml

JavaScript

3 days ago

filak / hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

hocr

XSLT

10 days ago

UB-Mannheim / ocr-gt-tools

Ergonomic line-by-line transcription of scanned text.

OCR hocr transcription ground-truth web-interface

JavaScript

5 years ago

dmi3kno / hocr

Text-to-tibble

OCR tesseract tesseract-ocr R rstats hocr

5 years ago

fakabbir / OCR

Probabilistic Key Value pair extraction using word weights from Invoices - Non Searchable PDF

OCR hocr tesseract Python

Python

4 years ago

macabeus / pyslibtesseract

✏️ Integration of Tesseract for Python using a shared library

tesseract hocr OCR

Python

10 years ago

GeReV / hocr-editor-ts

A visual hOCR file editor

OCR hocr tesseract-ocr

TypeScript

2 years ago

iilei / hocr-to-json

OCR hocr

JavaScript

3 years ago

GeReV / HocrEditor

A visual editor for .hocr files.

hocr tesseract-ocr OCR

8 months ago

hadro / new-york-city-directories

Some basic data and text extraction from the New York City Directories

digital-humanities pdfs OCR hocr

8 years ago

mayurcybercz / AI-Exam-evaluation

CLI-Tool to recognise handwritten text from answer sheets using Tesseract OCR. Using this extracted text to evaluate marks using NLP

tesseract-ocr hocr natural-language-processing command-line-interface JSON Python nltk

Jupyter Notebook

7 years ago

hadro / brewery-guides

The data for guides to breweries across the United States from 1896 to 1918

hocr data dataset digital-humanities Open Data

8 years ago

jlieth / hocr-parser

Python parser for hOCR files using lxml

Python hocr OCR parsing-library

Python

5 years ago