ocr-python

hiroi-sora/Umi-OCR

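Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning and generation. Ships with built-in multilingual language libraries.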

Python
32348
25 days ago

Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymizes documents, removes PII, and converts any document or picture to structured JSON or Markdown.

Python
2541
4 days ago

An ending and a new beginning

QML
940
1 year ago

A carefully designed OCR pipeline for universal bordered table recognition and reconstruction.

C++
178
2 years ago

Perform text detection in a variety of languages with your computer webcam using Google Tesseract OCR and OpenCV. This script achieves a real-time OCR effect via multi-threading.

Python
162
2 years ago

Lightweight & fast OCR models for license plate text recognition.

Python
136
4 months ago

Anansi is a Python-based crawler that combines computer vision (cv2 and FFmpeg) with OCR (EasyOCR and Tesseract) to find and extract questions and correct answers from video files of popular TV game shows in the Balkan region.

Python
125
3 years ago

Manga OCR snipping application for desktop

Python
114
2 years ago

FLOSS software for Persian optical character recognition

Jupyter Notebook
89
10 months ago

PDF text data extraction web app with OCR for scanned documents

Python
87
10 months ago

Easter2.0: Improving Convolutional Models for Handwritten Text Recognition

Jupyter Notebook
79
2 years ago

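Python 3 package for Chinese/English OCR, with the paddleocr-v4 ONNX model (~14 MB). Inference is based on the ppocr-v4-onnx model, achieving accurate, millisecond-level OCR prediction on CPU, with Chinese/English OCR in general scenarios reaching open-source SOTA.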

Python
74
3 months ago

Collection of PDF parsing libraries, including AI-based docling, claude, openai, llama-vision, and unstructured-io, as well as pdfminer, pymupdf, and pdfplumber, for efficient snapshot, text, table, and metadata extraction.

Python
61
2 days ago

Turn any OCR model into an online inference API endpoint 🚀 🌖

Python
55
1 month ago

MyLittleOCR is a unified OCR wrapper providing a consistent API for seamless integration and switching between multiple OCR engines.

Python
51
7 months ago

Multimodal document parser for high quality data understanding and extraction

Python
43
2 days ago