vlm-ocr

bytedance / Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

document-analysis layout-analysis OCR Parser pdf pdf-converter pdf-parser Python vlm-ocr

Python

7286

582

5 days ago

vlm-run / vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

artificial-intelligence machine-vision etl genai JSON multimodal pydantic vlm vlm-ocr

Python

535

4 months ago

video-db / ocr-benchmark

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

arxiv benchmark easyocr OCR rapidocr research-paper vlm-ocr vlms

Python

8 months ago

OmarSamirz / ImageFromTextGenerator

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

Image OCR synthetic text data-augmentation dataset-generation image-processing synthetic-data synthetic-data-generation noise optical-character-recognition augmentation vlm-ocr

Python

6 months ago

Niraya666 / DocuLingo

DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval Augmented Generation) workflows.

rag vlm-ocr

Python

5 months ago

Takk8IS / CyberTechVLMDetector

The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLMs) to detect and locate objects in images based on natural language commands and development, including my creation of HIM™ and MAIC™.

camera detector Python read view vlm vlm-ocr vlms

Python

2 months ago