
#vlm-ocr

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python
7286
5 days ago

A hub for various industry-specific schemas to be used with VLMs.

Python
535
4 months ago

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

Python
44
8 months ago

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

Python
16
6 months ago

DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval Augmented Generation) workflows.

Python
0
5 months ago

The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLMs) to detect and locate objects in images based on natural-language commands and development, including my creation of HIM™ and MAIC™

Python
0
2 months ago