image-to-text

A wrapper to work with Tesseract OCR inside PHP.

PHP
2994
5 months ago

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch

Python
1165
2 years ago
killkimno/MORT

MORT Translator Project - Real-time game translator with OCR

C#
1161
1 month ago

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.

Python
687
13 days ago

Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.

Python
538
5 months ago

A Node.js wrapper for the Tesseract OCR API

JavaScript
312
2 years ago

Data release for the ImageInWords (IIW) paper.

JavaScript
216
9 months ago

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Python
171
1 year ago

Codebase for fine-tuning and evaluating Nougat-based image2latex generation models

Python
156
1 year ago

The module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or of uneven size, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws a bounding box around the text and then normalizes it to 300 dpi, a resolution suitable for the OCR engine.

Python
147
6 years ago

A Python package for converting PDFs to Markdown while extracting images and tables, generating descriptive text for the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.

Python
142
1 month ago

A Flutter package for fast, accurate, and secure credit and debit card scanning

Swift
126
7 months ago

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

Jupyter Notebook
112
3 years ago

Notepad is a multi-module Jetpack Compose note-taking app with a sketch pad, voice recorder, and image capture.

Kotlin
112
8 days ago

Python
108
5 months ago

Simple to use: pass a local image file or its URL when running the Python script, and you get a text file as output. You can also preview the result of the conversion directly on the command line.

Python
105
2 years ago

Extracts details from Indian national identification cards in a structured format: PAN (completed); Aadhaar, passport, and driving license (WIP).

Python
81
6 years ago

OCR with Google's AI technology (Cloud Vision API)

Python
77
2 years ago