image-to-text

A wrapper to work with Tesseract OCR inside PHP.

PHP
2954
1 month ago

Implementation of CoCa (Contrastive Captioners are Image-Text Foundation Models) in PyTorch

Python
1129
1 year ago

MORT translator project - Real-time game translator with OCR

C#
828
4 hours ago

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Python
621
3 days ago

Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.

Python
499
25 days ago

A Node.js wrapper for the Tesseract OCR API

JavaScript
310
2 years ago

Data release for the ImageInWords (IIW) paper.

JavaScript
209
5 months ago

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Python
159
1 year ago

The module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or unevenly sized, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws bounding boxes around the text in the image and then normalizes it to 300 dpi, a resolution suitable for the OCR engine.

Python
147
6 years ago

Codebase for fine-tuning / evaluating nougat-based image2latex generation models

Python
146
7 months ago

A Flutter package for fast, accurate, and secure credit card and debit card scanning

Swift
122
3 months ago
Python
108
19 days ago

OCR functionality in a feature-rich note-taking extension.

TypeScript
100
4 months ago

Notepad is a multi-module Jetpack Compose note-taking app with a sketch pad, voice recorder, and image capture.

Kotlin
99
1 month ago

A Python package for converting PDFs to Markdown while extracting images and tables and generating text descriptions for the extracted tables and images using several LLM clients, among other functionality. Markdrop is available on PyPI.

Python
94
24 days ago

Everything is very simple: you either download a picture file or specify its link when running a Python script, and as output you get a text file. You can also immediately preview the result of the conversion on the command line.

Python
90
2 years ago

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

Jupyter Notebook
80
2 years ago

Extracts details from Indian national identification cards, such as PAN (completed) and Aadhaar, Passport, and Driving License (WIP), in a structured format

Python
80
5 years ago

OCR with Google's AI technology (Cloud Vision API)

Python
73
2 years ago