gpt4v

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python
5741
1 month ago

Vision utilities for web interaction agents 👀

Jupyter Notebook
1645
5 months ago

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python
453
3 months ago

Lightweight GPT-4 Vision processing over the Webcam

JavaScript
281
1 year ago

Prompts for GPT-4V & DALL-E3 to fully utilize their multimodal abilities. GPT4V prompts, DALL-E3 prompts.

249
1 year ago

Convert different model APIs into the OpenAI API format out of the box (a client-side usage sketch follows below).

Go
150
1 year ago
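
To illustrate what an OpenAI-compatible gateway like this enables (a sketch, not code from the project): the official OpenAI Python SDK can be pointed at the gateway's base URL and used unchanged with whichever backend models the gateway routes to. The URL, API key, and model name below are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint and key for a locally deployed gateway; substitute your own values.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="sk-gateway-key")

# The standard chat-completions call works regardless of which provider the gateway proxies.
response = client.chat.completions.create(
    model="claude-3-haiku",  # example of a non-OpenAI model exposed through the gateway
    messages=[{"role": "user", "content": "Explain what an API gateway does in one sentence."}],
)
print(response.choices[0].message.content)
```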

GPT-4V in Wonderland: LMMs as Smartphone Agents

Python
134
9 months ago

Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

Python
115
15 days ago

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Python
79
1 year ago

The ultimate sketch-to-code app, made using GPT-4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app; it will instantly generate code and a preview (sandbox) from a simple hand-drawn sketch on paper captured from a webcam. A rough sketch of this flow is shown below.

JavaScript
79
1 year ago
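
A rough sketch of the sketch-to-code flow described above, assuming the webcam frame has already been saved to disk; the model name, prompt, and file names are illustrative, and the project's actual implementation (in JavaScript) may differ.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assume the captured webcam frame was saved as sketch.jpg (hypothetical file name).
with open("sketch.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send the sketch to a vision-capable model and ask for code in the chosen framework.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Generate a single React component that reproduces this hand-drawn UI sketch."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # the generated component code
```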

Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description

TypeScript
75
1 year ago

Monitor the performance of OpenAI's GPT O3 Mini model over time.

HTML
34
4 days ago

Video voiceover with gpt-4o-mini (a minimal sketch of the approach follows below)

Jupyter Notebook
33
7 months ago
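
A minimal sketch of how such a notebook might work, assuming OpenCV is used to sample frames and the OpenAI API to draft the narration; the frame rate, prompt, and file names are assumptions rather than details taken from the repository.

```python
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Sample roughly one frame per second from the input video (file name is illustrative).
video = cv2.VideoCapture("input.mp4")
fps = int(video.get(cv2.CAP_PROP_FPS)) or 30
frames, index = [], 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % fps == 0:
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
    index += 1
video.release()

# Ask gpt-4o-mini for a voiceover script covering the sampled frames (capped to keep the request small).
content = [{"type": "text",
            "text": "Write a short voiceover script narrating this video, one line per scene."}]
content += [{"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b}"}} for b in frames[:20]]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```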

This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.

Python
25
4 months ago

Language instructions to mycobot using GPT-4V

Python
23
1 year ago