gpt4v

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python
5741
1 month ago

Vision utilities for web interaction agents 👀

Jupyter Notebook
1645
5 months ago

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python
453
3 months ago

Lightweight GPT-4 Vision processing over the Webcam

JavaScript
281
1 year ago

Prompts for GPT-4V & DALL-E3 to fully utilize their multimodal abilities. GPT4V prompts, DALL-E3 prompts.

249
1 year ago

Convert different model APIs into the OpenAI API format out of the box (a client-side usage sketch follows below).

Go
150
1 year ago
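
To illustrate what an OpenAI-compatible gateway like this enables (a sketch, not code from the project): the official OpenAI Python SDK can be pointed at the gateway's base URL and used unchanged with whichever backend models the gateway routes to. The URL, API key, and model name below are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint and key for a locally deployed gateway; substitute your own values.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="sk-gateway-key")

# The standard chat-completions call works regardless of which provider the gateway proxies.
response = client.chat.completions.create(
    model="claude-3-haiku",  # example of a non-OpenAI model exposed through the gateway
    messages=[{"role": "user", "content": "Explain what an API gateway does in one sentence."}],
)
print(response.choices[0].message.content)
```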

GPT-4V in Wonderland: LMMs as Smartphone Agents

Python
134
9 months ago

Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

Python
115
15 days ago

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Python
79
1 year ago

The ultimate sketch-to-code app, made using GPT-4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app; it will instantly generate code and a preview (sandbox) from a simple hand-drawn sketch on paper captured from a webcam. A rough sketch of this flow is shown below.

JavaScript
79
1 year ago
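
A rough sketch of the sketch-to-code flow described above, assuming the webcam frame has already been saved to disk; the model name, prompt, and file names are illustrative, and the project's actual implementation (in JavaScript) may differ.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assume the captured webcam frame was saved as sketch.jpg (hypothetical file name).
with open("sketch.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send the sketch to a vision-capable model and ask for code in the chosen framework.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Generate a single React component that reproduces this hand-drawn UI sketch."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # the generated component code
```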

Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description

TypeScript
75
1 year ago

Monitor the performance of OpenAI's GPT O3 Mini model over time.

HTML
34
4 days ago

Video voiceover with gpt-4o-mini (a minimal sketch of the approach follows below)

Jupyter Notebook
33
7 months ago
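
A minimal sketch of how such a notebook might work, assuming OpenCV is used to sample frames and the OpenAI API to draft the narration; the frame rate, prompt, and file names are assumptions rather than details taken from the repository.

```python
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Sample roughly one frame per second from the input video (file name is illustrative).
video = cv2.VideoCapture("input.mp4")
fps = int(video.get(cv2.CAP_PROP_FPS)) or 30
frames, index = [], 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % fps == 0:
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
    index += 1
video.release()

# Ask gpt-4o-mini for a voiceover script covering the sampled frames (capped to keep the request small).
content = [{"type": "text",
            "text": "Write a short voiceover script narrating this video, one line per scene."}]
content += [{"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b}"}} for b in frames[:20]]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```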

This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.

Python
25
4 months ago

Language instructions to mycobot using GPT-4V

Python
23
1 year ago