gpt-4-vision
[New: PDF and Office file parsing and upload] An all-scenario GPT assistant for Android, activated via the volume keys for voice interaction, supporting web access, taking photos, templates, and parsing PDF and Office documents.
High-quality resources & applications for LLMs, multimodal models and VectorDBs
The most advanced Web UI for AI chat
Cool experiments at the intersection of Computer Vision and Sports ⚽🏃
SGPT is a command-line tool that provides a convenient way to interact with OpenAI models, enabling users to run queries, generate shell commands and produce code directly from the terminal.
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
AI agent that can SEE 👁️, control, navigate, & do stuff for you in your browser.
Convert a screenshot to a working Flutter app.
A versatile multimodal chat application that enables users to develop custom agents, create images, use visual recognition, and engage in voice interactions. It integrates with local LLMs and commercial models such as OpenAI, Gemini, Perplexity, and Claude, and allows users to converse with uploaded documents and websites.
Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦
A Gradio-based image captioning tool that uses the GPT-4 Vision API to generate detailed descriptions of images.
GPT-4 Vision Chatbot examples
ChatGPT wrapper in your TTY
GPT 4 Turbo Vision with Chainlit
Natural-language instructions to mycobot using GPT-4V
This sample project integrates OpenAI's GPT-4 Vision, which provides image recognition, and DALL·E 3, an image generation model, with the Chat Completions API. The combination allows images to be created and analyzed within the same application.
Curated resources about automated GUI computer use via LLMs. Highly opinionated; the focus is on quality over quantity.
Using an Azure OpenAI deployment of GPT-4 Turbo with Vision to analyse out-of-stock situations in a fictitious retail shop.
This tool offers an interactive way to analyze and understand your screenshots using OpenAI's GPT-4 Vision API. Capture any part of your screen and engage in a dialogue with ChatGPT to uncover detailed insights, ask follow-up questions, and explore visual data in a user-friendly format.