gpt-4-vision

[New: PDF and Office file parsing and upload] An all-purpose GPT assistant for Android that can be invoked with the volume keys for voice conversation, with support for web access, photo capture, templates, and PDF and Office document parsing.

Java
851
3 months ago

Jupyter Notebook
520
2 years ago

SGPT is a command-line tool that provides a convenient way to interact with OpenAI models, enabling users to run queries, generate shell commands and produce code directly from the terminal.

Go
363
16 hours ago

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Python
329
1 year ago

AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more

TypeScript
293
1 year ago

AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.

JavaScript
290
1 year ago

Convert a screenshot to a working Flutter app.

Dart
212
5 months ago

A versatile multi-modal chat application that enables users to develop custom agents, create images, leverage visual recognition, and engage in voice interactions. It integrates seamlessly with local LLMs and commercial models like OpenAI, Gemini, Perplexity, and Claude, and lets users converse with uploaded documents and websites.

C#
133
1 year ago

Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦

Jupyter Notebook
63
2 years ago

A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.

Python
60
9 months ago

Jupyter Notebook
59
2 years ago

GPT 4 Turbo Vision with Chainlit

Python
32
2 years ago

Giving language instructions to myCobot using GPT-4V

Python
24
2 years ago

This sample project integrates OpenAI's GPT-4 Vision (image recognition) and DALL·E 3 (image generation) with the Chat Completions API, so that a single application can both create and analyze images.

JavaScript
24
2 years ago

Curated resources about automated GUI computer use via LLMs. Highly opinionated, with a focus on quality over quantity.

23
9 months ago

Using an Azure OpenAI deployment of GPT-4 Turbo with Vision to analyse out-of-stock situations in a fictitious retail shop.

Python
20
2 years ago

This tool offers an interactive way to analyze and understand your screenshots using OpenAI's GPT-4 Vision API. Capture any part of your screen and engage in a dialogue with ChatGPT to uncover detailed insights, ask follow-up questions, and explore visual data in a user-friendly format.

Python
19
1 year ago