#llava

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go
150512
2 hours ago

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python
23330
1 year ago
Python
17021
13 minutes ago

SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.

Python
5187
3 months ago

An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Python
4700
7 days ago
Jupyter Notebook
3615
15 days ago

A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.

C#
3323
3 days ago

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python
2921
6 days ago

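ChatGPT's explosive popularity marked a key step toward AGI. This project aims to collect open-source alternatives to ChatGPT, including large text models and large multimodal models, as a convenience for everyone.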

2034
2 years ago

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

Python
1581
7 hours ago

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Python
1422
15 days ago
unum-cloud/uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Python
1161
2 months ago

Tag manager and captioner for image datasets

Python
1089
3 months ago
Markdown
984
6 months ago
Python
875
4 months ago

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python
853
11 days ago

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python
840
15 days ago

OpenCV+YOLO+LLAVA powered video surveillance system

Python
771
16 days ago

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Python
687
13 days ago