multi-modal

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python
19252
2 months ago

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Python
8537
19 days ago

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Python
7725
5 days ago

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python
7627
2 days ago

A state-of-the-art open visual language model | Multimodal pretrained model

Python
6483
1 year ago

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python
5608
1 year ago

Chinese-English bilingual multimodal conversational language model

Python
4152
8 months ago

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook
3945
2 months ago

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python
3229
5 months ago

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

C#
3121
2 days ago

GPT4V-level open-source multi-modal model based on Llama3-8B

Python
2336
2 months ago

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python
2232
4 hours ago

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python
2151
2 months ago