
multi-modal

MiniCPM-V 4.0: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python · 20087 stars · updated 8 days ago

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Python · 8780 stars · updated 13 days ago

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal conversational model approaching GPT-4o performance.

Python · 8775 stars · updated 1 month ago
modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Python · 8264 stars · updated 2 days ago

Start building LLM-empowered multi-agent applications in an easier way.

Python · 7742 stars · updated 13 hours ago

Open-source framework for conversational voice AI agents.

C · 7171 stars · updated 6 hours ago

A state-of-the-art-level open visual language model | Multimodal pretrained model

Python · 6644 stars · updated 1 year ago

Implementation of all RAG techniques in a simpler way

Jupyter Notebook · 6181 stars · updated 2 months ago

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch

Python · 5627 stars · updated 2 years ago

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook · 4240 stars · updated 2 months ago

Chinese and English multimodal conversational language model | Multimodal Chinese-English bilingual conversational language model

Python · 4162 stars · updated 1 year ago

[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python · 3339 stars · updated 9 months ago

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

C# · 3323 stars · updated 3 days ago

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python · 2921 stars · updated 6 days ago