#multi-modal

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python
22022
10 days ago

AgentScope: Agent-Oriented Programming for Building LLM Applications

Python
12877
1 day ago

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

Python
9285
13 days ago

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Python
8853
6 days ago
modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Python
8376
3 days ago

Open-source framework for conversational voice AI agents

C
8256
4 days ago

A state-of-the-art open visual language model | Multimodal pre-trained model

Python
6669
1 year ago

Implementation of all RAG techniques in a simpler way

Jupyter Notebook
6181
3 months ago

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python
5628
2 years ago

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook
4277
4 months ago

Chinese and English multimodal conversational language model | Multimodal Chinese-English bilingual conversational language model

Python
4161
1 year ago

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

C#
3374
1 day ago

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python
3367
10 months ago

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python
3122
7 days ago