multi-modal

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

minicpm minicpm-v multi-modal

Python

19252

1391

2 months ago

activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

datasets deep-learning machine-learning data-science PyTorch Tensorflow Python artificial-intelligence mlops computer-vision cv image-processing datalake langchain llm large-language-models vector-database vector-search multi-modal

Python

8537

657

19 days ago

modelscope / modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

nlp cv speech multi-modal science deep-learning machine-learning Python

Python

7725

797

5 days ago

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.

image-classification image-text-retrieval llm semantic-segmentation video-classification vision-language-model vit-22b vit-6b multi-modal gpt gpt-4v gpt-4o

Python

7627

580

2 days ago

modelscope / agentscope

Start building LLM-empowered multi-agent applications in an easier way.

agent chatbot gpt-4 large-language-models llm llm-agent multi-agent distributed-agents multi-modal llama3 gpt-4o drag-and-drop mcp

Python

7081

406

5 days ago

THUDM / CogVLM

A state-of-the-art-level open visual language model | Multimodal pretrained model

cross-modality language-model multi-modal pretrained-models visual-language-models

Python

6483

429

1 year ago

lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

artificial-intelligence deep-learning attention-mechanism text-to-image transformers multi-modal

Python

5608

640

1 year ago

OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

chinese computer-vision multi-modal-learning nlp PyTorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language deep-learning multi-modal contrastive-loss transformers coreml-models

Python

5097

493

8 months ago

marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

deep-learning information-retrieval machine-learning vector-search tensor-search clip multi-modal search-engine transformers vision-language semantic-search visual-search nlp hnsw knn Hacktoberfest ChatGPT gpt large-language-models

Python

4830

203

2 days ago

valhalla / valhalla

Open Source Routing Engine for OpenStreetMap

OpenStreetMap dijkstra astar tiled directions isochrones multi-modal traveling-salesman routing-engine routing

C++

4780

726

12 hours ago

modelscope / data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python

4222

227

1 day ago

THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model

chatglm-6b gpt multi-modal

Python

4152

424

8 months ago

VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

diffusion Image image-generation multi-modal image-edit

Jupyter Notebook

3945

340

2 months ago

zjunlp / DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

knowledge-graph relation-extraction chinese named-entity-recognition attribute-extraction low-resource document-level information-extraction PyTorch deepke ner nlp few-shot prompt deep-learning multi-modal

Python

3865

715

1 month ago

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

instruction-tuning large-vision-language-model multi-modal

Python

3229

233

5 months ago

SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

chatbot gpt llama llamacpp llm semantic-kernel llava multi-modal llama2 llama3 llama-cpp

3121

414

2 days ago

docarray / docarray

Represent, send, store and search multimodal data

docarray data-structures multimodal cross-modal neural-search deep-learning nested-data qdrant weaviate nearest-neighbor-search protobuf elasticsearch multi-modal semantic-search machine-learning PyTorch FastAPI pydantic

Python

3042

233

16 hours ago

THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

cogvlm pretrained-models language-model multi-modal

Python

2336

153

2 months ago

open-compass / VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v large-language-models llava multi-modal openai vqa llm openai-api qwen gpt computer-vision PyTorch gpt4 ChatGPT clip vit evaluation claude gemini

Python

2232

333

4 hours ago

dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

large-language-model llm multi-modal segmentation

Python

2151

151

2 months ago