multimodal

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

rag lmstudio localai vector-database ollama local-llm llama3 large-language-model ai-agents multimodal custom-ai-agents deepseek mcp mcp-servers no-code qwen3 web-scraping kimi moonshot

JavaScript

49661

5183

1 day ago

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

gpt-4 chatbot ChatGPT llama multimodal llava foundation-models instruction-tuning multi-modality visual-language-learning llama-2 llama2 vision-language-model

Python

23656

2634

1 year ago

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

natural-language-processing pre-trained-model unilm minilm layoutlm layoutxlm beit document-ai trocr beit-3 foundation-models xlm-e deepnet large-language-model multimodal mllm kosmos kosmos-1 textdiffuser bitnet

Python

21754

2662

3 months ago

jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack

neural-search cloud-native deep-learning machine-learning framework gRPC Kubernetes multimodal mlops pipeline FastAPI generative-ai Docker jaeger llmops OpenTelemetry cncf microservices orchestration prometheus

Python

21752

2234

6 months ago

bytedance / UI-TARS-desktop

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

agent vlm vision computer-use mcp mcp-server gui-operator browser-use gui-agent multimodal tars ui-tars agent-tars

TypeScript

19070

1867

4 days ago

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

any-to-any foundation-models large-language-model multimodal vision-language-pretraining unified-model

Python

17558

2242

8 months ago

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python

15804

3121

4 hours ago

mediar-ai / screenpipe

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording

artificial-intelligence computer-vision large-language-model machine-learning multimodal vision agents agi

TypeScript

15722

1219

1 month ago

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

large-language-model lora llama sft multimodal peft internvl liger deepseek-r1 embedding grpo open-r1 megatron llama4 qwen3 reranker moe

Python

10190

896

1 day ago

rerun-io / rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

visualization computer-vision Python Robotics Rust multimodal C++

Rust

9348

538

10 hours ago

apache / seatunnel

SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

data-integration high-performance offline real-time apache batch cdc change-data-capture data-ingestion elt streaming embeddings large-language-model multimodal

Java

8807

2089

12 hours ago

bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

model-serving mlops llmops generative-ai llm-inference model-inference-service inference-platform deep-learning llm-serving machine-learning Python multimodal ml-engineering large-language-model ai-inference

Python

8109

876

5 days ago

enricoros / big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

ChatGPT generative-ai ui chatgpt-ui agi large-language-models stable-diffusion gpt gpt-4 openai openai-api anthropic beam gpt-5 multimodal groq mistral

TypeScript

6643

1548

20 hours ago

SkalskiP / courses

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

computer-vision deep-learning deep-neural-networks machine-learning mlops multimodal transformers tutorial natural-language-processing generative-model stable-diffusion

Python

6192

562

1 year ago

swyxio / ai-notes

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

artificial-intelligence prompt-engineering stable-diffusion openai gpt gpt-3 multimodal

HTML

6054

527

20 days ago