Repository navigation
multimodal
- Website
- Wikipedia
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☁️ Build multimodal AI applications with cloud-native stack
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Janus-Series: Unified Multimodal Understanding and Generation Models
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.
Mobile-Agent: The Powerful GUI Agent Family
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Solve Visual Understanding with Reinforced VLMs
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai