#multimodal

Mintplex-Labs/anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

JavaScript
49661
1 day ago

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python
23656
1 year ago

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript
19070
4 days ago

Janus-Series: Unified Multimodal Understanding and Generation Models

Python
17558
8 months ago

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python
15804
4 hours ago

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording

TypeScript
15722
1 month ago

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

Python
10190
1 day ago
rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

Rust
9348
10 hours ago
Java
8807
12 hours ago
bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python
8109
5 days ago
enricoros/big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

TypeScript
6643
20 hours ago

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

Python
6192
1 year ago

notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.

HTML
6054
20 days ago

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python
5595
5 months ago

Solve Visual Understanding with Reinforced VLMs

Python
5591
1 month ago