#multimodal

Mintplex-Labs/anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

JavaScript
48096
4 hours ago

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python
23331
1 year ago

The Open-sourced Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra.

TypeScript
17656
27 minutes ago

Janus-Series: Unified Multimodal Understanding and Generation Models

Python
17511
7 months ago

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording

TypeScript
15458
6 days ago

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python
15431
15 minutes ago

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, InternVL3, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

Python
9370
14 hours ago
rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

Rust
9077
5 hours ago
Java
8724
5 days ago
bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python
7996
2 days ago
enricoros/big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

TypeScript
6592
2 days ago

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

Python
6149
1 year ago

notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.

HTML
5996
2 months ago

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python
5585
4 months ago

Solve Visual Understanding with Reinforced VLMs

Python
5474
2 months ago
Python
4612
14 hours ago