#multimodal

Mintplex-Labs/anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

JavaScript
43055
2 days ago

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python
22268
8 months ago
Python
21093
2 months ago

Janus-Series: Unified Multimodal Understanding and Generation Models

Python
17135
3 months ago

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python
13672
6 minutes ago

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording

TypeScript
13481
4 days ago
rerun-io/rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

Rust
8228
9 hours ago
bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python
7637
3 days ago

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, DeepSeek-VL2, Phi4, GOT-OCR2, ...).

Python
7041
3 days ago
enricoros/big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

TypeScript
6337
1 day ago

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

Python
5966
1 year ago

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

Python
5636
21 hours ago

Notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.

HTML
5630
2 hours ago

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python
5558
13 days ago

Solve Visual Understanding with Reinforced VLMs

Python
4702
2 days ago

Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%

Python
4487
6 months ago