
kvcache

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

inference kvcache llm rdma sglang vllm disaggregation

C++

4049

387

2 hours ago

Zefan-Cai / R-KV

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

kvcache llm

Python

1123

182

1 month ago

uccl-project / uccl

Ultra and Unified CCL

artificial-intelligence amd broadcom CUDA gpu hpc llm Network Nvidia rdma kvcache P2P

C++

575

3 hours ago

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

kvcache llm sglang vllm inference-engine llm-framework llm-inference llm-serving Serverless ollama

Python

2 days ago

NoakLiu / PiKV

PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]

kvcache moe parallel-computing kv-cache management-system mixture-of-experts

Python

8 days ago

ModelEngine-Group / unified-cache-management

Persist and reuse KV Cache to speed up your LLM.

ascend CUDA gpu kvcache llm npu nfs ssd torch vllm deepseek

Python

5 days ago
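The "persist and reuse" idea in the entry above is commonly implemented as prefix caching: KV data is stored per fixed-size block of tokens, keyed by a hash chained over the whole prefix, so a new prompt can reuse the blocks it shares with an earlier one. The sketch below is a generic illustration of that technique, not unified-cache-management's actual design. `PrefixKVStore`, `block_hashes`, `BLOCK_SIZE`, and `compute_kv` are hypothetical names, the stored "KV" is a string placeholder rather than real tensors, and persistence to SSD or NFS is not modeled.

```python
from hashlib import sha256

BLOCK_SIZE = 4  # tokens per cached block (hypothetical value)


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block, chained with its parent, so a hash identifies the whole prefix."""
    hashes = []
    parent = b""
    full_len = len(tokens) // block_size * block_size  # a trailing partial block is never cached
    for start in range(0, full_len, block_size):
        block = tokens[start:start + block_size]
        digest = sha256(parent + repr(block).encode()).digest()
        hashes.append(digest)
        parent = digest
    return hashes


class PrefixKVStore:
    """Hypothetical in-memory store mapping chained block hashes to KV payloads."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}

    def lookup(self, tokens):
        """Return how many leading tokens already have cached KV blocks."""
        hit = 0
        for digest in block_hashes(tokens, self.block_size):
            if digest not in self.blocks:
                break
            hit += self.block_size
        return hit

    def store(self, tokens, compute_kv):
        """Compute and save KV only for blocks that are not already present."""
        for i, digest in enumerate(block_hashes(tokens, self.block_size)):
            if digest not in self.blocks:
                block = tokens[i * self.block_size:(i + 1) * self.block_size]
                self.blocks[digest] = compute_kv(block)


# Usage: the second prompt shares its first 8 tokens with the first one.
store = PrefixKVStore()
store.store(list(range(10)), compute_kv=lambda block: f"kv{block}")
reused = store.lookup(list(range(8)) + [99, 100, 101])
```

Chaining the parent hash means an identical block only matches when everything before it matches too, which is what causal attention requires: a token's keys and values depend on all earlier tokens.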

Linking-ai / SCOPE

(ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation

kvcache long-context

Jupyter Notebook

4 months ago

IBM / spnl

Span Queries: What if we had a way to plan and optimize GenAI like we do for SQL?

generative-ai kvcache locality optimization SQL

Rust

2 hours ago

RohitMurali18 / Music-Generation-Emotion-Adaptive

This project implements an Emotion-Aware Music Generator (EAMG) that turns natural-language prompts into emotion-aligned music in real time. It uses a LoRA-tuned DistilBERT to classify emotions, maps them to musical parameters using music theory, and generates MIDI via a transformer model with KV caching for low-latency output.

FastAPI kvcache llm lora transformers

Jupyter Notebook

3 months ago
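The KV caching that these projects build on, and that the EAMG entry above uses for low-latency generation, can be illustrated in a few lines. The sketch below is a minimal single-head attention decoder in pure Python, assuming tiny random projection matrices. `KVCache`, `decode_with_cache`, and `decode_without_cache` are hypothetical names, not code from any repository listed here. It shows that appending each new token's key and value to a cache gives the same outputs as recomputing keys and values for the whole prefix at every step.

```python
import math
import random


def project(x, w):
    """Multiply row vector x by matrix w (given as a list of rows)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values seen so far."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class KVCache:
    """Hypothetical per-layer cache holding one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(tokens, wq, wk, wv):
    cache = KVCache()
    outputs = []
    for x in tokens:
        # Only the newest token's key and value are computed; earlier ones are reused.
        cache.append(project(x, wk), project(x, wv))
        outputs.append(attend(project(x, wq), cache.keys, cache.values))
    return outputs


def decode_without_cache(tokens, wq, wk, wv):
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[:t + 1]
        # Keys and values for the entire prefix are recomputed at every step.
        keys = [project(x, wk) for x in prefix]
        values = [project(x, wv) for x in prefix]
        outputs.append(attend(project(tokens[t], wq), keys, values))
    return outputs


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Usage: 6 random 4-dimensional token embeddings.
random.seed(0)
d = 4
wq, wk, wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
tokens = rand_matrix(6, d)
cached = decode_with_cache(tokens, wq, wk, wv)
full = decode_without_cache(tokens, wq, wk, wv)
```

With n tokens, the cached loop computes n key projections, while the uncached loop computes n(n+1)/2; the cost of the saving is the memory needed to hold every past key and value, which is what the compression and offloading projects above try to reduce.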