#kv-cache

A Redis server and distributed cluster implemented in Go.

Go
3646
1 hour ago

Unified KV Cache Compression Methods for Auto-Regressive Models

Python
1010
3 months ago

Notes on LLMs, covering model inference, transformer model structure, and LLM framework code analysis.

Python
722
9 hours ago

Implement llama3 inference step by step: grasp the core concepts, master the derivation of the process, and write the code.

Jupyter Notebook
569
2 months ago

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python
438
9 months ago

Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache papers with code.

276
2 months ago

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Python
152
6 days ago

HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.

Cuda
143
2 days ago

Python
76
6 months ago

This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to facilitate easy understanding of the key parts of the architecture.

Python
64
2 years ago

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

Python
62
1 year ago

Fine-tuned Mistral 7B Persian large language model (LLM) / Persian Mistral 7B

Jupyter Notebook
6
1 month ago

Simple, easy-to-understand PyTorch implementation of the GPT and LLaMA large language models (LLMs) from scratch, with detailed steps. Implemented: Byte-Pair Tokenizer, Rotary Positional Embedding (RoPE), SwiGLU, RMSNorm, Mixture of Experts (MoE). Tested on a Taylor Swift song lyrics dataset.

Python
3
5 months ago

SCAC strategy for efficient and effective KV cache eviction in LLMs

Python
2
1 month ago

A Java-based caching solution that stores key-value pairs temporarily, each with a specified time-to-live (TTL).

Java
2
3 months ago