speculative-decoding

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡

Python
2165
1 year ago

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python
1853
1 day ago

Scalable and robust tree-based speculative decoding algorithm

Python
359
8 months ago

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

Python
338
5 months ago

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python
266
1 year ago

REST: Retrieval-Based Speculative Decoding, NAACL 2024

C
210
24 days ago

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Python
136
5 days ago

LLM inference on consumer devices

Python
125
7 months ago

[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

Python
113
5 months ago

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Python
94
2 years ago

Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.

Python
80
10 months ago

[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Python
55
7 months ago

[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

Python
47
5 months ago

Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)

Python
44
2 years ago

Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.

Python
36
20 days ago

A simple, easy-to-use implementation of EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative decoding algorithm 🦅

Python
36
3 months ago

Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton

Python
32
8 months ago

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

C++
30
1 year ago

Minimal C implementation of speculative decoding based on llama2.c

C
25
1 year ago