local-inference

High-speed Large Language Model Serving for Local Deployment

C++ · 8178 stars · updated 2 months ago

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Python · 208 stars · updated 5 months ago
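
As a rough illustration of what CPU-GPU orchestration for MoE inference can mean, here is a minimal toy sketch (an assumption on my part, not the code of the repository above): a few "hot" experts live on the GPU, the remaining experts stay on the CPU, and each token is dispatched to whichever device hosts its routed expert.

```python
# Toy sketch of CPU-GPU expert placement for MoE inference (an assumption,
# not the repository's actual code): hot experts sit on the GPU, the rest
# stay on the CPU, and tokens run wherever their routed expert lives.
import torch
import torch.nn as nn

HIDDEN = 64
NUM_EXPERTS = 4
GPU = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
CPU = torch.device("cpu")

# Hypothetical placement: experts 0-1 on the GPU, experts 2-3 on the CPU.
placement = {0: GPU, 1: GPU, 2: CPU, 3: CPU}

experts = nn.ModuleList(
    [nn.Linear(HIDDEN, HIDDEN).to(placement[i]) for i in range(NUM_EXPERTS)]
)
router = nn.Linear(HIDDEN, NUM_EXPERTS).to(GPU)


@torch.no_grad()
def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Top-1 MoE layer: run each token on the device hosting its chosen expert."""
    x = x.to(GPU)
    expert_ids = router(x).argmax(dim=-1)  # route every token to one expert
    out = torch.empty_like(x)
    for i, expert in enumerate(experts):
        mask = expert_ids == i
        if mask.any():
            # Ship the selected tokens to the expert's device, compute, ship back.
            out[mask] = expert(x[mask].to(placement[i])).to(GPU)
    return out


tokens = torch.randn(8, HIDDEN)
print(moe_forward(tokens).shape)  # torch.Size([8, 64])
```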

Pascal · 12 stars · updated 3 months ago

Script that performs RAG and uses a local LLM for Q&A

Python · 0 stars · updated 7 months ago
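
A minimal sketch of the RAG-plus-local-LLM pattern that the script above describes, under assumptions of my own: retrieval here uses TF-IDF similarity from scikit-learn, and `ask_local_llm` is a hypothetical stand-in for whatever local model wrapper the script actually calls.

```python
# Minimal RAG sketch (assumptions, not the repository's code): retrieve the
# most relevant documents by TF-IDF similarity, then ask a local LLM to
# answer using only the retrieved context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days with a receipt.",
    "Shipping is free on orders above 50 euros.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]


def ask_local_llm(prompt: str) -> str:
    # Hypothetical placeholder: replace with a call to the local LLM
    # (e.g. a llama.cpp or Ollama binding) used by the actual script.
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"


def answer(question: str) -> str:
    """Build a context-grounded prompt and ask the local LLM."""
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return ask_local_llm(prompt)


print(answer("How long is the warranty?"))
```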

Script that takes a .wav audio file, performs speech-to-text using OpenAI/Whisper, and then uses Llama3 to generate a summary and action points from the transcript

Python · 0 stars · updated 7 months ago
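
A minimal sketch of the pipeline the entry above describes, under stated assumptions: transcription uses the openai/whisper Python package, and Ollama is assumed as the local Llama3 runner; the file name `meeting.wav` and the prompt wording are hypothetical.

```python
# Sketch of the .wav -> Whisper -> Llama3 pipeline (assumptions, not the
# repository's code): transcribe the audio, then ask a local Llama3 model
# for a summary and action points.
import whisper
import ollama


def transcribe(wav_path: str) -> str:
    """Speech-to-text with a small Whisper model."""
    model = whisper.load_model("base")
    return model.transcribe(wav_path)["text"]


def summarize(transcript: str) -> str:
    """Ask a local Llama3 model (via Ollama, assumed here) for summary + action points."""
    prompt = (
        "Summarize the following transcript and list the action points:\n\n"
        + transcript
    )
    response = ollama.chat(
        model="llama3", messages=[{"role": "user", "content": prompt}]
    )
    return response["message"]["content"]


if __name__ == "__main__":
    text = transcribe("meeting.wav")  # hypothetical input file
    print(summarize(text))
```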