ggml
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need: it lets you run inference with any open-source language model, speech-recognition model, or multimodal model, whether in the cloud, on-premises, or on your laptop.
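Swaps like this usually work because the local server speaks the OpenAI chat-completions wire format, so only the base URL changes. A minimal, dependency-free sketch of the request body such an endpoint accepts (the host, port, and model name below are illustrative assumptions, not taken from Xinference's docs):

```python
import json

# Hypothetical local endpoint; the "single line" you change in an app
# is typically the base URL pointing here instead of api.openai.com.
BASE_URL = "http://localhost:9997/v1"  # assumed for illustration

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = chat_request("my-local-llm", "Hello!")
print(json.dumps(payload))
```

The rest of the application code stays unchanged, since responses come back in the same schema.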
Stable Diffusion and Flux in pure C/C++
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
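Integer inference like this rests on mapping float weights to small integers and back. A toy sketch of symmetric per-tensor INT8 quantization in pure Python (not the project's actual kernels, which operate on packed tensors in C):

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: map floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]

w = [0.42, -1.3, 0.05, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Rounding error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 for a, b in zip(w, w_hat))
```

INT4/INT5 work the same way with a smaller integer range, trading more rounding error for a smaller memory footprint.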
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
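The memory half of such a calculator is mostly arithmetic: parameter count times bytes per parameter. A rough sketch (the bit widths are standard for these quantization schemes; the formula deliberately ignores KV cache, activations, and runtime overhead, which add to the real requirement):

```python
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int5": 5, "int4": 4}

def weight_memory_gb(n_params: float, quant: str) -> float:
    """Approximate weight memory in GB: params * bits / 8.
    Excludes KV cache and runtime overhead."""
    return n_params * BITS_PER_PARAM[quant] / 8 / 1e9

# A 7B-parameter model in FP16 needs about 14 GB just for weights;
# the same model at INT4 fits in about 3.5 GB.
print(weight_memory_gb(7e9, "fp16"))  # → 14.0
print(weight_memory_gb(7e9, "int4"))  # → 3.5
```

This is why 4-bit quantization is the usual route to running 7B–13B models on consumer GPUs.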
Suno AI's Bark model in C/C++ for fast text-to-speech generation
Whisper Dart is a cross-platform library for Dart and Flutter that converts audio to text (speech to text) by running inference with OpenAI's Whisper models
Run MPT-30B inference on CPU
Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
CLIP inference in plain C/C++ with no extra dependencies
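CLIP is typically used by embedding an image and several text prompts into the same vector space and comparing them with cosine similarity. A dependency-free sketch of that final matching step (the embedding values below are toy numbers, not real CLIP outputs):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

image_emb = [0.1, 0.8, 0.3]  # toy image embedding
text_embs = {                # toy text embeddings
    "a photo of a cat": [0.1, 0.7, 0.4],
    "a photo of a dog": [0.9, 0.1, 0.2],
}
best = max(text_embs, key=lambda t: cosine_similarity(image_emb, text_embs[t]))
print(best)  # → a photo of a cat
```

The heavy lifting in a C/C++ port is producing the embeddings; the zero-shot classification on top is just this argmax over similarities.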
Large Language Models for All, 🦙 Cult and More. Stay in touch!
Inference Vision Transformer (ViT) in plain C/C++ with ggml
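ViT inference starts by cutting the image into fixed-size patches, each of which becomes one input token alongside a class token. The sequence-length arithmetic is simple enough to sketch (standard ViT conventions; a 224×224 image with 16×16 patches is the common ViT-B/16 setting):

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Number of transformer input tokens: one per image patch,
    plus the [CLS] classification token."""
    n_patches = (image_size // patch_size) ** 2
    return n_patches + 1

# ViT-B/16 on a 224x224 image: 14*14 = 196 patches + [CLS] = 197 tokens.
print(vit_sequence_length(224, 16))  # → 197
```

The rest of the model is a standard transformer encoder over those 197 tokens, which is what a plain-C/C++ port reimplements with ggml tensor ops.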