mixture-of-experts

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 39792 stars · updated 1 day ago

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

Python · 2238 stars · updated 16 days ago

[TMM 2025 🔥] Mixture-of-Experts for Large Vision-Language Models

Python · 2218 stars · updated 1 month ago

PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 (a minimal top-k gating sketch follows this entry)

Python · 1155 stars · updated 1 year ago
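
The layer re-implemented above routes each token through only a few experts: a learned gate scores every expert, keeps the top-k scores, and mixes the selected experts' outputs with the renormalized gate weights. The PyTorch sketch below shows only that core idea; the class name, the `num_experts`/`k` arguments, and the dense loop over experts are illustrative assumptions rather than the repository's API, and the paper's gating noise and load-balancing loss are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely-gated MoE: route each token to its top-k experts (illustrative sketch)."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router producing one score per expert
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -- one token per row for simplicity.
        scores = self.gate(x)                                 # (batch, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(topk_scores, dim=-1)              # renormalize over the kept experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Which (token, slot) pairs were routed to expert e?
            rows, slots = (topk_idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

tokens = torch.randn(16, 64)
print(TopKMoE(dim=64)(tokens).shape)  # torch.Size([16, 64])
```

A production implementation would dispatch tokens with capacity-limited gather/scatter instead of looping over experts, but the loop keeps the routing logic visible.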

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook · 1068 stars · updated 7 months ago

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python · 981 stars · updated 8 months ago

Tutel MoE: an optimized Mixture-of-Experts library supporting GptOss/DeepSeek/Kimi-K2/Qwen3 in FP8/NVFP4/MXFP4

C · 893 stars · updated 7 days ago

Jupyter Notebook · 794 stars · updated 13 hours ago

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python · 792 stars · updated 2 years ago

From-scratch implementation of a sparse mixture-of-experts language model inspired by Andrej Karpathy's makemore :)

Jupyter Notebook · 737 stars · updated 10 months ago

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018) (a minimal sketch of the multi-gate idea follows this entry)

Python · 722 stars · updated 2 years ago
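
MMoE, implemented above in Keras, shares one pool of experts across all tasks but gives every task its own softmax gate and output tower, so each task can weight the experts differently. The compact sketch below uses PyTorch for consistency with the other sketches in this list; the layer sizes and names such as `num_tasks` are assumptions, not the repository's interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMoE(nn.Module):
    """Multi-gate MoE sketch: shared experts, one softmax gate and one tower per task."""

    def __init__(self, in_dim: int, expert_dim: int = 32, num_experts: int = 4, num_tasks: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU()) for _ in range(num_experts)]
        )
        self.gates = nn.ModuleList([nn.Linear(in_dim, num_experts) for _ in range(num_tasks)])
        self.towers = nn.ModuleList([nn.Linear(expert_dim, 1) for _ in range(num_tasks)])

    def forward(self, x: torch.Tensor) -> list:
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, expert_dim)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1).unsqueeze(-1)               # (batch, E, 1) task-specific mix
            outputs.append(tower((w * expert_out).sum(dim=1)))         # (batch, 1) prediction per task
        return outputs

x = torch.randn(8, 16)
print([o.shape for o in MMoE(in_dim=16)(x)])  # [torch.Size([8, 1]), torch.Size([8, 1])]
```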

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

Python · 608 stars · updated 1 year ago

Implementation of ST-MoE, the latest incarnation of mixture-of-experts after years of research at Brain, in PyTorch

Python · 358 stars · updated 1 year ago

Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch (a minimal sketch of the soft-routing idea follows this entry)

Python · 313 stars · updated 5 months ago
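
Soft MoE replaces discrete routing with fully differentiable soft assignments: each expert processes a few "slots" that are convex combinations of all input tokens (softmax over tokens), and each token's output is a convex combination of all slot outputs (softmax over slots). The rough PyTorch sketch below illustrates that dispatch/combine scheme; names such as `slots_per_expert` are made up here and do not claim to match the linked repository's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMoE(nn.Module):
    """Soft MoE sketch: slots are weighted averages of tokens, so routing stays differentiable."""

    def __init__(self, dim: int, num_experts: int = 4, slots_per_expert: int = 1):
        super().__init__()
        self.num_experts, self.slots = num_experts, slots_per_expert
        # One learned vector per slot; its dot products with tokens define the routing logits.
        self.phi = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert) * dim ** -0.5)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        logits = x @ self.phi                               # (batch, tokens, total_slots)
        dispatch = F.softmax(logits, dim=1)                 # softmax over tokens: build each slot
        combine = F.softmax(logits, dim=2)                  # softmax over slots: rebuild each token
        slots = torch.einsum("btd,bts->bsd", x, dispatch)   # (batch, total_slots, dim)
        slots = slots.reshape(x.size(0), self.num_experts, self.slots, -1)
        expert_out = torch.stack([e(slots[:, i]) for i, e in enumerate(self.experts)], dim=1)
        expert_out = expert_out.flatten(1, 2)               # back to (batch, total_slots, dim)
        return torch.einsum("bts,bsd->btd", combine, expert_out)

x = torch.randn(2, 10, 32)
print(SoftMoE(dim=32)(x).shape)  # torch.Size([2, 10, 32])
```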

GMoE could serve as a backbone model for many kinds of generalization tasks.

Python · 273 stars · updated 2 years ago

MoH: Multi-Head Attention as Mixture-of-Head Attention (a minimal sketch of the head-routing idea follows this entry)

Python · 271 stars · updated 10 months ago
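
MoH treats the attention heads themselves as experts: instead of summing all head outputs equally, a router selects a subset of heads per token and takes a weighted sum of their outputs. The sketch below shows only that generic "top-k heads per token" idea; the `router` layer, the choice of `k`, and the omission of details from the published method (such as always-active shared heads) are assumptions made here, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfHeadAttention(nn.Module):
    """Sketch: attention heads as experts, with a per-token top-k router over heads."""

    def __init__(self, dim: int, num_heads: int = 8, k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.k, self.d = num_heads, k, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.router = nn.Linear(dim, num_heads)   # per-token score for each head
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = [z.view(b, t, self.h, self.d).transpose(1, 2)
                   for z in self.qkv(x).chunk(3, dim=-1)]        # (batch, heads, tokens, head_dim)
        att = F.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)
        heads = (att @ v).transpose(1, 2)                        # (batch, tokens, heads, head_dim)

        scores = self.router(x)                                  # (batch, tokens, heads)
        top_scores, idx = scores.topk(self.k, dim=-1)
        gate = torch.zeros_like(scores).scatter_(-1, idx, F.softmax(top_scores, dim=-1))
        out = (heads * gate.unsqueeze(-1)).reshape(b, t, -1)     # unselected heads contribute zero
        return self.proj(out)

x = torch.randn(2, 10, 64)
print(MixtureOfHeadAttention(dim=64)(x).shape)  # torch.Size([2, 10, 64])
```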

C++ · 247 stars · updated 1 year ago