mixture-of-experts

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 39792 stars · updated 1 day ago

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

Python · 2238 stars · updated 16 days ago

[TMM 2025 🔥] Mixture-of-Experts for Large Vision-Language Models

Python · 2218 stars · updated 1 month ago

PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 (a minimal top-k gating sketch follows this entry)

Python · 1155 stars · updated 1 year ago
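
The layer re-implemented above routes each token through only a few experts: a learned gate scores every expert, keeps the top-k scores, and mixes the selected experts' outputs with the renormalized gate weights. The PyTorch sketch below shows only that core idea; the class name, the `num_experts`/`k` arguments, and the dense loop over experts are illustrative assumptions rather than the repository's API, and the paper's gating noise and load-balancing loss are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely-gated MoE: route each token to its top-k experts (illustrative sketch)."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router producing one score per expert
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -- one token per row for simplicity.
        scores = self.gate(x)                                 # (batch, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(topk_scores, dim=-1)              # renormalize over the kept experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Which (token, slot) pairs were routed to expert e?
            rows, slots = (topk_idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

tokens = torch.randn(16, 64)
print(TopKMoE(dim=64)(tokens).shape)  # torch.Size([16, 64])
```

A production implementation would dispatch tokens with capacity-limited gather/scatter instead of looping over experts, but the loop keeps the routing logic visible.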

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook · 1068 stars · updated 7 months ago

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python · 981 stars · updated 8 months ago

Tutel MoE: an optimized Mixture-of-Experts library supporting GptOss/DeepSeek/Kimi-K2/Qwen3 in FP8/NVFP4/MXFP4

C · 893 stars · updated 7 days ago

Jupyter Notebook · 794 stars · updated 13 hours ago

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python · 792 stars · updated 2 years ago

From-scratch implementation of a sparse mixture-of-experts language model inspired by Andrej Karpathy's makemore :)

Jupyter Notebook · 737 stars · updated 10 months ago

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018) (a minimal sketch of the multi-gate idea follows this entry)

Python · 722 stars · updated 2 years ago
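
MMoE, implemented above in Keras, shares one pool of experts across all tasks but gives every task its own softmax gate and output tower, so each task can weight the experts differently. The compact sketch below uses PyTorch for consistency with the other sketches in this list; the layer sizes and names such as `num_tasks` are assumptions, not the repository's interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMoE(nn.Module):
    """Multi-gate MoE sketch: shared experts, one softmax gate and one tower per task."""

    def __init__(self, in_dim: int, expert_dim: int = 32, num_experts: int = 4, num_tasks: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU()) for _ in range(num_experts)]
        )
        self.gates = nn.ModuleList([nn.Linear(in_dim, num_experts) for _ in range(num_tasks)])
        self.towers = nn.ModuleList([nn.Linear(expert_dim, 1) for _ in range(num_tasks)])

    def forward(self, x: torch.Tensor) -> list:
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, expert_dim)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1).unsqueeze(-1)               # (batch, E, 1) task-specific mix
            outputs.append(tower((w * expert_out).sum(dim=1)))         # (batch, 1) prediction per task
        return outputs

x = torch.randn(8, 16)
print([o.shape for o in MMoE(in_dim=16)(x)])  # [torch.Size([8, 1]), torch.Size([8, 1])]
```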

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

Python · 608 stars · updated 1 year ago

Implementation of ST-MoE, the latest incarnation of mixture-of-experts after years of research at Brain, in PyTorch

Python · 358 stars · updated 1 year ago

Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch (a minimal sketch of the soft-routing idea follows this entry)

Python · 313 stars · updated 5 months ago
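
Soft MoE replaces discrete routing with fully differentiable soft assignments: each expert processes a few "slots" that are convex combinations of all input tokens (softmax over tokens), and each token's output is a convex combination of all slot outputs (softmax over slots). The rough PyTorch sketch below illustrates that dispatch/combine scheme; names such as `slots_per_expert` are made up here and do not claim to match the linked repository's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMoE(nn.Module):
    """Soft MoE sketch: slots are weighted averages of tokens, so routing stays differentiable."""

    def __init__(self, dim: int, num_experts: int = 4, slots_per_expert: int = 1):
        super().__init__()
        self.num_experts, self.slots = num_experts, slots_per_expert
        # One learned vector per slot; its dot products with tokens define the routing logits.
        self.phi = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert) * dim ** -0.5)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        logits = x @ self.phi                               # (batch, tokens, total_slots)
        dispatch = F.softmax(logits, dim=1)                 # softmax over tokens: build each slot
        combine = F.softmax(logits, dim=2)                  # softmax over slots: rebuild each token
        slots = torch.einsum("btd,bts->bsd", x, dispatch)   # (batch, total_slots, dim)
        slots = slots.reshape(x.size(0), self.num_experts, self.slots, -1)
        expert_out = torch.stack([e(slots[:, i]) for i, e in enumerate(self.experts)], dim=1)
        expert_out = expert_out.flatten(1, 2)               # back to (batch, total_slots, dim)
        return torch.einsum("bts,bsd->btd", combine, expert_out)

x = torch.randn(2, 10, 32)
print(SoftMoE(dim=32)(x).shape)  # torch.Size([2, 10, 32])
```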

GMoE could serve as a backbone model for many kinds of generalization tasks.

Python · 273 stars · updated 2 years ago

MoH: Multi-Head Attention as Mixture-of-Head Attention (a minimal sketch of the head-routing idea follows this entry)

Python · 271 stars · updated 10 months ago
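
MoH treats the attention heads themselves as experts: instead of summing all head outputs equally, a router selects a subset of heads per token and takes a weighted sum of their outputs. The sketch below shows only that generic "top-k heads per token" idea; the `router` layer, the choice of `k`, and the omission of details from the published method (such as always-active shared heads) are assumptions made here, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfHeadAttention(nn.Module):
    """Sketch: attention heads as experts, with a per-token top-k router over heads."""

    def __init__(self, dim: int, num_heads: int = 8, k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.k, self.d = num_heads, k, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.router = nn.Linear(dim, num_heads)   # per-token score for each head
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = [z.view(b, t, self.h, self.d).transpose(1, 2)
                   for z in self.qkv(x).chunk(3, dim=-1)]        # (batch, heads, tokens, head_dim)
        att = F.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)
        heads = (att @ v).transpose(1, 2)                        # (batch, tokens, heads, head_dim)

        scores = self.router(x)                                  # (batch, tokens, heads)
        top_scores, idx = scores.topk(self.k, dim=-1)
        gate = torch.zeros_like(scores).scatter_(-1, idx, F.softmax(top_scores, dim=-1))
        out = (heads * gate.unsqueeze(-1)).reshape(b, t, -1)     # unselected heads contribute zero
        return self.proj(out)

x = torch.randn(2, 10, 64)
print(MixtureOfHeadAttention(dim=64)(x).shape)  # torch.Size([2, 10, 64])
```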

C++ · 247 stars · updated 1 year ago