# mixture-of-experts

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python
37996
16 hours ago

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

Python
2162
25 minutes ago

Mixture-of-Experts for Large Vision-Language Models

Python
2148
5 months ago

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Python
1097
1 year ago
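
For orientation, the sparsely-gated layer this repo re-implements boils down to a learned gate picking the top-k experts per token and mixing their outputs. Below is a minimal, illustrative PyTorch sketch of that idea only; the class name `SimpleTopKMoE` and its hyperparameters are invented here and are not this repository's API (the original layer also adds noisy gating and load-balancing losses).

```python
# Minimal top-k gated MoE layer in the spirit of Shazeer et al. (2017).
# Illustrative sketch only; names and sizes are assumptions, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleTopKMoE(nn.Module):
    def __init__(self, d_model, d_hidden, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (batch, d_model)
        logits = self.gate(x)                   # (batch, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; real libraries dispatch tokens sparsely.
        for e, expert in enumerate(self.experts):
            mask = topk_idx == e                # (batch, k) bool: which rows picked expert e
            if mask.any():
                rows = mask.any(dim=-1)
                w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out

# Usage: route a batch of token embeddings through 8 experts, top-2 per token.
moe = SimpleTopKMoE(d_model=64, d_hidden=256, num_experts=8, k=2)
y = moe(torch.randn(16, 64))
```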

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook
1029
3 months ago

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python
956
4 months ago

Tutel MoE: an optimized Mixture-of-Experts library with support for DeepSeek FP8/FP4

Python
801
2 days ago

Jupyter Notebook
751
19 days ago

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python
727
2 years ago

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

Python
710
2 years ago
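
As a rough orientation for the MMoE idea above: every task shares one pool of experts but gets its own softmax gate over them, so tasks can weight the shared experts differently. The sketch below paraphrases that structure in PyTorch rather than the repo's TensorFlow Keras code; `SimpleMMoE` and its dimensions are made-up names, not the repository's API.

```python
# Minimal multi-gate mixture-of-experts (MMoE) sketch in PyTorch.
# Paraphrase of the KDD 2018 idea with invented names; not this repo's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMMoE(nn.Module):
    def __init__(self, d_in, d_expert, num_experts, num_tasks):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_expert), nn.ReLU()) for _ in range(num_experts)
        )
        # One gate per task: each task mixes the shared experts with its own weights.
        self.gates = nn.ModuleList(nn.Linear(d_in, num_experts) for _ in range(num_tasks))
        self.towers = nn.ModuleList(nn.Linear(d_expert, 1) for _ in range(num_tasks))

    def forward(self, x):                                              # x: (batch, d_in)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, d_expert)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1).unsqueeze(-1)               # (batch, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # (batch, d_expert)
            outputs.append(tower(mixed))                               # one prediction per task
        return outputs

# Usage: two tasks sharing four experts on 32-dimensional features.
mmoe = SimpleMMoE(d_in=32, d_expert=16, num_experts=4, num_tasks=2)
task_preds = mmoe(torch.randn(8, 32))
```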

From-scratch implementation of a sparse mixture-of-experts language model, inspired by Andrej Karpathy's makemore :)

Jupyter Notebook
689
6 months ago

Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

Python
604
1 year ago

Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch

Python
328
10 months ago

Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch

Python
282
17 days ago
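
For context on Soft MoE: instead of hard top-k routing, each expert "slot" receives a softmax-weighted mix of all tokens, and each token receives a softmax-weighted mix of all slot outputs, keeping the layer fully differentiable. The following is a rough PyTorch sketch of that dispatch/combine scheme under those assumptions; `TinySoftMoE` and its shapes are illustrative and not the API of the repo listed above.

```python
# Rough sketch of the Soft MoE dispatch/combine idea; names and shapes are assumptions.
import torch
import torch.nn as nn

class TinySoftMoE(nn.Module):
    def __init__(self, dim, num_experts=4, slots_per_expert=1):
        super().__init__()
        self.num_experts, self.slots = num_experts, slots_per_expert
        # One learned routing vector per slot (num_experts * slots_per_expert slots total).
        self.phi = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                  # x: (batch, tokens, dim)
        logits = x @ self.phi                              # (batch, tokens, E*S)
        dispatch = logits.softmax(dim=1)                   # over tokens: builds slot inputs
        combine = logits.softmax(dim=-1)                   # over slots: builds token outputs
        slots = torch.einsum('btd,bts->bsd', x, dispatch)  # (batch, E*S, dim)
        slots = slots.reshape(x.size(0), self.num_experts, self.slots, -1)
        expert_out = torch.stack(
            [exp(slots[:, e]) for e, exp in enumerate(self.experts)], dim=1
        ).flatten(1, 2)                                    # (batch, E*S, dim)
        return torch.einsum('bts,bsd->btd', combine, expert_out)

# Usage: 16 tokens of width 64 routed through 4 experts, 1 slot each.
y = TinySoftMoE(dim=64)(torch.randn(2, 16, 64))
```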

GMoE could be the next backbone model for many kinds of generalization tasks.

Python
269
2 years ago

C++
242
1 year ago

MoH: Multi-Head Attention as Mixture-of-Head Attention

Python
237
6 months ago