distributed-training

GokuMohandas / Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.

machine-learning deep-learning PyTorch natural-language-processing data-science Python mlops data-engineering data-quality large-language-models ray distributed-training

Jupyter Notebook

43386

6758

1 year ago

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Python

35405

5025

1 day ago

PaddlePaddle / Paddle

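PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle, "飞桨": high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)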

paddlepaddle deep-learning scalability machine-learning neural-network Python efficiency distributed-training

C++

23261

5836

5 days ago

PaddlePaddle / PaddleNLP

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

natural-language-processing embedding bert ernie paddlenlp pretrained-models transformers information-extraction question-answering search-engine semantic-analysis sentiment-analysis neural-search uie document-intelligence compression large-language-models distributed-training llama

Python

12787

3075

5 days ago

Netflix / metaflow

Build, Manage and Deploy AI/ML Systems

machine-learning model-management artificial-intelligence ml-platform ml-infrastructure Python mlops datascience high-performance-computing Kubernetes Amazon Web Services Azure Google Cloud large-language-models llmops agents generative-ai cost-optimization distributed-training

Python

9548

989

8 hours ago

skypilot-org / skypilot

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 17+ clouds, or on-prem).

Python

8793

798

4 hours ago

IDEA-CCNL / Fengshenbang-LM

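Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.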

chinese-nlp pretrained-models PyTorch distributed-training transformers aigc multimodal

Python

4142

384

1 year ago

FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

federated-learning deep-learning distributed-training edge-ai machine-learning on-device-training inference-engine mlops model-deployment model-serving ai-agent

Python

3939

763

2 months ago

bytedance / byteps

A high performance and generic framework for distributed DNN training

machine-learning deep-learning distributed-training Tensorflow mxnet Keras PyTorch

Python

3704

494

2 years ago

tensorflow / adanet

Fast and flexible AutoML with learning guarantees.

automl Tensorflow learning-theory deep-learning neural-architecture-search gpu machine-learning ensemble tpu Python distributed-training

Jupyter Notebook

3459

531

2 years ago

determined-ai / determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.

deep-learning machine-learning ml-platform ml-infrastructure hyperparameter-optimization hyperparameter-search distributed-training PyTorch Tensorflow hyperparameter-tuning Kubernetes data-science mlops Keras

3185

369

6 months ago