rlhf

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

fine-tuning language-model llama llm peft transformers rlhf qlora quantization chatglm qwen instruction-tuning mistral gpt lora large-language-models agent ai moe llama3

Python

47164 stars · 5758 forks · updated 3 days ago

LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

ChatGPT language-model rlhf ai assistant discord-bot machine-learning Next Python

Python

37307 stars · 3266 forks · updated 8 months ago

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

chain-of-thought ChatGPT in-context-learning instruction-tuning large-language-models llm llms nlp pre-trained-language-models pre-training rlhf

Python

11383 stars · 882 forks · updated 1 month ago

ymcui / Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs (phase II project) with 64K long-context models

alpaca llama llm llama-2 large-language-models nlp alpaca-2 flash-attention llama2 alpaca2 Yarn rlhf

Python

7159 stars · 572 forks · updated 7 months ago

InternLM / InternLM

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

chatbot gpt large-language-model long-context rlhf fine-tuning-llm llm chinese flash-attention pretrained-models

Python

6868 stars · 484 forks · updated 2 months ago

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

llm rlhf transformers

Python

5134 stars · 440 forks · updated 5 months ago

argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

human-in-the-loop nlp mlops developer-tools text-labeling annotation-tool machine-learning active-learning weak-supervision text-annotation llm ai gpt-4 rlhf langchain

Python

4460 stars · 426 forks · updated 5 days ago

opendilab / awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

deep-learning deep-reinforcement-learning human-feedback reinforcement-learning rlhf large-language-models

3889 stars · 237 forks · updated 2 months ago

hiyouga / ChatGLM-Efficient-Tuning

Fine-tuning ChatGLM-6B with PEFT | Efficient PEFT-based ChatGLM fine-tuning

chatglm ChatGPT fine-tuning lora alpaca peft huggingface language-model transformers PyTorch rlhf chatglm2 qlora

Python

3697 stars · 475 forks · updated 2 years ago

PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

large-language-models multimodal rlhf chameleon dpo vision-language-model

Jupyter Notebook

3405 stars · 400 forks · updated 4 days ago

Kiln-AI / Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

ai chain-of-thought collaboration fine-tuning machine-learning macOS ollama openai prompt prompt-engineering Python rlhf synthetic-data Windows evals evaluation

Python

3390 stars · 235 forks · updated 14 hours ago

Docta-ai / docta

A Doctor for your data

data data-centric-ai data-centric-machine-learning data-curation data-diagnosis language-model rlhf

Python

3196 stars · 232 forks · updated 3 months ago

argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

ai huggingface llms openai Python rlhf synthetic-data synthetic-dataset-generation

Python

2640 stars · 193 forks · updated 5 days ago

transformerlab / transformerlab-app

Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.

Electron llama llms lora rlhf transformers MLX

TypeScript

2001 stars · 114 forks · updated 2 days ago

tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Jupyter Notebook

1719 stars · 266 forks · updated 4 months ago

THUDM / WebGLM

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

ChatGPT llm rlhf webglm

Python

1587 stars · 139 forks · updated 25 days ago

PKU-Alignment / safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

ai-safety alpaca datasets deepspeed large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf transformers vicuna safety gpt transformer beaver

Python

1448 stars · 120 forks · updated 10 months ago

THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

diffusion-models generative-model rlhf

Python

1376 stars · updated 3 months ago

OpenLMLab / MOSS-RLHF

Secrets of RLHF in Large Language Models Part I: PPO

rlhf alignment ai-safety

Python

1356 stars · updated 1 year ago

RLHFlow / RLHF-Reward-Modeling

Recipes to train reward models for RLHF.

llm rlhf llama3

Python

1296 stars · updated 2 months ago