human-feedback

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

artificial-intelligence attention-mechanisms deep-learning reinforcement-learning transformers human-feedback

Python

7854

678

17 days ago

opendilab / awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

deep-learning deep-reinforcement-learning human-feedback reinforcement-learning rlhf large-language-models

4100

246

1 month ago

conceptofmind / LaMDA-rlhf-pytorch

Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

attention-mechanism deep-learning machine-learning artificial-intelligence human-feedback reinforcement-learning transformers

Python

473

1 year ago

huggingface / data-is-better-together

Let's build better datasets, together!

community datasets human-feedback machine-learning

Jupyter Notebook

261

8 months ago

yk7333 / d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

diffusion-models human-feedback reinforcement-learning

Python

236

1 year ago

wxjiao / ParroT

The ParroT framework to enhance and regulate translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

ChatGPT gpt-4 llama machine-translation human-feedback instruction-tuning lora

Python

177

8 months ago

xrsrke / instructGOOSE

Implementation of Reinforcement Learning from Human Feedback (RLHF)

reinforcement-learning rlhf ChatGPT human-feedback

Jupyter Notebook

172

2 years ago

trubrics / trubrics-python

Product analytics for AI Assistants

machine-learning ml-monitoring mlops human-feedback large-language-model llmops Streamlit

Python

154

3 months ago

PKU-Alignment / beavertails

BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

ai-safety human-feedback language-model large-language-model rlhf safety beaver datasets gpt llama

Makefile

154

2 years ago

HannahKirk / prism-alignment

The Prism Alignment Project

alignment dataset human-feedback

Jupyter Notebook

1 year ago

JD-GenX / Reliable_AD

[ECCV 2024] Towards Reliable Advertising Image Generation Using Human Feedback

advertising diffusers diffusion diffusion-models eccv2024 human-feedback image-generation rlhf datasets

Python

9 months ago

davidberenstein1957 / dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

data-collection data-quality evaluation human-feedback

Python

1 year ago

gao-g / prelude

Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

alignment gpt4 human-feedback interpretability large-language-model transformers

Python

9 months ago

ZiyiZhang27 / tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

alignment diffusion-models human-feedback reinforcement-learning rlhf text-to-image stable-diffusion

Python

1 year ago

AlaaLab / pathologist-in-the-loop

[NeurIPS 2023] Official codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

human-feedback rlhf synthetic-data

Python

2 years ago

wang8740 / MAP

Documentation at

finetuning human-feedback large-language-model rlhf

Python

5 months ago

victor-iyi / rlhf-trl

Reinforcement Learning from Human Feedback with 🤗 TRL

human-feedback reinforcment-learning rlhf

Python

2 years ago

RapidataAI / crowd-eval

Break out of the AI training bubble

human-feedback machine-learning wandb

Python

1 month ago

CogniSeeker / REBCAT

REactive Behavior Constraint-Aware Tree learning (REBCAT): a human-robot collaboration framework for learning tasks from demonstrations. Interpretable, fast, object-centric, and reactive.

behavior-trees decision-tree-classifier human-feedback interpretable-ai

Python

3 months ago

JacqueWill / SEO_HIF_JS

Search Engine Optimization using Human Implicit Feedback

data-privacy edge-computing human-feedback machine-learning seo-optimization

JavaScript

2 years ago