human-feedback

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Python
7854
17 days ago

A curated list of reinforcement learning with human feedback resources (continually updated)

4100
1 month ago

Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

Python
473
1 year ago

Let's build better datasets, together!

Jupyter Notebook
261
8 months ago

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

Python
236
1 year ago

The ParroT framework to enhance and regulate translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

Python
177
8 months ago

Implementation of Reinforcement Learning from Human Feedback (RLHF)

Jupyter Notebook
172
2 years ago

BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

Makefile
154
2 years ago

The Prism Alignment Project

Jupyter Notebook
79
1 year ago

[ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback

Python
56
9 months ago

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

Python
47
1 year ago

Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

Python
39
9 months ago

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

Python
37
1 year ago

[NeurIPS 2023] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

Python
19
2 years ago

Python
10
5 months ago

Reinforcement Learning from Human Feedback with 🤗 TRL

Python
9
2 years ago

Break out of the AI training bubble

Python
6
1 month ago

REactive Behavior Constraint-Aware Tree learning (REBCAT): a human-robot collaboration framework for learning tasks from demonstrations. Interpretable, fast, object-centric, and reactive.

Python
2
3 months ago

Search Engine Optimization using Human Implicit Feedback

JavaScript
1
2 years ago