#ai-alignment

emcie-co/parlant

Control GenAI interactions with power, precision, and consistency using Conversation Modeling paradigms

Python
2022
1 hour ago

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

Python
361
1 year ago

Code accompanying the paper Pretraining Language Models with Human Preferences

Python
180
1 year ago

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

168
2 years ago

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".

Python
142
1 month ago

A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible, Trustworthy, and Human-Centered AI.

71
3 days ago

A curated list of awesome resources for Artificial Intelligence Alignment research

70
2 years ago

Sparse probing paper full code.

Jupyter Notebook
55
1 year ago

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

HTML
21
8 hours ago

[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

Python
19
2 years ago

Official Implementation of Nabla-GFlowNet (ICLR 2025)

Python
19
12 days ago

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.

18
2 years ago

Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023

Python
12
4 months ago

Scan your AI/ML models for problems before you put them into production.

Python
11
20 days ago

JavaScript
10
9 days ago

IDA with RL and overseer failures

TeX
8
4 years ago