ai-alignment

emcie-co / parlant

Control GenAI interactions with power, precision, and consistency using Conversation Modeling paradigms

ai-agents genai llm customer-service customer-success gemini llama3 openai Python ai-alignment

Python

2022

184

1 hour ago

MinghuiChen43 / awesome-trustworthy-deep-learning

A curated list of trustworthy deep learning papers. Daily updating...

adversarial-machine-learning security privacy deep-learning poisoning fairness backdoor ownership robustness interpretable-deep-learning causality hallucinations uncertainty watermarking ai-alignment

365

2 days ago

agencyenterprise / PromptInject

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

ai-safety language-models ml-safety agi ai-alignment adversarial-attacks gpt-3 large-language-models machine-learning chain-of-thought prompt-engineering

Python

361

1 year ago

tomekkorbak / pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

ai-alignment ai-safety gpt language-models pretraining reinforcement-learning rlhf

Python

180

1 year ago

Giskard-AI / awesome-ai-safety

📚 A curated list of papers & technical articles on AI Quality & Safety

artificial-intelligence ai-alignment ai-safety llm llmops machine-learning mlops natural-language-processing ml-testing model-validation computer-vision Awesome Lists ml-safety robustness

178

6 days ago

lets-make-safe-ai / make-safe-ai

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

agi artificial-intelligence ai-safety artificial-general-intelligence ai-alignment

168

2 years ago

tsinghua-fib-lab / AAAI2025_MIA-Tuner

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".

ai-alignment large-language-models

Python

142

1 month ago

EzgiKorkmaz / adversarial-reinforcement-learning

Reading list for adversarial perspective and robustness in deep reinforcement learning.

robust-machine-learning deep-reinforcement-learning ai-safety multiagent-reinforcement-learning ai-alignment adversarial-machine-learning responsible-ai

110

10 days ago

AthenaCore / AwesomeResponsibleAI

A curated list of awesome academic research, books, code of ethics, data sets, institutes, maturity models, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible, Trustworthy, and Human-Centered AI.

responsible-ai xai fairness-ai Awesome Lists explainable-ai interpretable-ai artificial-intelligence ai-alignment ai-safety

3 days ago

dit7ya / awesome-ai-alignment

A curated list of awesome resources for Artificial Intelligence Alignment research

Awesome Lists ai-safety ai-alignment

2 years ago

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

rlhf ai-alignment large-language-models

7 months ago

wesg52 / sparse-probing-paper

Sparse probing paper full code.

ai-alignment ai-safety interpretability

Jupyter Notebook

1 year ago

riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

ai-safety PHP database dataset ai-alignment MySQL

HTML

8 hours ago

UCSC-VLAA / Sight-Beyond-Text

[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

llama2 llava llm mllm vicuna vision-language ai-alignment alignment vlm

Python

2 years ago

lzzcd001 / nabla-gfn

Official Implementation of Nabla-GFlowNet (ICLR 2025)

ai-alignment diffusion-models generative-model finetuning

Python

12 days ago

liondw / Signal-Alignment

An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.

artificial-intelligence ai-alignment design education

2 years ago

phelps-sg / llm-cooperation

Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023

economics gpt-3 llm ai-safety ai-alignment gpt-4

Python

4 months ago