Repository navigation

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

ai-safety language-models ml-safety agi ai-alignment adversarial-attacks gpt-3 large-language-models 机器学习 chain-of-thought prompt-engineering

Python

404

1 年前

hendrycks / ethics

Aligning AI With Shared Human Values (ICLR 2021)

ai-safety gpt-3 ml-safety

Python

296

2 年前

hendrycks / imagenet-r

ImageNet-R(endition) and DeepAugment (ICCV 2021)

robustness domain-adaptation domain-generalization ml-safety

Python

270

4 年前

hendrycks / ss-ood

Self-Supervised Learning for OOD Detection (NeurIPS 2019)

robustness out-of-distribution-detection uncertainty self-supervised-learning self-supervised ml-safety

Python

267

4 年前

jiachens / ModelNet40-C

Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

深度学习 robustness 机器视觉 benchmark data-augmentation regularization ml-safety PyTorch

Python

209

2 年前

Giskard-AI / awesome-ai-safety

📚 A curated list of papers & technical articles on AI Quality & Safety

人工智能 ai-alignment ai-safety 大语言模型 llmops 机器学习 mlops 自然语言处理 ml-testing model-validation 机器视觉 Awesome Lists ml-safety robustness

190

4 个月前

hendrycks / anomaly-seg

The Combined Anomalous Object Segmentation (CAOS) Benchmark

segmentation carla-simulator anomaly-detection out-of-distribution-detection ml-safety

Python

159

3 年前

hendrycks / pre-training

Pre-Training Buys Better Robustness and Uncertainty Estimates (ICML 2019)

pretrained robustness uncertainty out-of-distribution-detection calibration adversarial-examples ml-safety

Python

100

3 年前

YyzHarry / ME-Net

[ICML 2019] ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation

adversarial-attacks adversarial-example robustness defense ml-safety icml icml-2019

Python

3 个月前

hendrycks / jiminy-cricket

Jiminy Cricket Environment (NeurIPS 2021)

ml-safety

ZAP

4 年前

yaodongyu / ProjNorm

Predicting Out-of-Distribution Error with the Projection Norm

ml-safety

Python

3 年前

moonwatcher-ai / moonwatcher

Evaluation & testing framework for computer vision models

ai-safety ai-security ethical-artificial-intelligence ml-safety ml-testing ml-validation mlops 机器视觉

Python

1 年前

Doleus / doleus

Build confidence in your AI with systematic slice-based testing

fairness 机器学习 mlops Python PyTorch quality-assurance quality-control slice ai-safety ai-security 机器视觉 ethical-artificial-intelligence ml-safety ml-testing ml-validation

Python

2 个月前

harsmac / MUFIACode

Code for the attack multiplicative filter attack MUFIA, from the paper "Frequency-based vulnerability analysis of deep learning models against image corruptions".

adversarial-attacks adversarial-examples adversarial-machine-learning cifar10 cifar100 机器视觉 domain-generalization filters imagenet 机器学习 ml-safety PyTorch robustness

Python

2 年前