#jailbreaking

[CCS'24] A dataset consisting of 15,140 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).

Jupyter Notebook
3075
4 months ago

A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.

Jupyter Notebook
518
18 days ago

Open Source iOS 15 - iOS 15.6 Jailbreak Project

C
247
3 years ago

Frida script to bypass the iOS application Jailbreak Detection

JavaScript
77
6 years ago

Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]

Python
67
3 months ago
Python
37
2 years ago

An extensive prompt that gives a chatbot model such as ChatGPT a friendly persona

30
2 years ago

Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.

30
3 days ago

Security Kit is a lightweight framework that helps provide a security layer

Swift
21
2 years ago

iOS APT distribution repository for rootful and rootless jailbreaks

JavaScript
16
1 month ago

During the development of Suave7 and its predecessors, we created many icons and UI images that we would like to share with you. The Theme Developer Kit contains nearly 5,600 icons, more than 380 Photoshop templates, and 100 Pixelmator documents. With this package you can customize every app from the App Store …

14
20 days ago

Customizable Dark Mode Extension for iOS 13+

Logos
14
4 years ago

Source code for bypass tweaks hosted under https://github.com/hekatos/repo. Licensed under 0BSD except submodules

Logos
11
3 years ago

This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashishta*, Atharva Naik*, Somak Aditya, and Monojit Choudhury, accepted at LREC-COLING 2024.

Jupyter Notebook
8
1 year ago

SecurityKit is a lightweight, easy-to-use Swift library that helps protect iOS apps according to the OWASP MASVS standard, chapter v8, providing an advanced security and anti-tampering layer.

Swift
8
1 month ago

"ChatGPT Evil Confidant Mode" delves into a controversial and unethical use of AI, highlighting how specific prompts can generate harmful and malicious responses from ChatGPT.

5
10 months ago

ChatGPT Developer Mode is a jailbreak prompt introduced to perform additional modifications and customization of the OpenAI ChatGPT model.

5
10 months ago