How easy is it to “jailbreak” a language model, that is, to circumvent the safety guardrails built into the software? A team of researchers at the AI company Anthropic asked themselves this question. The short answer: it is surprisingly easy to elicit unwanted answers from ChatGPT, Claude, Gemini and the like, for example instructions on how to build a bomb.