ChatGPT: Lines of code to override rules

in article Previously published at the beginning of August, four cryptography and computer security experts explained that they had become the first to compile these “messages” – or “prompts” in English. We knew they existed, but these researchers identified at least 6,387 of them, collected on four platforms, including Reddit and Discord, over a six-month period. Many of them have the ability to “unblock” a directive written into the robot’s programming, with the goal of making it perform a task that would normally be prohibited.

It should be remembered that these conversational agents have in their programming a series of rules aimed at preventing their use for illegal purposes: for example, if the user requests them Producing fraudulent emails Or pornographic material. In theory for Circumvent these rules or To “unblock” the requestyou have to start a conversation with the bot, asking it, for example, to play a role or imitate a bot that does not have this block.

That’s roughly what these four experts from the Helmholtz Center for IT Security in Germany discovered: These strategies, which they tested on five bots, including two versions of ChatGPT, succeeded in 69% of cases in getting them to perform any of 13 “prohibited activities.” “by their programmers. This is average: the most effective strategy had a success rate of 99.9%.

Prevent these “unblock” strategies It may be difficult. The researchers note that these commands appear semantically similar. They suggest that it may be possible to create a “catalog” from which an algorithm can detect suspicious “commands” as they appear. But it can also become a “cat and mouse game,” where each new updated strategy encourages hackers to become more creative.

