A newly discovered hacking technique can cause chatbots such as ChatGPT or Gemini to violate policies and provide a lot of dangerous information.
Skeleton Key helps hackers order chatbots to perform dangerous behaviors. Photo: AI in Asia.
Mark Russinovich, chief architect of Microsoft Azure, has warned about an extremely dangerous hacking technique that could allow users to exploit security vulnerabilities to force large language models to reveal dangerous information.
By bypassing the protections, Skeleton Key allows users to command large language models to perform malicious and dangerous behaviors,” Microsoft Azure’s chief technology officer wrote in a blog post.
Currently, Skeleton Key has been found to be effective against several popular AI chatbots such as OpenAI’s ChatGPT, Gemini (Google), and Claude (Anthropic).
Instead of trying to completely change the principles of the AI model, the Skeleton Key miners use commands to sabotage its behavior.
As a result, instead of rejecting the request as programmed, the chatbot will issue warnings about harmful content. The attacker will then trick the chatbot into creating an offensive, harmful, or even illegal outcome.
An example given in Microsoft’s post is a query that asks for instructions on how to build a rudimentary gasoline bomb.
Initially, the chatbot refused and warned that it was programmed to be “safe and useful.” However, the user answering this query is intended to educate and suggest the chatbot to update the behavior to provide information with a warning prefix.
Immediately, the chatbot was fooled and gave instructions for building the bomb, which violated the principles originally programmed.
Microsoft immediately released a number of software updates to minimize the impact of Skeleton Key on large language models on the platform, including the AI assistant Copilot.