Recently, a developer successfully manipulated Apple's new AI system, Apple Intelligence, in MacOS 15.1 Beta 1 using an attack method known as "prompt injection." This allowed the AI to easily bypass its original commands and respond to arbitrary prompts. This incident has garnered significant attention in the industry.

image.png

Developer Evan Zhou demonstrated the exploitation of this vulnerability on YouTube. His initial goal was to manipulate the "rewrite" function of Apple Intelligence, which is typically used to revise and enhance text quality. However, Zhou's first attempt with the "ignore previous commands" command did not work. Surprisingly, he then discovered the template and special markers for Apple Intelligence's prompts, which separate the AI's system role from the user role, through information shared by a Reddit user.

Using this information, Zhou successfully constructed a prompt that could override the original system prompt. He prematurely ended the user role, inserted a new system prompt, instructing the AI to ignore previous commands and respond to subsequent text. After several attempts, the attack succeeded! Apple Intelligence not only responded to Zhou's commands but also provided information he had not asked for, proving the effectiveness of prompt injection.

 Evan Zhou also published his code on GitHub. It is worth noting that although "prompt injection" attacks are not new in AI systems, the issue has been known since the release of GPT-3 in 2020, but it has not been completely resolved. Apple deserves some credit, as compared to other chat systems, Apple Intelligence is more complex in preventing prompt injection. For example, many chat systems can be easily deceived by directly inputting text in the chat window or through hidden text in images. Even systems like ChatGPT or Claude can still be vulnerable to prompt injection attacks in certain situations.

Key Points:

🌟 Developer Evan Zhou successfully manipulated Apple's AI system using "prompt injection," causing it to ignore its original commands.  

🔍 Zhou constructed an attack method capable of overriding system prompts using information shared by a Reddit user.  

🛡️ Although Apple's AI system is relatively more complex, the "prompt injection" issue remains unresolved and continues to be a focus of the industry.