Since 2021, Microsoft's AI security team has tested over 100 generative AI products to identify vulnerabilities and ethical issues. Their findings challenge some common assumptions about AI safety and emphasize the ongoing importance of human expertise.
It turns out that the most effective attacks are not always the most complex. A study cited in Microsoft's report states: "Real hackers don't calculate gradients; they use quick engineering." This study compared AI security research with real-world practices. In one test, the team successfully bypassed the security features of an image generator simply by hiding harmful instructions within the text of an image—without the need for complex mathematical calculations.
While Microsoft has developed PyRIT, an open-source tool for automated security testing, the team emphasizes that human judgment cannot be replaced. This became particularly evident when they tested how chatbots handle sensitive situations, such as conversations with emotionally distressed individuals. Evaluating these scenarios requires both psychological expertise and a deep understanding of potential mental health impacts.
In investigating AI bias, the team also relied on human insights. In one example, they examined gender bias in image generators by creating images of various professions without specifying gender.
New Security Challenges Arise
The integration of AI into everyday applications has introduced new vulnerabilities. In one test, the team successfully manipulated a language model to create convincing fraudulent scenarios. When combined with text-to-speech technology, this creates a system that can interact with people in dangerously realistic ways.
The risks are not limited to issues unique to AI. The team discovered a traditional security vulnerability (SSRF) in an AI video processing tool, indicating that these systems face both new and old security challenges.
Ongoing Security Needs
This research particularly focuses on the risks of "responsible AI," which refers to situations where AI systems may generate harmful or ethically problematic content. These issues are especially challenging to address as they often heavily depend on context and individual interpretation.
The Microsoft team found that unintentional exposure of ordinary users to problematic content is more concerning than deliberate attacks, as this indicates that security measures are not functioning as intended during normal use.
The findings clearly show that AI security is not a one-time fix. Microsoft recommends ongoing identification and remediation of vulnerabilities, followed by further testing. They suggest that this requires support from regulations and financial incentives to make successful attacks more costly.
The research team stated that several key questions remain to be addressed: How do we identify and control AI capabilities that pose potential dangers, such as persuasion and deception? How do we adjust security testing based on different languages and cultures? How can companies share their methods and results in a standardized way?