Embarrassing! Meta's AI Security System Easily Bypassed by 'Spaces' Attack

AIbase基地

Published inAI News · 4 min read · Jul 30, 2024

248

Recently, Meta introduced a machine learning model named Prompt-Guard-86M, designed to detect and respond to prompt injection attacks. These attacks typically involve special inputs that cause large language models (LLMs) to behave inappropriately or bypass security restrictions. However, surprisingly, this new system itself has also exposed vulnerabilities to such attacks.

Hackers, Code, Programmers

Image Source: Image generated by AI, provided by Midjourney

Prompt-Guard-86M was launched by Meta alongside their Llama3.1 generative model, primarily to assist developers in filtering out problematic prompts. Large language models often process vast amounts of text and data, and without restrictions, they might freely repeat dangerous or sensitive information. Therefore, developers have incorporated "guardrails" into the model to catch harmful inputs and outputs.

However, users of AI seem to regard bypassing these guardrails as a challenge, employing prompt injection and jailbreaking techniques to make the model ignore its own safety instructions. Recently, researchers have pointed out that Meta's Prompt-Guard-86M is vulnerable to certain special inputs. For example, when the input "Ignore previous instructions" is spaced between letters, Prompt-Guard-86M obediently disregards prior commands.

This finding was proposed by a vulnerability hunter named Aman Priyanshu, who discovered this security flaw while analyzing Meta's model and Microsoft's benchmark model. Priyanshu stated that the process of fine-tuning Prompt-Guard-86M had minimal impact on individual English letters, allowing him to design this attack method. He shared this discovery on GitHub, noting that simple character spacing and removal of punctuation could disable the classifier's detection capabilities.

Hyrum Anderson, Chief Technology Officer at Robust Intelligence, also agreed with this, stating that the success rate of such attacks is nearly 100%. Although Prompt-Guard is only one part of the defense, the exposure of this vulnerability indeed serves as a wake-up call for businesses using AI. Meta has not yet responded to this, but there are reports that they are actively seeking solutions.

Key Points:
🔍 Meta's Prompt-Guard-86M has been found to have a security vulnerability, susceptible to prompt injection attacks.
💡 Adding spaces between letters can make the system ignore safety instructions, with an attack success rate nearly reaching 100%.
⚠️ This incident reminds businesses to be cautious when using AI technology, and security issues remain a concern.

Apple AI CEO Joins Meta, Attracting Industry Attention

Apple AI executive Ruoming Pang has joined Meta, joining its newly established Super Intelligence Lab with a salary of millions of dollars. Meta recently restructured its AI business, led by former Scale AI CEO Alexandr Wang, and invested 2.9 billion dollars in Scale AI. This move highlights the increasing competition among tech giants for AI talent, as Meta accelerates its layout through high salaries and strategic investments. Apple and Meta have not commented on the matter.

Baidu AI Team Launches PaddleOCR 3.1 Version with Enhanced Capabilities Supporting MCP

On July 7, the Baidu AI team announced the official release of PaddleOCR 3.1, achieving three major upgrades in multilingual recognition, complex document translation, and large model connectivity. The new version supports text recognition in 37 languages, with an average accuracy improvement of over 30%. It also introduces a document translation pipeline and MCP server functionality to help developers efficiently build AI applications. To address multilingual needs in global scenarios, PaddleOCR 3.1 adds the PP-OCRv5 multilingual model, covering 37 languages including French, Spanish, and Russian.

Meta Unveils Proactive Chatbot That Lets AI Initiate the Conversation

Recently, Meta has been testing a new type of chatbot that will proactively send messages to users, rather than just responding after a user initiates a conversation. Imagine you're chatting with a friend on Facebook Messenger or WhatsApp, and suddenly an AI chatbot named "The Maestro of Movie Magic" sends you a message: "I hope you're having a great day! I'd like to know if you've seen any..."

Former OpenAI Researcher Reveals: Signing with Meta Did Not Bring $100 Million Bonus

Recently, a former OpenAI researcher's remarks have sparked widespread attention. He stated that although Meta claimed to offer up to $100 million in signing bonuses when poaching research talent from OpenAI, he and his colleagues did not receive this bonus. This news has undoubtedly raised questions about Meta's hiring practices. Image source note: The image was generated by AI, and the image licensing service provider is Midjourney. This researcher is named Lucas Beyer, and he and his colleague are'

Exploring the Compatibility of LLMs with Reinforcement Learning: Shanghai Jiao Tong University Reveals Differences Between Llama and Qwen, Introducing OctoThinker

Large Language Models (LLMs) have achieved significant progress in complex reasoning tasks by combining task prompts with large-scale reinforcement learning (RL), as demonstrated by models like Deepseek-R1-Zero, which directly apply reinforcement learning to base models, showcasing strong reasoning capabilities. However, this success is difficult to replicate across different base model families, especially within the Llama series. This raises a core question: what factors lead to inconsistent performance of different base models during reinforcement learning? How does reinforcement learning perform in

Meta Establishes a Superintelligence Lab to Lead a New Era in Artificial Intelligence

Meta is undergoing a major internal restructuring, deciding to consolidate all artificial intelligence-related teams into a new unit called "Meta Superintelligence Labs." This information was disclosed by Bloomberg, according to an internal memo from Meta, which shows that CEO Mark Zuckerberg hopes to focus the company's efforts on developing "superintelligence" artificial intelligence.

Meta May Deprecate Its Own Llama AI and Turn to Competitors

Recently, Meta Platforms is facing a major decision, possibly abandoning its self-developed Llama AI model and adopting artificial intelligence systems from competitors such as OpenAI and Anthropic. This change reflects a significant adjustment in Meta's open-source AI strategy and also shows the company's dissatisfaction with its own product performance. The turning point occurred at the Llama 4 launch event in April, where the product was presented at Meta's Llama

Product Finder

Product Submit

AI Models Finder

MCP Servers

MCP Client

MCP Inspector

Case Tutorials

Latest AI News

AI Daily Brief

Embarrassing! Meta's AI Security System Easily Bypassed by 'Spaces' Attack

AIbase基地

This article is from AIbase Daily

AI News Recommendations