OpenAI Launches New AI Safety Approach That Actively Infers Rules to Reject Dangerous Requests!

AIbase基地

Published inAI News · 5 min read · Dec 31, 2024

225

OpenAI has introduced a new AI safety method aimed at enhancing the security of AI systems by changing how they process safety rules. This new series of models, known as the o series, no longer solely relies on learning good and bad behaviors through examples, but can understand and actively reason through specific safety guidelines.

In OpenAI's research, an example was given where when a user attempted to obtain instructions for illegal activities through encrypted text, the model successfully decoded the information but refused the request, specifically citing the safety rules that would be violated. This step-by-step reasoning process demonstrates how the model effectively adheres to relevant safety guidelines.

The training process for the o1 model is divided into three phases. First, the model learns how to provide assistance. Next, through supervised learning, the model studies specific safety guidelines. Finally, the model uses reinforcement learning to practice applying these rules, helping it truly understand and internalize these safety guidelines.

In OpenAI's tests, the newly launched o1 model significantly outperformed other mainstream systems, such as GPT-4o, Claude3.5Sonnet, and Gemini1.5Pro, in terms of safety. The tests included how the model refused harmful requests while allowing appropriate ones, and the results showed that the o1 model achieved the highest scores in both accuracy and resistance to jailbreak attempts.

OpenAI co-founder Wojciech Zaremba expressed pride in this "deliberative alignment" work on social media, believing that this reasoning model can align in a new way. He noted that ensuring systems align with human values is a significant challenge, especially in the development of Artificial General Intelligence (AGI).

Despite OpenAI claiming progress, a hacker known as "Liberator Pliny" demonstrated that even the new o1 and o1-Pro models can be manipulated to bypass safety guidelines. Pliny successfully made the model generate adult content and even share instructions for making Molotov cocktails, despite the system initially refusing these requests. These incidents highlight the difficulty of controlling these complex AI systems, as they operate based on probabilities rather than strict rules.

Zaremba stated that OpenAI has about 100 employees dedicated to AI safety and ensuring alignment with human values. He questioned the safety practices of competitors, particularly Elon Musk's xAI company, which prioritizes market growth over safety measures, while Anthropic recently launched an AI agent without proper safeguards, which Zaremba believes could lead to "significant negative feedback" for OpenAI.

Official blog: https://openai.com/index/deliberative-alignment/

Key points:
🌟 OpenAI's new o series models can actively reason through safety rules, enhancing system security.
🛡️ The o1 model outperforms other mainstream AI systems in rejecting harmful requests and accuracy.
🚨 Despite improvements, the new models can still be manipulated, and safety challenges remain severe.

OpenAI's Acquisition of Windsurf Fails, Google Successfully Poaches the CEO and Core Team

OpenAI's acquisition of Windsurf failed, and Google successfully poached its CEO and core team to join DeepMind. Windsurf, formerly known as Codeium, had raised $200 million in funding and had a valuation of $1.25 billion. Google obtained non-exclusive licensing rights to some of its technology but did not acquire the company. Recently, the popularity of code-generation startups has declined, facing competition from large models like Anthropic and Google. Windsurf will be taken over by its former executives and will continue to operate independently.

OpenAI Delays First Open-Source Large Model Release, Ensuring Safety Becomes Top Priority

OpenAI announced the postponement of its first open-source large model release, with CEO Sam Altman stating that more time is needed for safety testing and risk assessment. This new model, which has performance comparable to o3-mini, may be named 'Open Model,' but the extent of its openness remains unclear. Research Vice President Aidan Clark emphasized that the company maintains strict standards for open source, as the model cannot be recalled once released. Although the delay disappointed some users, OpenAI believes ensuring safety and taking a responsible approach is more important. This decision will shape the future of models.

OpenAI Postpones Open-Source Large Model Release, Prioritizes Safety Testing

OpenAI announced the postponement of the open-source large model release. CEO Sam Altman stated that more time is needed for safety testing. The model was originally scheduled to be released this week but is now delayed until next week to ensure its safety and reliability. Altman emphasized that once the model is released, it cannot be recalled and must be handled with caution. This is OpenAI's first attempt to release a downloadable self-running model, aimed at providing powerful tools for researchers and small businesses. Although the delay is disappointing, the community generally understands the importance of safety testing and believes it is crucial for the AI ecosystem.

OpenAI Invests $6.5 Billion to Acquire AI Company Ivy, Enters the Hardware Market!

OpenAI acquires io Products, an AI equipment company founded by Jony Ive, former design director of Apple, through a stock-only deal worth $6.5 billion, officially entering the hardware field. This acquisition brings OpenAI a top team that previously worked on the design of the iPhone. Jony Ive will be deeply involved in product design. Despite previous setbacks due to trademark disputes, this remains OpenAI's largest acquisition to date. CEO Sam Altman stated that this move will drive the integration of AI technology and hardware, and future innovative AI devices will be launched. The acquisition marks OpenAI's step into the tech scene.

Product Finder

Product Submit

AI Models Finder

MCP Servers

MCP Client

MCP Inspector

Case Tutorials

Latest AI News

AI Daily Brief

OpenAI Launches New AI Safety Approach That Actively Infers Rules to Reject Dangerous Requests!

AIbase基地

This article is from AIbase Daily

AI News Recommendations

OpenAI's Acquisition of Windsurf Fails, Google Successfully Poaches the CEO and Core Team

Google Gemini Embedding Model Tops MTEB Ranking, Surpassing OpenAI

OpenAI Delays First Open-Source Large Model Release, Ensuring Safety Becomes Top Priority

OpenAI Postpones Open-Source Large Model Release, Prioritizes Safety Testing

SpaceX invests $2 billion to help xAI accelerate the追赶 of OpenAI

OpenAI Subtly Adds Shopify as a Search Partner, Strengthening ChatGPT Shopping Search Functionality

OpenAI Plans to Release Open-Weight Models, Breaking the Closed-Source Convention

AI API Showdown in the First Half of 2025: Gemini Dominates, DeepSeek Makes a Surprise Rise, Why Did OpenAI Fall Behind?

OpenAI Acquires AI Hardware Company Founded by Aivi, Transaction Amount Near 6.5 Billion Dollars

OpenAI Invests $6.5 Billion to Acquire AI Company Ivy, Enters the Hardware Market!