Apple emphasizes that it has taken a 'responsible' approach to training its Apple Intelligence model

AIbase基地

Published in AI News · 6 min read · Jul 30, 2024

Apple Inc. recently released a technical paper detailing the models behind the generative AI features of its "Apple Intelligence" suite, which are set to roll out on iOS, macOS, and iPadOS over the coming months. In the paper, Apple responds to concerns about the ethics of its training process, reiterating that it did not use any private user data and relied instead on publicly available and licensed data.

Image Source Note: The image is AI-generated, with image licensing provided by Midjourney.

Apple states that their pre-training dataset includes licensed data from publishers, carefully selected public datasets, and publicly available information scraped by their web crawler, Applebot. Emphasizing the importance of user privacy, Apple highlights that these datasets do not contain any private user information.

In July, media reports revealed that Apple had used a dataset called "The Pile," which contains subtitles from hundreds of thousands of YouTube videos; many of the creators whose subtitles were included did not know about or consent to this use. Apple later clarified that it did not intend to use the models trained on that dataset to power any AI features in its products.

The technical paper sheds light on Apple's "Apple Foundation Model" (AFM) family, first announced at WWDC 2024, and stresses that the models' training data was acquired "responsibly." That data comes from public web data and licensed data from undisclosed publishers. Apple reportedly approached several publishers, including NBC and Condé Nast, in late 2023 about multi-year deals worth at least $50 million to train models on their news archives. The AFM models were also trained on open-source code hosted on GitHub, in languages such as Swift, Python, and C.

However, training models on open-source code has drawn criticism from developers: some repositories are unlicensed, and others have terms of use that do not permit AI training. Apple says it applied a "license filter" to the code, keeping only repositories with minimal usage restrictions.
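The paper, as summarized here, does not say how the license filter is implemented. As a rough illustration of the general idea, the Python sketch below keeps only repositories whose declared license appears on an allowlist of permissive licenses. The `Repo` record, the `license_filter` function, and the contents of `PERMISSIVE_LICENSES` are all hypothetical assumptions for this example; they are not Apple's actual criteria or code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical allowlist of permissive SPDX license identifiers.
# Apple's real list is not given in the article; these are illustrative only.
PERMISSIVE_LICENSES = frozenset(
    {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
)


@dataclass(frozen=True)
class Repo:
    """Hypothetical, minimal metadata record for a code repository."""
    name: str
    license_id: Optional[str]  # SPDX identifier, or None if no license was found


def license_filter(repos: List[Repo]) -> List[Repo]:
    """Keep only repositories whose declared license is on the allowlist.

    Repositories with no license (license_id is None) are dropped, because
    the absence of a license grants no explicit permission to reuse the code.
    """
    return [repo for repo in repos if repo.license_id in PERMISSIVE_LICENSES]


if __name__ == "__main__":
    candidates = [
        Repo("swift-utils", "MIT"),
        Repo("copyleft-lib", "GPL-3.0-only"),
        Repo("mystery-code", None),
        Repo("py-helpers", "Apache-2.0"),
    ]
    print([repo.name for repo in license_filter(candidates)])
```

Treating a missing license as a rejection is the conservative choice here: it matches the concern above that unlicensed repositories give no clear permission for reuse, including for AI training.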

To enhance the mathematical capabilities of the AFM models, Apple specifically included mathematical problems and answers from web pages, math forums, blogs, tutorials, and seminars in their training dataset. They also used "high-quality, publicly available" datasets for fine-tuning to minimize the likelihood of the models exhibiting inappropriate behavior.

In total, the training set contains roughly 6.3 trillion tokens, less than half of the 15 trillion tokens Meta used to train its flagship text-generation model, Llama 3.1 405B. Apple further refined the AFM models with human-feedback data and synthetic data to better align them with user needs.

The paper holds few surprises, and that is by design: papers like this rarely go into much detail, partly to avoid legal trouble. Apple notes in the paper that it lets webmasters block its crawler from scraping their sites, but that does little for individual creators whose work is hosted on sites they do not control, so protecting their work remains an unresolved issue.
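For site owners, the blocking mechanism mentioned above is the standard robots.txt file. Apple's Applebot documentation describes an `Applebot-Extended` user-agent token that publishers can disallow to opt their content out of use in training Apple's models, separately from the regular `Applebot` crawler; treat the exact token and its semantics as something to confirm against Apple's current documentation. The sketch below uses Python's standard `urllib.robotparser` to check how a hypothetical robots.txt for `example.com` would be read for each agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt the whole site out for Applebot-Extended
# (AI-training use) while still allowing the regular Applebot crawler.
# The Applebot-Extended group is listed first on purpose: Python's parser
# matches agent names by substring, so an "Applebot" group placed first
# would also match queries for "Applebot-Extended".
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://example.com/portfolio/artwork.html"
    for agent in ("Applebot", "Applebot-Extended"):
        print(agent, "allowed:", parser.can_fetch(agent, url))
```

Run as a script, it should report that `Applebot` may fetch the page while `Applebot-Extended` may not. As the article points out, this only helps people who control the robots.txt of the site hosting their work.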

Key Points:

🌟 Apple emphasizes that they did not use private user data in training models, relying instead on public and licensed data.

📊 The training data includes licensed content from multiple publishers, along with open-source code repositories.

🔍 Apple strives to enhance AI model performance and accountability while protecting user privacy.

This article is from AIbase Daily