The Apple AI/ML team, in collaboration with Columbia University, has developed a multimodal large model named "Ferret" that has successfully cracked Google's CAPTCHA human-verification test: it can pick out the traffic lights in the image grid, and more broadly it improves the accuracy of large models on tasks that require seeing, describing, and answering.

Ferret's innovation lies in unifying referring (understanding which image region a prompt points to) and grounding (localizing the objects a description mentions) in a single model, so that it comprehends both semantics and spatial targets, unlike traditional multimodal models. By using a hybrid region representation that combines discrete coordinates with continuous visual features, the model performs exceptionally well in multi-task evaluations, particularly on referring and visual grounding tasks.

The work was carried out by a team of Chinese researchers, highlighting China's strength in research on multimodal large models, and it opens new directions for image understanding and multimodal tasks. Ferret's capabilities are expected to enable significant breakthroughs in areas such as human-computer interaction and intelligent search.
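To make the "hybrid region representation" idea concrete, here is a minimal, hypothetical sketch: a region is encoded both as discrete coordinate tokens (quantized box corners) and as a continuous feature pooled from the image features inside the box. The function names, the binning scheme, and the toy feature map are illustrative assumptions, not Ferret's actual implementation.

```python
def discretize_box(box, image_size, num_bins=1000):
    """Quantize pixel coords (x1, y1, x2, y2) into discrete bin indices (tokens)."""
    w, h = image_size
    x1, y1, x2, y2 = box
    return (
        int(x1 / w * (num_bins - 1)),
        int(y1 / h * (num_bins - 1)),
        int(x2 / w * (num_bins - 1)),
        int(y2 / h * (num_bins - 1)),
    )

def pool_region_feature(feature_map, box, image_size):
    """Average-pool a 2D grid of feature vectors over the cells the box covers."""
    rows, cols = len(feature_map), len(feature_map[0])
    w, h = image_size
    x1, y1, x2, y2 = box
    c1, c2 = int(x1 / w * cols), min(cols - 1, int(x2 / w * cols))
    r1, r2 = int(y1 / h * rows), min(rows - 1, int(y2 / h * rows))
    cells = [feature_map[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
    dim = len(cells[0])
    return [sum(v[d] for v in cells) / len(cells) for d in range(dim)]

# A region is then represented as the pair (discrete tokens, continuous feature).
image_size = (640, 480)
box = (64, 48, 320, 240)
tokens = discretize_box(box, image_size)
# Toy 6x8 feature map with 1-dimensional features, standing in for a vision encoder's output.
grid = [[[float(r + c)] for c in range(8)] for r in range(6)]
feature = pool_region_feature(grid, box, image_size)
print(tokens, feature)  # → (99, 99, 499, 499) [3.5]
```

The discrete tokens give the language model an exact, vocabulary-friendly handle on location, while the pooled continuous feature carries the visual content of the region; combining the two is the gist of what the article describes.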