Researchers from Apple and Columbia University have jointly developed Ferret, a multimodal large language model designed for fine-grained image understanding and description. The model can refer to and reason about arbitrary regions of an image, processing free-form text and referred regions together in a single prompt, and it outperforms prior models on region-level tasks. To train and evaluate the model, the researchers built the GRIT dataset; across multiple benchmarks, Ferret demonstrates strong referring and grounding capabilities, with promising applications in areas such as human-computer interaction and intelligent search.
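As a rough illustration of what "processing free-form text and referred regions together" means in practice, the sketch below builds a mixed prompt in which a referred image region is spliced into the text as normalized bounding-box coordinates, in the general style described in the Ferret paper. The helper names (`format_region`, `build_referring_prompt`) and the exact prompt template are illustrative assumptions, not Apple's released API.

```python
# Illustrative sketch: splicing a referred image region into a free-text
# prompt as normalized box coordinates. Function names and the prompt
# template are assumptions for illustration, not the model's actual API.

def format_region(box, image_w, image_h):
    """Normalize a pixel-space box (x1, y1, x2, y2) to [0, 1] coordinates."""
    x1, y1, x2, y2 = box
    return (f"[{x1 / image_w:.3f}, {y1 / image_h:.3f}, "
            f"{x2 / image_w:.3f}, {y2 / image_h:.3f}]")

def build_referring_prompt(question, box, image_w, image_h):
    """Insert the region coordinates where the text contains '<region>'."""
    return question.replace("<region>", format_region(box, image_w, image_h))

if __name__ == "__main__":
    # A user draws a box around an object in a 640x480 image and asks a
    # free-text question about that specific region.
    prompt = build_referring_prompt(
        "What is the object in <region>, and what is it used for?",
        box=(120, 80, 310, 260),
        image_w=640,
        image_h=480,
    )
    print(prompt)
    # -> What is the object in [0.188, 0.167, 0.484, 0.542], and what is it used for?
```

A prompt of this shape lets a single text stream carry both the question and the pointer to the region it asks about, which is the interaction pattern the referring-and-grounding tasks evaluate.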