At the recent Gartner IT Symposium, analysts shared a striking prediction: by 2027, 40% of generative AI (GenAI) solutions will be multimodal, able to process text, images, audio, and video simultaneously, up from just 1% in 2023. This shift will have profound implications for enterprise applications.
Erick Brethenoux, Senior Vice President at Gartner, noted that as the GenAI market evolves toward multimodal models, these models will help capture relationships between different data streams and could extend the benefits of GenAI across a wider range of data types and applications. He emphasized that multimodal GenAI can support humans in performing more tasks in more environments.
According to the 2024 Gartner Hype Cycle for Generative AI Technologies report, multimodal GenAI and open-source large language models (LLMs) are considered highly impactful, expected to deliver significant competitive advantages and market responsiveness to enterprises within the next five years. Gartner also noted that domain-specific GenAI models and autonomous agents are expected to reach mainstream adoption within the next decade.
Analyst Arun Chandrasekaran said that navigating the GenAI ecosystem will be challenging for enterprises because the technology and vendor landscape are changing so rapidly. Although GenAI is currently in the "trough of disillusionment," real benefits will emerge as industry consolidation begins, and capabilities will advance rapidly once the hype fades.
The transition to multimodal GenAI will enhance enterprise applications with new capabilities. Today, many multimodal models are limited to two or three modalities, but that range is expected to broaden in the coming years. Brethenoux noted that in real life, people understand information through a combination of audio, visual, and other sensory inputs, which is what makes multimodal GenAI so important.
Regarding open-source large language models, Chandrasekaran pointed out that they give enterprises room to innovate, allowing customization, privacy and security controls, and model transparency while reducing reliance on specific vendors. Ultimately, open-source LLMs can yield smaller, easier-to-train models that support core business processes.
Domain-specific GenAI models are optimized for specific industries or tasks, improving alignment with enterprise use cases and enhancing accuracy and security. Chandrasekaran further stated that these models can achieve faster value realization, better performance, and stronger security, encouraging organizations to adopt GenAI in broader use cases.
Autonomous agent systems can pursue goals without human intervention, using AI techniques to identify patterns, make decisions, and generate outputs. Brethenoux emphasized that autonomous agents represent a significant leap in AI capabilities, driving improvements in business operations and customer experience, and potentially shifting how organizations work from execution to supervision.
Key Points:
🌟 By 2027, 40% of generative AI solutions will be multimodal, up from just 1% in 2023.
🚀 Multimodal GenAI and open-source large language models are expected to bring significant competitive advantages within the next five years.
🔍 Domain-specific GenAI models can improve the accuracy and security of enterprise applications, encouraging broader adoption.