Research and consulting firm Gartner has projected that 40% of generative AI (GenAI) solutions will be multimodal by 2027, an increase from just 1% in 2023.
According to Gartner, multimodal GenAI, which can handle various inputs such as text, image, audio, and video, is expected to enhance human-AI interaction, opening up opportunities for more differentiated AI-enabled offerings.
“As GenAI models evolve to handle multiple modalities, they can capture relationships between different data streams, scaling benefits across all data types and applications,” said Erick Brethenoux, an analyst at Gartner.
He added that this development would allow AI to assist humans in performing more tasks, regardless of the environment.
Multimodal GenAI is one of the two key technologies identified in Gartner’s “2024 Hype Cycle for Generative AI”. The other major technology expected to shape the market is open-source large language models (LLMs). Both innovations could give early adopters a competitive advantage, with notable benefits anticipated over the next five years.
Despite the promise, navigating the GenAI landscape remains challenging for enterprises due to the rapid pace of technological change.
“Real benefits will emerge once the initial hype subsides,” said Arun Chandrasekaran, an analyst at Gartner.
He anticipates that advances in GenAI will accelerate as the technology matures, driving significant progress in the near future.
Multimodal GenAI
Multimodal GenAI is expected to transform a range of industries, enabling enterprises to introduce new features and functionalities that were previously unattainable. At present, most multimodal models support only two or three modalities, but this number is likely to grow in the coming years.
Brethenoux highlighted the importance of multimodal models, noting, “In the real world, people process information through a combination of different sensory inputs. When AI systems can emulate this, they provide more accurate and timely results.”
Open-source LLMs, another key technology, have been recognized for their ability to democratize GenAI by providing more accessible models. These models can be customized for specific tasks, enhancing control over privacy, security, and innovation.
“Open-source LLMs reduce vendor lock-in and lower costs, allowing enterprises to develop business applications with greater flexibility,” Chandrasekaran said.
Domain-specific GenAI models, which are tailored to particular industries or business functions, also hold great promise. By focusing on specific tasks, these models can deliver better accuracy and security and a lower risk of hallucination, addressing problems commonly associated with general-purpose models. These advancements are expected to accelerate AI adoption across industries.
Autonomous agents, AI systems capable of achieving goals without human intervention, are also gaining traction. These agents can learn from their environments, make decisions, and improve over time, potentially transforming business operations.
“Autonomous agents represent a major shift in AI capabilities, offering opportunities for cost savings and competitive advantages,” Brethenoux said.