Meta AI recently open-sourced SPIRIT LM, a foundational multimodal language model that can freely mix text and speech, opening new possibilities for multimodal tasks involving audio and text. SPIRIT LM builds on a pre-trained text language model with 7 billion parameters, which was extended to the speech modality through continued training on text and speech units. Like a large text model, it can understand and generate text, and it can also understand and generate speech, or interleave the two to produce mixed forms of expression.
According to a recent Bloomberg report, Apple is developing a new version of its voice assistant, Siri, that will use advanced large language model (LLM) technology to deliver a more natural conversational experience. The move aims to close the gap with competitors, as products like Google's Gemini Live have demonstrated more natural conversational capabilities in recent years. According to sources, the new assistant will completely replace the current Siri interface used by customers, with a release planned for 2026.
With the rapid development of artificial intelligence (AI) technologies in industrial sectors, experts indicate that high-quality data and data governance will be more important than generative technologies. By 2025, companies adopting AI must focus more on scalable and flexible solutions rather than relying solely on generative AI (GenAI). According to analysis by Qlik, the key to fully harnessing AI's potential lies in companies' investments in high-quality, real-time data.