Nexa AI has launched OmniAudio-2.6B, an audio language model built for efficient deployment on edge devices. Traditional architectures run automatic speech recognition (ASR) and a language model as separate stages. OmniAudio-2.6B instead integrates Gemma-2-2b, Whisper Turbo, and a custom projector into a single unified framework. This design removes the inefficiency and latency that come from chaining separate components together, which makes the model particularly suitable for devices with limited computational resources.
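To make the unified design more concrete, here is a minimal sketch of how a projector can bridge an audio encoder and a language model. The announcement does not describe the projector's internals, so everything below is an assumption for illustration: the dimensions, the single linear layer, and the helper names (make_projector, project, build_decoder_input) are hypothetical and are not Nexa AI's implementation.

```python
import random

# Hypothetical sizes, chosen only to keep the example readable.
AUDIO_DIM = 4  # width of one audio-encoder output frame (assumption)
TEXT_DIM = 6   # width of one language-model embedding (assumption)


def make_projector(in_dim, out_dim, seed=0):
    """Build a random linear map with out_dim rows and in_dim columns."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(frame, weights):
    """Map one audio frame into the language model's embedding space."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def build_decoder_input(audio_frames, text_embeddings, weights):
    """Project every audio frame, then prepend the result to the text embeddings.

    The decoder then sees one sequence of same-width vectors, so no
    intermediate transcript has to be produced and passed between models.
    """
    projected = [project(frame, weights) for frame in audio_frames]
    return projected + text_embeddings


if __name__ == "__main__":
    weights = make_projector(AUDIO_DIM, TEXT_DIM)
    audio = [[1.0] * AUDIO_DIM for _ in range(3)]  # stand-in encoder output
    text = [[0.0] * TEXT_DIM for _ in range(2)]    # stand-in prompt embeddings
    sequence = build_decoder_input(audio, text, weights)
    print(len(sequence), "vectors of width", len(sequence[0]))
```

The point of the sketch is only the data flow: audio features and text embeddings end up in one sequence that a single decoder consumes, rather than passing through a separate ASR step first.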
Main Highlights:
Processing Speed: On a 2024 Mac Mini M4 Pro running the Nexa SDK, OmniAudio-2.6B reaches 35.23 tokens per second in the FP16 GGUF format and 66 tokens per second in the Q4_K_M GGUF format. By comparison, Qwen2-Audio-7B manages only 6.38 tokens per second on similar hardware, which makes OmniAudio-2.6B roughly 5.5 to 10 times faster.
Resource Efficiency: The model's compact design reduces dependence on cloud resources, making it a good fit for power- and bandwidth-constrained hardware such as wearable devices, automotive systems, and IoT devices.
High Accuracy and Flexibility: Although OmniAudio-2.6B prioritizes speed and efficiency, it also performs well on accuracy, which makes it suitable for tasks such as transcription, translation, and summarization. It can be used both for real-time speech processing and for more complex language tasks.
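The speed figures quoted in the highlights can be turned into relative speedups with a few lines of arithmetic. The sketch below uses only the three throughput numbers from the article. The 200-token response length is an arbitrary assumption for illustration, and because the Qwen2-Audio-7B figure was measured on "similar" rather than identical hardware, the ratios should be read as approximate.

```python
# Throughput figures quoted in the article, in tokens per second.
THROUGHPUT = {
    "OmniAudio-2.6B FP16 GGUF": 35.23,
    "OmniAudio-2.6B Q4_K_M GGUF": 66.0,
    "Qwen2-Audio-7B": 6.38,
}


def speedup(model, baseline="Qwen2-Audio-7B"):
    """Return how many times faster `model` generates tokens than `baseline`."""
    return THROUGHPUT[model] / THROUGHPUT[baseline]


def seconds_for(model, n_tokens):
    """Estimate generation time for n_tokens at the quoted throughput."""
    return n_tokens / THROUGHPUT[model]


if __name__ == "__main__":
    n_tokens = 200  # hypothetical response length, not from the article
    for name in THROUGHPUT:
        print(f"{name}: {speedup(name):.2f}x baseline, "
              f"{seconds_for(name, n_tokens):.1f} s for {n_tokens} tokens")
```

On these numbers the FP16 build is about 5.5 times faster than Qwen2-Audio-7B and the Q4_K_M build about 10.3 times faster.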
The launch of OmniAudio-2.6B marks another significant step for Nexa AI in audio language models. Its optimized architecture improves processing speed and efficiency and opens up more possibilities for edge computing devices. As IoT and wearable devices continue to spread, OmniAudio-2.6B is expected to play an important role across a range of application scenarios.
Model page: https://huggingface.co/NexaAIDev/OmniAudio-2.6B
Product page: https://nexa.ai/blogs/omniaudio-2.6b