fixie-ai/ultravox-v0_4_1-llama-3_1-70b is a multimodal large language model built on the pre-trained Llama3.1-70B-Instruct and whisper-large-v3-turbo backbones. It accepts both speech and text as input and generates text as output. Input is given as a text prompt containing the special pseudo-token <|audio|>; the model replaces this token with embeddings derived from the input audio, and the merged sequence of text and audio embeddings is then used to generate the output text. Ultravox is intended for applications that combine speech understanding with text generation, such as voice agents, speech-to-speech translation, and spoken audio analysis. The model is developed by Fixie.ai and released under the MIT license.
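
The merging of audio embeddings with the text prompt can be pictured with a small toy sketch. This is only an illustration of the idea, not Ultravox's actual processor code: the function name `merge_audio_embeddings`, the two-dimensional vectors, and the whitespace-style tokens are all hypothetical, chosen to keep the example self-contained.

```python
from typing import List, Sequence

# The pseudo-token named in the model description.
AUDIO_PLACEHOLDER = "<|audio|>"

Vector = List[float]


def merge_audio_embeddings(
    tokens: Sequence[str],
    text_embeddings: Sequence[Vector],
    audio_embeddings: Sequence[Vector],
    placeholder: str = AUDIO_PLACEHOLDER,
) -> List[Vector]:
    """Hypothetical helper: splice audio embeddings into a text embedding sequence.

    The single placeholder token's embedding is dropped and the audio
    embeddings (one vector per audio frame) are inserted in its place.
    """
    if len(tokens) != len(text_embeddings):
        raise ValueError("tokens and text_embeddings must have the same length")
    if tokens.count(placeholder) != 1:
        raise ValueError("expected exactly one audio placeholder in the prompt")

    idx = tokens.index(placeholder)
    return (
        list(text_embeddings[:idx])
        + list(audio_embeddings)
        + list(text_embeddings[idx + 1:])
    )


# Toy data (assumed for illustration): 4 prompt tokens, 3 audio frames.
tokens = ["Transcribe", ":", AUDIO_PLACEHOLDER, "."]
text_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 0.0], [3.0, 0.0]]
audio_embeddings = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]

merged = merge_audio_embeddings(tokens, text_embeddings, audio_embeddings)
print(len(merged))  # 3 text vectors + 3 audio vectors
```

In this sketch the merged sequence is longer than the original prompt because one placeholder expands into several audio frames; the language model would then consume the merged sequence as it would any other embedded prompt.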