At the re:Invent conference on Tuesday, Amazon Web Services (AWS) announced the launch of its new series of multimodal generative AI models called Nova. The released Nova series includes four text generation models: Micro, Lite, Pro, and Premier, along with the image generation model Nova Canvas and the video generation model Nova Reel.
Amazon CEO Andy Jassy stated that the Micro, Lite, and Pro models would be available to AWS customers starting that day, while the Premier model is expected to be released in early 2025. The Nova series is designed to handle various forms of input, including text, images, and videos. The text generation models are optimized for 15 languages, with English as the primary one.
Nova Text Generation Models
The Nova text generation models have different features and specifications. The Micro model is known for its low latency and quick response, but it only supports text input and output, making it suitable for fast processing tasks. The Lite model supports quick input processing for text, images, and videos, while the Pro model offers a balance between accuracy, speed, and cost. The Premier model is the most powerful, designed for complex workloads and advanced applications that require custom models.
The context window sizes of these models also vary. The Micro model can process up to approximately 100,000 words, while the Lite and Pro models can handle around 225,000 words, 15,000 lines of code, or 30 minutes of video. AWS stated that by early 2025, the context window for certain Nova models will expand to 2 million tokens.
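As a rough illustration of what these figures mean in practice, the sketch below checks which models could accept a given document, using only the approximate word counts quoted above. The names `WORD_BUDGETS`, `count_words`, and `models_that_fit` are hypothetical, and counting whitespace-separated words is an assumption: actual limits are measured in tokens, so treat the result as an estimate rather than a guarantee.

```python
from __future__ import annotations

# Approximate word budgets quoted in the article.
# These are NOT official token limits; real context windows are measured in tokens.
WORD_BUDGETS = {
    "micro": 100_000,
    "lite": 225_000,
    "pro": 225_000,
}


def count_words(text: str) -> int:
    """Count whitespace-separated words (a crude stand-in for tokenization)."""
    return len(text.split())


def models_that_fit(text: str) -> list[str]:
    """Return the models whose approximate word budget covers the text."""
    n_words = count_words(text)
    return [name for name, budget in WORD_BUDGETS.items() if n_words <= budget]


if __name__ == "__main__":
    sample = "word " * 150_000  # a synthetic 150,000-word document
    print(count_words(sample))      # 150000
    print(models_that_fit(sample))  # ['lite', 'pro']
```

A 150,000-word document exceeds the Micro figure but sits within the Lite and Pro figures, which is why only those two are returned.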
Jassy emphasized that the Nova models are the fastest and most cost-effective in their class. They can be fine-tuned on Amazon Bedrock, AWS's AI development platform, further enhancing speed and efficiency. Additionally, the Nova models can integrate with proprietary systems and APIs to perform various automated tasks.
Nova Canvas and Nova Reel
In addition to text generation, AWS also launched two image and video generation tools: Nova Canvas and Nova Reel. Nova Canvas allows users to generate and edit images through prompts, providing control over the color schemes and layouts of the generated images. Nova Reel can generate videos up to 6 seconds long based on prompts or reference images and allows users to adjust camera movements, including panning, rotating, and zooming.
Although Reel is currently limited to 6-second videos, AWS stated that a version capable of longer videos will be released soon. Furthermore, AWS has built responsible-use controls into these tools, including watermarking and content moderation, to prevent the generation of harmful content.
Jassy also revealed that AWS is developing a speech-to-speech model expected to launch in the first quarter of 2025, which will accept voice input and generate natural-sounding human speech. Additionally, AWS is developing an "any-to-any" model expected to be released in mid-2025, supporting multimodal conversions across text, voice, images, and videos.
AWS remains guarded about the details of its training data and has stated that it will offer an indemnification policy covering copyright claims to protect customers' legal rights.