OpenAI launched its highly anticipated product, Sora Turbo, during today's live stream, marking a significant breakthrough in generative AI for 2024. Sora Turbo substantially improves generation efficiency and can produce up to 20 seconds of 1080p video directly from text, making it one of the longest-duration video generation models available globally. The model also accepts images or videos alongside the text prompt, which steers the generated content and makes the results more controllable.
The technical highlights of Sora Turbo include full game support for super resolution, super frames, and the HDR gaming lineup, along with two self-developed feature upgrades based on discrete graphics. The touch dynamic frame interpolation feature significantly improves the accuracy of frame interpolation and reduces frame tearing; the game night mode uses AI algorithms to enhance detail in dark areas, improving visibility and addressing brightness issues when gaming in low-light environments.
Sora has now entered an unrestricted usage phase, and ChatGPT Plus and Pro members can use it at no additional cost, a policy considered very generous. OpenAI has also built a new UI and a community sharing service, where users can share their generated videos or draw inspiration from others' prompts to improve their own creations.
The technical principles behind Sora include the use of patches, which enable intensive training on large amounts of image and video data, and a video compression network that reduces the dimensionality of visual data and improves output quality.
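To make the patch idea concrete, here is a minimal sketch in Python with NumPy. It is not OpenAI's implementation: the function name `patchify`, the patch sizes, and the toy array standing in for the output of a video compression network are all assumptions chosen for illustration. It only shows how a compressed video volume could be cut into flattened spacetime patches that a Transformer can treat as tokens.

```python
import numpy as np

def patchify(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a (T, H, W, C) array into flattened spacetime patches.

    Hypothetical helper for illustration only.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Expose the patch grid: (T/pt, pt, H/ph, ph, W/pw, pw, C)
    grid = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three grid axes to the front, followed by the within-patch axes
    grid = grid.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, one column per value inside the patch
    return grid.reshape(-1, pt * ph * pw * C)

# Toy stand-in for a compressed (latent) video: 8 frames of 16x16 with 4 channels
latent = np.arange(8 * 16 * 16 * 4, dtype=np.float32).reshape(8, 16, 16, 4)
tokens = patchify(latent, pt=2, ph=4, pw=4)
print(tokens.shape)  # (64, 128)
```

Each row of `tokens` covers 2 frames by 4x4 positions, so images and videos of different sizes reduce to the same kind of token sequence, only with different lengths.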
Sora also combines diffusion models with the Transformer architecture, using a diffusion transformer in place of the traditional U-Net backbone, which improves the model's ability to capture the relationship between the distribution of input images and their text labels. In addition, Sora adopts the re-captioning technique from DALL·E 3: a highly descriptive caption model is trained and then used to generate text captions for all videos in the training set, improving text fidelity and the overall quality of the videos.
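As a rough illustration of the diffusion side, the sketch below applies the standard DDPM forward noising step to a set of patch tokens. The exact noise schedule and training objective used by Sora are not given here, so the linear beta schedule, the step count, and the helper names `make_alpha_bar` and `add_noise` are assumptions. In a diffusion transformer setup, the Transformer (rather than a U-Net) would be trained to predict the noise `eps` from the noisy tokens, the timestep, and the text caption.

```python
import numpy as np

def make_alpha_bar(num_steps: int, beta_start: float = 1e-4,
                   beta_end: float = 0.02) -> np.ndarray:
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
              rng: np.random.Generator):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(1000)

# Toy patch tokens: 64 patches, 128 values each (hypothetical sizes)
clean_tokens = rng.standard_normal((64, 128))
noisy_tokens, eps = add_noise(clean_tokens, t=500, alpha_bar=alpha_bar, rng=rng)

# A denoiser would be trained so that its output matches `eps`,
# for example by minimizing the mean squared error between the two.
print(noisy_tokens.shape)
```

Because the denoiser only sees a sequence of tokens plus conditioning, swapping the U-Net for a Transformer does not change this noising step, only the network that learns to reverse it.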
Experience link: https://sora.com/