AI Daily: First AI Programmer Devin Exposed for Falsification; Sora Alternative? StreamingT2V Demo Link Released; Udio AI Expands to Comedy and Speeches; xAI Launches Grok-1.5 Vision Multimodal Model

Welcome to the AI Daily section! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hottest AI news with a focus on developers, helping you keep up with technology trends and innovative AI products.

Check out the latest AI products here: https://top.aibase.com/

🤖📱💼AI Applications

Is Sora being replaced? The StreamingT2V AI video model, capable of generating 2-minute-long videos, is now open source and available for trial

AiBase Summary:

⭐ StreamingT2V can generate videos up to 1200 frames, or 2 minutes long, surpassing the Sora model.

⭐ It employs advanced autoregressive techniques to maintain temporal consistency and high quality in videos.

⭐ It is a free, open-source project that integrates seamlessly with models such as SVD and AnimateDiff.

⭐ The code has been released, and a trial address is available. Video generation takes a long time; one video is expected to take over 13 minutes to generate.

Open-source code: https://top.aibase.com/tool/streamingt2v

Paper address: https://arxiv.org/pdf/2403.14773.pdf

Trial address 1: https://huggingface.co/spaces/PAIR/StreamingT2V

Trial address 2: https://replicate.com/camenduru/streaming-t2v
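The figures in the StreamingT2V summary above can be sanity-checked with a small sketch. It does two things: it derives the frame rate implied by "1200 frames, or 2 minutes", and it illustrates the general idea of chunk-wise autoregressive generation, where each new chunk is conditioned on the last frames already produced. This is a toy illustration, not the project's code: `generate_long_video`, `toy_chunk`, and the chunk and context sizes (16 and 8) are hypothetical, and frames are stood in for by integers.

```python
from typing import Callable, List

Frame = int  # stand-in for illustration; a real frame would be an image tensor


def implied_fps(total_frames: int, duration_seconds: float) -> float:
    """Frame rate implied by a frame count and a clip duration."""
    return total_frames / duration_seconds


def generate_long_video(
    total_frames: int,
    chunk_size: int,
    context_size: int,
    generate_chunk: Callable[[List[Frame], int], List[Frame]],
) -> List[Frame]:
    """Chunk-wise autoregressive loop (hypothetical helper).

    Each new chunk is produced by `generate_chunk`, which sees only the
    last `context_size` frames generated so far.
    """
    if chunk_size <= 0 or context_size <= 0:
        raise ValueError("chunk_size and context_size must be positive")
    frames: List[Frame] = []
    while len(frames) < total_frames:
        context = frames[-context_size:]  # empty list for the first chunk
        n = min(chunk_size, total_frames - len(frames))
        frames.extend(generate_chunk(context, n))
    return frames


def toy_chunk(context: List[Frame], n: int) -> List[Frame]:
    """Toy generator: continue counting from the last context frame."""
    start = context[-1] + 1 if context else 0
    return list(range(start, start + n))


if __name__ == "__main__":
    video = generate_long_video(1200, chunk_size=16, context_size=8,
                                generate_chunk=toy_chunk)
    print(len(video), "frames at", implied_fps(len(video), 120), "fps")
```

If both numbers in the summary describe the same clip, the implied rate is 10 frames per second. In the toy loop, continuity across chunk boundaries comes only from the context frames passed to the generator, which is the property an autoregressive scheme relies on for temporal consistency.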

Udio AI offers multifunctional audio generation, including comedy, speeches, radio broadcasts, and more

AiBase Summary:

⭐ Udio can create music and also generate comedy, speeches, NPC dialogues, sports analysis, advertisements, radio broadcasts, ASMR, natural sound effects, and more.

⭐ Simple text description creation: Users can guide Udio to generate music with specific themes and emotions through simple text descriptions.

⭐ Wide range of music types and styles supported: Udio supports various music types and styles to cater to different users' music tastes.

Interested parties can view the playlist here: https://www.udio.com/playlists/deGuVDLYd9MrXtxnxfX7z1

Experience address: https://top.aibase.com/tool/udio

Meitu Wink's "AI Anime" feature upgrade can transform short dramas into anime style

AiBase Summary:

⭐ Meitu Wink recently upgraded its "AI Anime" feature, converting short dramas into anime style.

⭐ The introduction of the CFA module optimizes action consistency, generating more fluid and natural anime videos.

⭐ Long videos are processed in segments, which reduces waiting time and makes creation freer and smoother.

StableDesign: An SD solution suitable for interior design, allowing text prompts to modify interior design drawings

AiBase Summary:

⭐️ Developers have created a project for generative interior design.

⭐️ Airbnb property data and image metadata are downloaded, and features are extracted from them for training.

⭐️ ControlNet and LoRA models are trained to provide control over interior design generation and text-to-image conversion.

Online experience: https://huggingface.co/spaces/MykolaL/StableDesign

SwapAnything: A more powerful alternative to face swapping, allowing the replacement of any element in an image

AiBase Summary:

🔍 The SwapAnything framework has advantages in precise control of objects and parts, retaining context pixels, and adapting to personalized concepts.

🔍 Through targeted variable swapping and appearance adaptation techniques, SwapAnything demonstrates precise and faithful swapping capabilities.

🔍 SwapAnything can precisely control any object in an image, achieving high-quality personalized swaps.

Project entry: https://top.aibase.com/tool/swapanything

MagicTime, an AI time-lapse video generation tool, releases an online experience address

AiBase Summary:

⭐ Time-lapse video is a photography technique that showcases long-term changes.

⭐ MagicTime can generate time-lapse videos based on textual descriptions.

⭐ It has a wide range of applications, from recording natural phenomena to capturing changes caused by human activity.

Project address: https://top.aibase.com/tool/magictime

Experience address: https://huggingface.co/spaces/BestWishYsh/MagicTime

Model download address: https://huggingface.co/Kijai/MagicTime-merged-fp16

STORM, an automated writing tool, can generate deep, long-form content like Wikipedia

AiBase Summary:

⭐️ STORM automatically gathers materials, simulates expert dialogues, and generates structured article outlines.

⭐️ STORM conducts efficient research and integrates information from multiple perspectives, which supports in-depth understanding and precise question generation.

⭐️ After generating an article outline, STORM fully writes and polishes the article to improve overall quality.

Project address: https://top.aibase.com/tool/storm

Meta introduces the ViewDiff model: text-to-3D image generation from multiple perspectives

AiBase Summary:

🌟 ViewDiff addresses three major challenges in generating consistent, multi-perspective 3D images from text.

🌟 The autoregressive generation module allows ViewDiff to generate more 3D-consistent images at any given perspective.

🌟 ViewDiff fills the technical gap in the field of text-to-multi-perspective 3D image generation.

Paper address: https://arxiv.org/abs/2403.01807

Project address: https://top.aibase.com/tool/viewdiff

📰🤖📢AI News

The first AI programmer caught falsifying its demo: Devin shocks Silicon Valley again! A detailed text walkthrough of the exposé video is attached

AiBase Summary:

⭐️ A programmer on YouTube exposes falsification in the demo video of Devin, the first AI programmer.

⭐️ Devin's demonstration is not as miraculous as it seems: it fixes bugs while creating new ones.

⭐️ With the demo questioned and debunked, netizens scoff at the hype around AI products.

Detailed content: https://www.chinaz.com/2024/0415/1610127.shtml

Elon Musk's xAI releases the Grok-1.5 Vision multimodal model, capable of processing text and image information

AiBase Summary:

⭐️ The Grok-1.5 Vision model demonstrates superior performance, surpassing GPT-4V.

⭐️ It performs especially well on the RealWorldQA benchmark, which tests understanding of real-world physical space.

⭐️ The Grok-1.5Vision model has strong real-world spatial processing and understanding capabilities.

Official website address: https://top.aibase.com/tool/grok-1-5-vision-preview

360 Zhinao's 7B parameter large model is officially open-source, supporting input of up to approximately 500,000 characters

AiBase Summary:

🧠 360 Zhinao's 7B parameter large model is officially open-source.

🧩 Available in versions with different context lengths, the longest of which can process 360K-length texts.

🔥 Performs exceptionally in capability tests, ranking among the top three in comprehensive capabilities.

Project address: https://github.com/Qihoo360/360zhinao
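The headline's "approximately 500,000 characters" and the summary's "360K" can be related by a quick calculation. The sketch below assumes, which the item does not state, that "360K" is a count of 360,000 tokens; under that assumption the two figures imply a rough characters-per-token ratio. The function name `chars_per_token` is made up for illustration.

```python
def chars_per_token(characters: int, tokens: int) -> float:
    """Average number of characters represented by one token."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return characters / tokens


# Assumption: "360K" means 360,000 tokens (not confirmed by the article).
ratio = chars_per_token(500_000, 360_000)

if __name__ == "__main__":
    print(f"about {ratio:.2f} characters per token")
```

Under this assumption the ratio comes out at roughly 1.39 characters per token. If "360K" instead refers to a different unit, the ratio does not apply.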

Adobe's image generation AI "Firefly" has about 5% of its training set as AI-generated images

AiBase Summary:

⭐ Adobe Stock has begun accepting AI content, with approximately 14% being AI-generated images.

⭐ Scholars point out that Firefly learns from images generated by Midjourney, contradicting its claims.

⭐ Users express dissatisfaction with Adobe using their works to train Firefly.

Code and model fully open-source! Professor Jiaya Jia's team's multimodal model Mini-Gemini tops the trending list

AiBase Summary:

⭐️ The Mini-Gemini model achieves significant results in multi-modal tasks, open-sourcing both code and model data.

⭐️ Mini-Gemini combines image understanding and generation, demonstrating excellent image reasoning capabilities.

⭐️ Utilizing the Gemini visual dual-branch information mining method, it effectively handles high-resolution images and generates rich visual and textual content.

Project address: https://top.aibase.com/tool/mini-gemini

Trial address: https://103.170.5.190:7860/

ModelBest open-sources the MiniCPM 2.0 series models, with significant enhancements in OCR capabilities

AiBase Summary:

⭐ MiniCPM-V2.0 is the most powerful edge-side multimodal model, with strong OCR capabilities.

⭐ MiniCPM-1.2B is a base model suitable for edge scenarios, with fast inference speed and low cost.

⭐ MiniCPM-2B-128K is the smallest long-text model, capable of processing 128K text content.

MiniCPM-V2.0: https://github.com/OpenBMB/MiniCPM-V

MiniCPM series open-source address: https://github.com/OpenBMB/MiniCPM

MiniCPM technical blog address: https://openbmb.vercel.app/?category=Chinese+Blog

Competition heats up! ChatGPT's growth momentum wanes, with 1.77 billion global visits in March, while Claude is on the rise

AiBase Summary:

📉 Growth in ChatGPT's global visits is slowing, despite the introduction of new features.

🚀 Anthropic's Claude is thriving in the European market, intensifying competition with ChatGPT.

💥 Since the release of Claude 3, Claude has continued to grow rapidly, demonstrating the potential of new products.

InstantID team introduces a new style transfer method, InstantStyle, allowing you to instantly immerse yourself in "Van Gogh's Starry Night"

Chinaz (站长之家)

This article is from AIbase Daily