A research team has recently released an open-source AI image generation model named Meissonic. Remarkably, the model can generate high-quality images with only about one billion parameters. This compact design could eventually make it practical to run text-to-image generation locally on mobile devices.


The model was developed by researchers from Alibaba, Skywork AI, and several universities. They paired a distinctive transformer architecture with innovative training methods, allowing Meissonic to run on an ordinary gaming PC and, potentially, on mobile phones in the future.


Meissonic is trained with a technique called "masked image modeling." In simple terms, part of the image is hidden during training, and the model learns to reconstruct the missing regions from the visible areas and the text description. This helps the model understand how image elements relate to text.
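To make the idea concrete, here is a minimal, illustrative sketch of a masked image modeling training step in PyTorch. It assumes the image has already been quantized into discrete tokens by a separate tokenizer and reduces text conditioning to a single embedding; none of the names, sizes, or layer counts are taken from Meissonic's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 8192          # assumed size of the visual token codebook
MASK_ID = VOCAB       # extra id used as the [MASK] token
SEQ_LEN = 256         # e.g. a 16x16 grid of image tokens
DIM = 512

token_emb = nn.Embedding(VOCAB + 1, DIM)   # +1 for the [MASK] token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True),
    num_layers=4)
to_logits = nn.Linear(DIM, VOCAB)

def masked_modeling_step(image_tokens, text_embedding, mask_ratio=0.5):
    """Hide a fraction of the image tokens and train the model to predict them."""
    b, n = image_tokens.shape
    mask = torch.rand(b, n) < mask_ratio                   # positions to hide
    corrupted = image_tokens.masked_fill(mask, MASK_ID)    # replace with [MASK]

    x = token_emb(corrupted)
    x = torch.cat([text_embedding.unsqueeze(1), x], dim=1) # prepend text conditioning
    hidden = encoder(x)[:, 1:]                             # drop the text slot
    logits = to_logits(hidden)

    # The loss is computed only on the hidden positions: the model must reconstruct
    # the missing tokens from the visible context plus the text description.
    return F.cross_entropy(logits[mask], image_tokens[mask])

# Toy usage with random data in place of a real tokenizer and text encoder
tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))
text = torch.randn(2, DIM)
loss = masked_modeling_step(tokens, text)
```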

Meissonic's architecture allows it to generate high-resolution images of 1024x1024 pixels, whether they are realistic scenes, stylized text, emojis, or cartoon stickers.

Unlike traditional autoregressive models, which generate an image token by token, Meissonic predicts all parts of the image in parallel and refines them over a small number of iterations. This cuts the number of decoding steps by roughly 99% and greatly speeds up image generation.
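The decoding loop can be sketched in a few lines. This illustrates the general parallel, mask-and-refine approach used by MaskGIT-style decoders, not Meissonic's exact inference code: the cosine schedule, the step count, and the `model` interface (masked tokens plus a text embedding mapping to per-position logits) are all assumptions. An autoregressive decoder would need one forward pass per token, i.e. hundreds or thousands of steps, whereas here the whole image is refined in a handful of passes.

```python
import math
import torch

@torch.no_grad()
def parallel_decode(model, text_embedding, seq_len=256, mask_id=8192, steps=16):
    b = text_embedding.shape[0]
    tokens = torch.full((b, seq_len), mask_id, dtype=torch.long)  # start fully masked

    for step in range(steps):
        still_masked = tokens == mask_id
        logits = model(tokens, text_embedding)           # predict ALL positions at once
        confidence, candidates = logits.softmax(-1).max(-1)

        # Fill every masked position with its current best guess ...
        tokens = torch.where(still_masked, candidates, tokens)

        # ... then re-mask the least confident guesses, keeping fewer positions
        # masked each step (a cosine schedule that shrinks to zero on the final step).
        num_remask = int(math.cos(math.pi / 2 * (step + 1) / steps) * seq_len)
        if num_remask > 0:
            confidence = confidence.masked_fill(~still_masked, float("inf"))
            remask = confidence.topk(num_remask, dim=-1, largest=False).indices
            tokens.scatter_(1, remask, mask_id)
    return tokens

# Toy usage with a stand-in model that returns random logits
dummy_model = lambda toks, txt: torch.randn(toks.shape[0], toks.shape[1], 8192)
image_tokens = parallel_decode(dummy_model, torch.randn(2, 512))
```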

The researchers built the model in four stages:

1. First, they taught the model basic concepts using 200 million 256x256-pixel images.
2. Then, they strengthened its text understanding with 10 million rigorously selected image-text pairs.
3. Next, they added special compression layers so the model could output 1024x1024-pixel images.
4. Finally, they fine-tuned the model on data reflecting human preferences to improve its output quality.
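For readers who like to see the pipeline at a glance, the four stages can be summarized as a hypothetical training schedule; the field names are invented for illustration, and only the dataset sizes and resolutions come from the description above.

```python
# Illustrative summary of the staged training described above (field names are assumptions).
TRAINING_STAGES = [
    {"stage": 1, "goal": "learn basic concepts",
     "data": "200M images", "resolution": "256x256"},
    {"stage": 2, "goal": "strengthen text understanding",
     "data": "10M rigorously selected image-text pairs"},
    {"stage": 3, "goal": "enable high-resolution output via added compression layers",
     "resolution": "1024x1024"},
    {"stage": 4, "goal": "fine-tune on human-preference data"},
]
```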


Interestingly, despite its smaller parameter count, Meissonic outperforms some larger models such as SDXL and DeepFloyd-XL on multiple benchmarks, reaching a high human preference score of 28.83. Additionally, Meissonic can perform image inpainting and outpainting without additional training, letting users fill in missing parts of an image or creatively extend existing ones.
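The zero-shot inpainting ability follows naturally from the masked-modeling setup, sketched below under the same assumptions as the earlier snippets (a tokenized image and a `model` mapping masked tokens plus a text embedding to per-position logits). This illustrates the general idea, not Meissonic's actual API.

```python
import torch

@torch.no_grad()
def inpaint(model, text_embedding, image_tokens, fill_mask, mask_id=8192, steps=8):
    """image_tokens: (b, n) tokens of the existing image.
    fill_mask:    (b, n) bool, True where content should be (re)generated."""
    tokens = image_tokens.masked_fill(fill_mask, mask_id)    # hide only the hole

    for _ in range(steps):
        still_masked = tokens == mask_id
        if not still_masked.any():
            return tokens
        logits = model(tokens, text_embedding)
        confidence, candidates = logits.softmax(-1).max(-1)
        # Commit the more confident half of the remaining hole each pass; the
        # untouched visible tokens keep the fill consistent with the original image.
        threshold = confidence[still_masked].quantile(0.5)
        commit = still_masked & (confidence >= threshold)
        tokens = torch.where(commit, candidates, tokens)

    # Any positions still masked after the last pass take their current best guess.
    return torch.where(tokens == mask_id, candidates, tokens)
```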

The research team believes this approach could enable fast, low-cost development of custom AI image generators and help drive text-to-image applications on mobile devices. A demo is available on Hugging Face and the code is on GitHub; the model runs on consumer GPUs with as little as 8GB of VRAM.

Demo: https://huggingface.co/spaces/MeissonFlow/meissonic

Project: https://github.com/viiika/Meissonic

Key Points:

🌟 Meissonic is an open-source AI model that generates high-quality images with only about one billion parameters, making it suitable for ordinary gaming PCs and, potentially, future mobile devices.

⚡ Using parallel iterative decoding, Meissonic needs roughly 99% fewer decoding steps than traditional autoregressive models, greatly speeding up image generation.

🏆 Despite its smaller parameter count, Meissonic outperforms larger models on multiple benchmarks and can perform image inpainting and outpainting without additional training.