Developed jointly by researchers from the University of Hong Kong and ByteDance, LlamaGen is an image generation method built on the autoregressive Llama architecture, and it has shown the potential to surpass traditional diffusion models in image generation.

The open-source release of LlamaGen quickly earned nearly 900 stars on GitHub. This reception not only demonstrates the competitiveness of autoregressive models in image generation but also brings fresh energy and innovation to the open-source community.

On the ImageNet benchmark, LlamaGen outperforms diffusion models such as LDM and DiT, thanks to the research team's deep understanding and optimization of the autoregressive model architecture. By retraining the image tokenizer, they surpassed previous tokenizers, including VQGAN, ViT-VQGAN, and MaskGIT, on ImageNet and COCO.


The technical implementation of LlamaGen rests on several key design choices: an image compressor/quantizer, scalable image generation models, and high-quality training data. The team adopted a CNN architecture similar to VQ-GAN to convert continuous images into discrete tokens, and significantly improved visual quality and resolution through a two-stage training process.
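The core idea of such a quantizer can be sketched in a few lines: map each continuous latent vector to its nearest entry in a learned codebook, yielding discrete token ids the autoregressive model can predict. The sketch below is illustrative only, with assumed shapes and codebook size; it is not LlamaGen's actual implementation.

```python
import numpy as np

def quantize(latents, codebook):
    """Map continuous latents to nearest codebook entries (discrete tokens)."""
    # latents: (N, D) continuous vectors; codebook: (K, D) learned entries
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)        # (N,) discrete token ids
    return tokens, codebook[tokens]      # ids plus their quantized vectors

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16384, 8))  # hypothetical 16384-entry codebook
latents = rng.standard_normal((256, 8))     # e.g. a 16x16 grid of image latents
tokens, quantized = quantize(latents, codebook)
```

In a real VQ-GAN-style tokenizer, the codebook is trained jointly with the CNN encoder and decoder, and a straight-through estimator carries gradients past the non-differentiable argmin.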

Project address: https://top.aibase.com/tool/llamagen

Online experience address: https://huggingface.co/spaces/FoundationVision/LlamaGen

In the first stage, the model was trained on a 50M subset of LAION-COCO at a resolution of 256×256. The team curated a high-quality image dataset by screening for valid image URLs, aesthetic scores, watermark scores, and other criteria. In the second stage, the model was fine-tuned on an internal dataset of 10 million high-aesthetic-quality images, with the resolution increased to 512×512, further improving the visual quality of generated images.
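The screening step described above amounts to filtering image metadata against quality criteria. The following sketch shows the general shape of such a filter; the field names and thresholds are hypothetical, not LlamaGen's actual values.

```python
# Hypothetical metadata filter in the spirit of the screening described above.
def keep_image(meta):
    """Keep an image only if its metadata passes all quality checks."""
    return (
        meta.get("url_valid", False)                  # URL still resolves
        and meta.get("aesthetic_score", 0.0) >= 5.0   # assumed threshold
        and meta.get("watermark_score", 1.0) <= 0.5   # assumed threshold
    )

samples = [
    {"url_valid": True,  "aesthetic_score": 6.2, "watermark_score": 0.1},
    {"url_valid": True,  "aesthetic_score": 4.0, "watermark_score": 0.1},
    {"url_valid": False, "aesthetic_score": 7.0, "watermark_score": 0.0},
]
kept = [s for s in samples if keep_image(s)]
print(len(kept))  # 1: only the first sample passes every check
```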

The advantage of LlamaGen lies in its excellent image tokenizer and the scalability of the Llama architecture. In practice, LlamaGen is strongly competitive on metrics such as FID, IS, Precision, and Recall, and it performs well across a range of parameter scales compared with previous autoregressive models.

Although LlamaGen has achieved significant results, the researchers note that it has so far only reached roughly the level of Stable Diffusion v1. Planned improvements include higher resolutions, more aspect ratios, greater controllability, and video generation.

LlamaGen can now be tried online: interested readers can visit the LlamaGen space on Hugging Face to experiment with this image generation technology directly. In addition, the open-source release gives developers and researchers around the world a platform to participate and contribute.