HexaGen3D is an approach for generating high-quality 3D assets from text prompts. It fine-tunes a pre-trained text-to-image diffusion model to jointly predict six orthographic projections of the object and the corresponding latent triplane. These latents are then decoded into a textured mesh. HexaGen3D requires no per-sample optimization and can infer high-quality, diverse objects from text prompts in 7 seconds, offering a better quality-to-latency trade-off than existing methods. It also generalizes well to novel objects and compositions.
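
To make the geometry of the six projections concrete, the following is a minimal sketch of how a 3D point maps onto six axis-aligned, parallel (orthographic) views of a bounding cube. It is purely illustrative and is not HexaGen3D's implementation: the model predicts these views with a diffusion model rather than projecting a known shape. The names `VIEWS`, `project`, and `hexa_views`, as well as the choice of image axes and depth sign, are assumptions made for this example.

```python
# Illustrative only: six axis-aligned orthographic views of a point set.
# View names, axis conventions, and function names are hypothetical.

# Each view is (axis dropped by the projection, side of the cube the camera is on).
VIEWS = {
    "+x": (0, +1), "-x": (0, -1),
    "+y": (1, +1), "-y": (1, -1),
    "+z": (2, +1), "-z": (2, -1),
}


def project(point, axis, sign):
    """Orthographically project a 3D point along one coordinate axis.

    Returns the 2D image coordinates (the two remaining coordinates, in
    index order) and a depth value that grows toward the camera.
    """
    u_axis, v_axis = [a for a in range(3) if a != axis]
    uv = (point[u_axis], point[v_axis])
    depth = sign * point[axis]
    return uv, depth


def hexa_views(points):
    """Project every point into all six views; returns {view name: [(uv, depth), ...]}."""
    return {
        name: [project(p, axis, sign) for p in points]
        for name, (axis, sign) in VIEWS.items()
    }


if __name__ == "__main__":
    views = hexa_views([(1.0, 2.0, 3.0)])
    for name, samples in views.items():
        print(name, samples)
```

Opposite views share the same image coordinates in this sketch and differ only in depth ordering, which is why each pair of opposing faces sees a different visible surface of the same object.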