LGM is a novel framework for generating high-resolution 3D models from text prompts or single-view images. Its key insights are twofold: (1) 3D Representation: we propose multi-view Gaussian features as an efficient yet powerful representation, which can be fused together for differentiable rendering. (2) 3D Backbone: we present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or a single-view image by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our method. Notably, we achieve high-resolution 3D content generation while maintaining fast rendering speed, even when the training resolution is raised to 512x512.
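The fusion of per-view Gaussian feature maps into a single renderable set can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the view count, map resolution, and the 14-channel parameterization (position, opacity, scale, rotation quaternion, color) are assumptions for the example, and fusion is modeled as simple concatenation of per-pixel Gaussians across views.

```python
import numpy as np

def fuse_multiview_gaussians(feature_maps):
    """Fuse per-view Gaussian feature maps into one global set.

    feature_maps: array of shape (V, H, W, C), one C-channel Gaussian
    parameter map per view (hypothetical layout: 3 position + 1 opacity
    + 3 scale + 4 rotation + 3 color = 14 channels). Each pixel of each
    view predicts one 3D Gaussian; here fusion is concatenation across
    all views into a flat (V*H*W, C) set for differentiable rendering.
    """
    v, h, w, c = feature_maps.shape
    return feature_maps.reshape(v * h * w, c)

# Example: four views of 64x64 feature maps with 14 channels each
maps = np.random.rand(4, 64, 64, 14)
gaussians = fuse_multiview_gaussians(maps)
print(gaussians.shape)  # (16384, 14)
```

A downstream Gaussian-splatting rasterizer would then render this fused set from arbitrary camera poses to supervise the network with image-space losses.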