UniTok is an innovative visual tokenization technology designed to bridge the gap between visual generation and understanding. Through multi-codebook quantization technology, it significantly improves the representation capability of discrete tokenizers, enabling them to capture richer visual details and semantic information. This technology breaks through the bottleneck of traditional tokenizers in the training process, providing an efficient and unified solution for visual generation and understanding tasks. UniTok excels in image generation and understanding tasks, such as achieving a significant zero-shot accuracy improvement on ImageNet. The main advantages of this technology include efficiency, flexibility, and strong support for multimodal tasks, bringing new possibilities to the field of visual generation and understanding.