ByteDance and the University of Science and Technology of China have jointly released DocPedia, a high-resolution large multimodal model for document understanding, in their first collaboration. The paper is available on arXiv. DocPedia targets a limitation of earlier models, which could not parse high-resolution document images: it handles inputs at a resolution of 2560×2560 and performs strongly on tasks such as image-text understanding and visual question answering. It addresses the resolution problem by processing images in the frequency domain.
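To give a feel for why a frequency-domain representation can help with large inputs, here is a minimal sketch of a JPEG-style 8×8 block discrete cosine transform (DCT). The article does not say which transform DocPedia uses, so the block DCT, the block size of 8, and the helper names `dct_matrix` and `block_dct` are all assumptions made for illustration, not the model's actual pipeline.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m


def block_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply a 2D DCT to each non-overlapping block of a grayscale image.

    Hypothetical helper: returns an array of shape
    (H / block, W / block, block * block), i.e. a smaller spatial grid
    with one channel per frequency coefficient.
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    d = dct_matrix(block)
    # Rearrange into a grid of tiles: (H/block, W/block, block, block).
    tiles = image.reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)
    # 2D DCT of every tile at once: D @ tile @ D^T (matmul broadcasts).
    coeffs = d @ tiles @ d.T
    return coeffs.reshape(h // block, w // block, block * block)
```

Under these assumptions, a 2560×2560 grayscale image becomes a 320×320 grid with 64 coefficients per cell: the spatial grid shrinks by a factor of 8 per side while the information moves into the channel dimension, which is one plausible way a model could take in large document images at a manageable spatial size.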