ViTLP
A visually guided generative text layout pre-trained model for document intelligence.
Common Product, Productivity, OCR, Document Intelligence
ViTLP is a visually guided generative text layout pre-trained model designed to make document intelligence processing faster and more accurate. It combines OCR text localization and recognition, so it can detect and recognize text on document images quickly and accurately. The pre-trained version, ViTLP-medium (380M parameters), is a trade-off between available computational resources and the scale of the pre-training dataset, balancing accuracy against inference speed and memory usage. Inference typically takes 5 to 10 seconds per page on an Nvidia 4090, which is competitive with most OCR engines.
ViTLP Visits Over Time
Monthly Visits: 515,580,771
Bounce Rate: 37.20%
Pages per Visit: 5.8
Visit Duration: 00:06:42