ViTLP

A visually guided generative text layout pre-trained model for document intelligence.

CommonProductProductivityOCRDocument Intelligence
ViTLP is a visually guided generative text layout pre-trained model designed to enhance the efficiency and accuracy of document intelligent processing. This model combines OCR text localization and recognition capabilities, enabling rapid and accurate text detection and recognition on document images. The pre-trained version, ViTLP-medium (380M parameters), provides a balanced solution under constraints of computational resources and the scale of pre-training datasets, ensuring performance while optimizing inference speed and memory usage. ViTLP's inference speed typically ranges from 5 to 10 seconds per page on an Nvidia 4090, making it competitive compared to most OCR engines.
Visit

ViTLP Visit Over Time

Monthly Visits

515580771

Bounce Rate

37.20%

Page per Visit

5.8

Visit Duration

00:06:42

ViTLP Visit Trend

ViTLP Visit Geography

ViTLP Traffic Sources

ViTLP Alternatives