Google has recently introduced ScreenAI, a vision-language model for understanding user interfaces and infographics. Its training data is generated automatically with the PaLM 2-S language model, and it sets new state-of-the-art (SOTA) results on several comprehension tasks. The model pairs a multimodal encoder with an autoregressive text decoder to handle tasks that map text plus images to text. Generating data automatically allowed the researchers to increase the diversity and complexity of the datasets while keeping the process efficient. ScreenAI reports leading performance on screen question answering (QA), infographic understanding, and document understanding.