Visual Sketchpad
A visual reasoning tool for multimodal large language models (LLMs)
Common Product, Productivity, Multimodal, Visual Reasoning
Visual Sketchpad is a framework that gives multimodal large language models (LLMs) a visual sketchpad and a set of drawing tools. Where previous methods expressed intermediate reasoning steps only as text, Visual Sketchpad lets a model create visual elements and work with them while it plans and reasons. The model can draw lines, boxes, annotations, and other marks much as a person would sketch, which facilitates better reasoning. It can also call expert vision models while sketching, such as an object detection model to draw bounding boxes or a segmentation model to draw masks, further enhancing its visual perception and reasoning capabilities.
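The idea of a model drawing lines and detector-supplied boxes can be illustrated with a minimal sketch. This is not Visual Sketchpad's actual API: the `Sketchpad` class, its `draw_line` and `draw_box` methods, and the `fake_detector` function are all hypothetical names invented for illustration, and a small character grid stands in for a real image.

```python
from dataclasses import dataclass, field


@dataclass
class Sketchpad:
    """Hypothetical in-memory canvas; not the framework's real interface."""
    width: int
    height: int
    pixels: list = field(init=False)

    def __post_init__(self):
        # A grid of characters stands in for an image.
        self.pixels = [[" "] * self.width for _ in range(self.height)]

    def _plot(self, x, y, mark):
        # Ignore points that fall outside the canvas.
        if 0 <= x < self.width and 0 <= y < self.height:
            self.pixels[y][x] = mark

    def draw_line(self, x0, y0, x1, y1, mark="*"):
        # Bresenham's line algorithm on integer grid coordinates.
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            self._plot(x0, y0, mark)
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x0 += sx
            if e2 <= dx:
                err += dx
                y0 += sy

    def draw_box(self, left, top, right, bottom, mark="#"):
        # A box is four connected line segments.
        self.draw_line(left, top, right, top, mark)
        self.draw_line(right, top, right, bottom, mark)
        self.draw_line(right, bottom, left, bottom, mark)
        self.draw_line(left, bottom, left, top, mark)

    def render(self):
        return "\n".join("".join(row) for row in self.pixels)


def fake_detector(image_name):
    """Stand-in for an expert detection model; returns one fixed box."""
    return [{"label": "cat", "box": (1, 1, 6, 4)}]


pad = Sketchpad(width=12, height=6)
for detection in fake_detector("photo.png"):
    pad.draw_box(*detection["box"])   # box supplied by the "expert" model
pad.draw_line(7, 5, 11, 5)            # auxiliary line drawn by the "model"
print(pad.render())
```

In a real system the rendered canvas would be returned to the multimodal model as an image so it can inspect its own sketch before taking the next reasoning step; the character grid here only makes the example self-contained.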
Visual Sketchpad Visits Over Time
Monthly Visits: 2759
Bounce Rate: 61.01%
Pages per Visit: 1.3
Visit Duration: 00:01:52