Visual Sketchpad is a framework that provides a visual sketchpad and drawing tools for multimodal large language models (LLMs). It allows models to operate on visually created elements while planning and reasoning, unlike previous methods that relied solely on text for reasoning steps. Visual Sketchpad enables models to draw using lines, boxes, annotations, and other more human-like drawing elements, thereby facilitating better reasoning. Additionally, it can incorporate expert vision models, such as object detection models for drawing bounding boxes or segmentation models for drawing masks, to further enhance visual perception and reasoning capabilities.