Molmo is an open, cutting-edge family of multimodal AI models designed for rich interaction with both physical and virtual worlds. The models learn to point to the content they perceive, which gives next-generation applications the ability to act on and interact with that content.