CrisperWhisper is an advanced variant of OpenAI's Whisper model, designed for fast, accurate, verbatim speech recognition with precise word-level timestamps. Unlike the original Whisper model, which tends to omit disfluencies, CrisperWhisper aims to transcribe every spoken word, including filler words, pauses, stutters, and false starts. The model ranks first on verbatim transcription datasets such as TED and AMI, and the accompanying paper was accepted at INTERSPEECH 2024.
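To make "verbatim with word-level timestamps" concrete, here is a minimal Python sketch of what such output might look like and how it could be post-processed. It does not call CrisperWhisper or any real API: the `Word` type, the `FILLERS` set, the helper names, and the sample timings are all hypothetical and invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


# Assumed filler inventory for this sketch; not taken from the model.
FILLERS = {"um", "uh"}


def strip_fillers(words):
    """Drop filler words, approximating a non-verbatim transcript."""
    return [w for w in words if w.text.lower().strip(",.") not in FILLERS]


def pauses(words, min_gap=0.2):
    """Return (previous word, next word, gap in seconds) for each silent
    gap between consecutive words that is at least min_gap long."""
    found = []
    for prev, nxt in zip(words, words[1:]):
        gap = round(nxt.start - prev.end, 3)
        if gap >= min_gap:
            found.append((prev.text, nxt.text, gap))
    return found


# Invented sample: a false start ("I ... I") with a filler and a pause.
verbatim = [
    Word("I", 0.00, 0.10),
    Word("um", 0.15, 0.40),
    Word("I", 0.90, 1.00),
    Word("think", 1.05, 1.30),
    Word("so", 1.35, 1.60),
]

verbatim_text = " ".join(w.text for w in verbatim)
cleaned_text = " ".join(w.text for w in strip_fillers(verbatim))
long_pauses = pauses(verbatim)
```

In this sketch the verbatim transcript keeps the filler and the repeated "I", while the word timings make it possible to locate the pause between "um" and the restart. A transcript without word-level timestamps would not carry that information.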