Whisper is a general-purpose speech recognition model. It is trained on a large and diverse set of audio data and is a multi-task model capable of performing multilingual speech recognition, speech translation, and language identification.