[11:28 Wed, 1 February 2023 by Thomas Richter]
In September 2022, OpenAI, the developer of the text AI ChatGPT and the image-generation AI DALL-E 2, among others, presented the speech recognition system Whisper.
Figure: WhisperX model

Unlike the original Whisper, WhisperX recognises different speakers and marks them in the transcribed text. In Whisper, the timestamps can be off by several seconds. To prevent this, WhisperX pre-filters the audio with speech activity detection, among other things, which significantly improves the quality of the alignment and prevents catastrophic timestamp errors in Whisper's output (such as negative timestamp durations). In WhisperX, the timestamps that indicate when a speaker starts and stops talking are now accurate down to the level of individual sounds.

These improvements considerably simplify the use of Whisper for creating subtitles, for example, because WhisperX requires much less manual editing. Not only is the timing now exact, so that the respective subtitle appears at the moment an actor begins to speak (word by word if desired), but the identification of who is speaking, which is important for subtitles for the hearing impaired, also happens automatically. Currently, standard models are provided for English, French, German, Spanish, Italian, Japanese, Dutch and Polish, among others.

WhisperX combines several free tools to produce robust word-level segmentation with speaker labels. These include OpenAI's Whisper and Meta AI's wav2vec 2.0 (responsible for phoneme-level sound detection).

WhisperX, like Whisper itself, is free of charge and freely available on GitHub, including its source code. It is written in Python and can be run from the command line, provided you have the necessary knowledge. However, we expect that WhisperX will soon be integrated into the first (online) subtitling tools or plugins in a more user-friendly way, offering users simple automatic subtitling.
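To illustrate how timed, speaker-labelled transcript segments could end up as subtitles, here is a minimal Python sketch. It does not use WhisperX's own API or output format: the `Segment` class, its field names and the `SPEAKER_00`-style labels are assumptions made purely for illustration. The sketch writes the common SRT subtitle format and skips entries with invalid timing, such as a segment that ends before it starts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical structure, not WhisperX's format)."""
    start: float                    # start time in seconds
    end: float                      # end time in seconds
    text: str                       # a word, phrase or sentence
    speaker: Optional[str] = None   # assumed speaker label, e.g. "SPEAKER_00"


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as SRT text, skipping entries with invalid timing."""
    blocks = []
    index = 1
    for seg in segments:
        # Skip negative start times and zero or negative durations.
        if seg.start < 0 or seg.end <= seg.start:
            continue
        label = f"[{seg.speaker}] " if seg.speaker else ""
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{label}{seg.text.strip()}\n"
        )
        index += 1
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 1.25, "Hello there.", "SPEAKER_00"),
        Segment(2.0, 1.5, "broken timing", "SPEAKER_01"),  # skipped: ends before it starts
        Segment(1.5, 3.0, "Hi, how are you?", "SPEAKER_01"),
    ]
    print(to_srt(demo))
```

Passing one `Segment` per word instead of one per sentence would give the word-by-word display mentioned above; the speaker prefix in square brackets is just one possible convention for subtitles for the hearing impaired.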