8x faster, same WER, and it fits on your laptop
If you've been living under a rock, OpenAI quietly dropped Whisper Large V3 Turbo and it absolutely cooks. Spoiler: it's 8x faster than V2 with basically the same word error rate.
The Setup
I rebuilt one of my AssemblyAI-adjacent FFmpeg pipelines on this thing in an afternoon. It's a distilled decoder — 4 layers instead of 32 — and on my M4 Mac it transcribes a 10-minute podcast in under 30 seconds.
{`pip install -U openai-whisper
# transcribe locally, no API calls
whisper episode.mp3 --model large-v3-turbo --output_format srt
# pipe in from FFmpeg if you're me
ffmpeg -i raw.mov -ac 1 -ar 16000 -f wav - | whisper - --model large-v3-turbo`}The Money Pattern
The real flex is dropping it into a queue worker. I run it behind a Netlify function trigger that hands the file off to a Mac mini running the model. AssemblyAI bill: gone.
{`import whisper
model = whisper.load_model("large-v3-turbo")
def transcribe(path: str) -> dict:
result = model.transcribe(
path,
language="en",
word_timestamps=True,
condition_on_previous_text=False, # less hallucination
)
return {
"text": result["text"],
"segments": result["segments"],
}`}The Catch
It still hallucinates on long silences — "Thanks for watching!" appearing in podcasts that have never been on YouTube. Set condition_on_previous_text=False or VAD-trim the silence before you hand it the audio.
Also: it's English-tuned. Non-English WER is noticeably worse than V3 proper. If you're shipping multilingual, stick with the full V3 or fall back to AssemblyAI for the edge cases.
The Verdict
For English transcription on owned hardware, Whisper V3 Turbo is the new default. The speed unlocks workflows that weren't worth it before — auto-captioning every internal Loom, transcribing Pipedrive call recordings, every Mariner edit getting subtitles for free. Pull the model tonight, ship it tomorrow.