Last updated: 16/04/2025

Groq | Official Docs

Whisper Large v3 Turbo

whisper-large-v3-turbo

active

An automatic speech recognition (ASR) model that transcribes audio at a speed factor of roughly 216x real time, making it well suited to fast, accurate speech-to-text conversion.

Supports a 448-token context window. Handles Text, Image, Video, Audio, Transcription, and Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications.

Capable of generating structured output formats.

For ASR models, Groq charges a minimum of 10 seconds per request, and the maximum file size is 25 MB.
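The billing rule above can be sketched in a few lines of Python. This is a minimal illustration, not Groq's actual billing logic: the function names and the `price_per_hour` parameter are hypothetical, the sketch assumes no rounding beyond the 10-second minimum, and it assumes "25 MB" means 25 × 2^20 bytes.

```python
MIN_BILLED_SECONDS = 10             # minimum charge per request, from the note above
MAX_FILE_BYTES = 25 * 1024 * 1024   # 25 MB limit; assumes 1 MB = 2**20 bytes


def billed_seconds(audio_seconds: float) -> float:
    """Return the duration billed for one request, applying the 10-second minimum."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return max(audio_seconds, MIN_BILLED_SECONDS)


def request_cost(audio_seconds: float, price_per_hour: float) -> float:
    """Estimate the cost of one request; price_per_hour is supplied by the caller."""
    return billed_seconds(audio_seconds) * price_per_hour / 3600


def fits_size_limit(file_bytes: int) -> bool:
    """Check a file size against the 25 MB upload limit."""
    return file_bytes <= MAX_FILE_BYTES


print(billed_seconds(3))      # a 3-second clip is billed as 10 seconds
print(billed_seconds(42.5))   # longer clips are billed at their actual length
```

One practical consequence: many very short clips sent as separate requests are each billed as 10 seconds, so batching short utterances into one file (within the 25 MB limit) may reduce cost.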

Additional Information

Capabilities

Text

Input Pricing

$0.04/hour

Context: 448 tokens

Video

Input Pricing

$0.000011111/second

Audio

Input Pricing

$0.000667/minute

Generation Pricing

Not available

Transcription

Transcription Pricing

$0.006/minute
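The prices in this section are quoted in different time units, which makes them hard to compare at a glance. The short sketch below converts the per-second and per-minute figures to a common per-hour rate. The helper names are hypothetical and the inputs are simply the numbers listed above.

```python
def per_second_to_per_hour(rate: float) -> float:
    """Convert a per-second price to a per-hour price."""
    return rate * 3600


def per_minute_to_per_hour(rate: float) -> float:
    """Convert a per-minute price to a per-hour price."""
    return rate * 60


video_per_second = 0.000011111      # Video input price listed above
audio_per_minute = 0.000667         # Audio input price listed above
transcription_per_minute = 0.006    # Transcription price listed above

print(round(per_second_to_per_hour(video_per_second), 4))          # 0.04
print(round(per_minute_to_per_hour(audio_per_minute), 4))          # 0.04
print(round(per_minute_to_per_hour(transcription_per_minute), 4))  # 0.36
```

The video and audio input figures both work out to about $0.04 per hour of audio, so they appear to be the same rate expressed in different units. The transcription figure corresponds to a different rate, about $0.36 per hour.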

Additional Model Information

Tool Use

No

Structured Output

Yes

Reasoning

No
