Groq Official Docs

Whisper Large v3 Turbo

whisper-large-v3-turbo

active

An automatic speech recognition (ASR) model that transcribes audio at 216x speed, for fast and accurate speech-to-text conversion.
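As a minimal sketch of how a transcription request for this model might be assembled, the snippet below separates the request arguments from the network call. It assumes the `groq` Python package is installed and that `GROQ_API_KEY` is set in the environment; the helper names `build_request` and `transcribe` are hypothetical and not part of any SDK.

```python
from pathlib import Path

MODEL_ID = "whisper-large-v3-turbo"  # model ID as listed on this page


def build_request(audio_path: str, response_format: str = "json") -> dict:
    """Assemble keyword arguments for a transcription call (no network access)."""
    path = Path(audio_path)
    return {
        "model": MODEL_ID,
        # The file is passed as a (filename, bytes) tuple.
        "file": (path.name, path.read_bytes()),
        "response_format": response_format,
    }


def transcribe(audio_path: str) -> str:
    """Send the audio to Groq and return the transcript text.

    Assumption: the `groq` package is installed and GROQ_API_KEY is set.
    """
    # Imported lazily so build_request() can be used without the SDK.
    from groq import Groq

    client = Groq()
    result = client.audio.transcriptions.create(**build_request(audio_path))
    return result.text
```

Keeping `build_request` free of network access makes it possible to inspect or validate a request before any audio is uploaded.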

Supports a 448-token context window. Handles text, image, video, audio, transcription, and text-to-speech inputs and outputs. Supports fine-tuning for custom applications.

Capable of generating structured output formats.

For ASR models, Groq charges a minimum of 10 seconds per request, and the maximum file size is 25 MB.
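The billing minimum and the upload limit can be sketched as a small pre-flight check. This is an illustration only: the per-second rate is left as a caller-supplied parameter rather than taken from the pricing entries on this page, 1 MB is assumed to mean 1024 * 1024 bytes, and all function names are hypothetical.

```python
MIN_BILLED_SECONDS = 10             # minimum billed duration per request (from this page)
MAX_FILE_BYTES = 25 * 1024 * 1024   # 25 MB, assuming 1 MB = 1024 * 1024 bytes


def billed_seconds(audio_seconds: float) -> float:
    """Duration that is charged: short clips are rounded up to the minimum."""
    return max(audio_seconds, MIN_BILLED_SECONDS)


def estimate_cost(audio_seconds: float, rate_per_second: float) -> float:
    """Estimated charge for one request at a caller-supplied per-second rate."""
    return billed_seconds(audio_seconds) * rate_per_second


def fits_upload_limit(size_bytes: int) -> bool:
    """True if a file of this size is within the 25 MB maximum."""
    return size_bytes <= MAX_FILE_BYTES
```

Under this rule a 3-second clip is billed as 10 seconds, so batching very short clips into one request may lower the cost when the workload allows it.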

Additional Information

Notes

For ASR models, Groq charges a minimum of 10 seconds per request. Maximum file size is 25 MB.

Capabilities

Text

Input Pricing

$0.04/second

Context: 448 tokens

Video

Input Pricing

$0.000011111/second

Audio

Input Pricing

$0.000667/minute

Generation Pricing

Not available

Transcription

Transcription Pricing

$0.006/minute

Additional Model Information

Tool Use

Structured Output

Yes
