Last updated: 16/04/2025

Groq Official Docs

Whisper Large v3

Model ID: whisper-large-v3

Status: active


Whisper Large v3 is an automatic speech recognition (ASR) model developed by OpenAI. It transcribes spoken audio in many languages into text.

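Its decoder has a 448-token context window. The model takes audio as input and produces text (a transcript) as output. It does not accept images or video frames, and it does not generate speech. Because OpenAI released the model weights, the model itself can be fine-tuned for custom applications.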

It can return transcripts in structured formats, such as JSON with per-segment timestamps, as well as plain text.

The model suits a wide range of speech-to-text applications, from meeting transcription to podcast captioning.
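As a sketch of how structured output can feed the captioning use case, the functions below convert timestamped segments into SubRip (SRT) captions. They assume a JSON response shaped like the OpenAI-compatible `verbose_json` format, with a `segments` list whose entries carry `start` and `end` times in seconds and a `text` field. `format_timestamp` and `segments_to_srt` are hypothetical helpers and are not part of any Groq SDK.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(response_json: str) -> str:
    """Convert a verbose_json-style transcription response into SRT text.

    Assumption: the response has a "segments" list whose items have
    "start" and "end" keys (seconds) and a "text" key.
    """
    segments = json.loads(response_json)["segments"]
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical response, hand-written for illustration.
    sample = json.dumps({
        "text": "Hello world. Welcome.",
        "segments": [
            {"start": 0.0, "end": 2.5, "text": " Hello world."},
            {"start": 2.5, "end": 61.04, "text": " Welcome."},
        ],
    })
    print(segments_to_srt(sample))
```

Working from segment timestamps rather than the flat `text` field keeps each caption aligned with the audio it came from.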

Additional Information

Notes

Groq charges a minimum of 10 seconds of audio per request, and each uploaded audio file can be at most 25 MB.
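The 25 MB limit can be checked client-side before uploading, which avoids spending a request on a file the API would reject. This minimal sketch uses a hypothetical `check_upload_size` helper. It assumes the stricter decimal reading of 25 MB (25,000,000 bytes), because the page does not say whether MB means 10^6 or 2^20 bytes. A file that passes this check is under the limit under either reading.

```python
import os

# Assumption: "25 MB" is read as 25,000,000 bytes (decimal), the stricter of
# the two common interpretations.
MAX_UPLOAD_BYTES = 25_000_000


def check_upload_size(path: str) -> int:
    """Return the file size in bytes; raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size:,} bytes; the upload limit is {MAX_UPLOAD_BYTES:,} bytes"
        )
    return size
```

Files that fail the check would need to be split or re-encoded at a lower bitrate before upload.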

Capabilities

Text

Input Pricing

$0.111/hour of audio (equivalent to $0.00185/minute)

Context: 448 tokens

Output Pricing

Not billed separately; charges are based on audio duration

Image

Input Pricing

Not supported (the model accepts audio only)

Video

Input Pricing

Not supported (the model accepts audio only)

Audio

Input Pricing

$0.00185/minute (billed at the transcription rate)

Generation Pricing

Not available

Transcription

Transcription Pricing

$0.00185/minute
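Combining this rate with the 10-second minimum from the Notes gives a simple per-request cost estimate. The sketch assumes that audio beyond the minimum is billed pro rata per second rather than rounded up to whole minutes. `estimate_cost_usd` and `estimate_batch_cost_usd` are hypothetical helpers.

```python
# Figures from this page: $0.00185 per minute of transcribed audio and a
# 10-second minimum charge per request.
RATE_PER_MINUTE_USD = 0.00185
MIN_BILLED_SECONDS = 10.0


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the charge for one transcription request.

    Assumption: audio above the 10-second minimum is billed pro rata per
    second (no rounding up to whole minutes).
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    billed_seconds = max(duration_seconds, MIN_BILLED_SECONDS)
    return billed_seconds / 60.0 * RATE_PER_MINUTE_USD


def estimate_batch_cost_usd(durations):
    """Sum per-request estimates; each request is billed its own minimum."""
    return sum(estimate_cost_usd(d) for d in durations)


if __name__ == "__main__":
    print(f"1 hour:            ${estimate_cost_usd(3600):.4f}")
    print(f"100 x 2 s clips:   ${estimate_batch_cost_usd([2.0] * 100):.4f}")
    print(f"one 200 s request: ${estimate_cost_usd(200):.4f}")
```

Under these assumptions, one hour of billed audio costs about $0.111. By contrast, 100 separate 2-second clips are each billed the 10-second minimum, so they cost as much as 1,000 seconds of audio (about $0.031). Sending the same 200 seconds as a single request costs far less.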

Embeddings

Embeddings Pricing

Not supported

Additional Model Information

Tool Use

No

Structured Output

Yes

Reasoning

No
