Last updated: 16/04/2025

OpenAI Official Docs

Whisper

whisper-1

active

Whisper is a general-purpose speech recognition model that transcribes audio into text and translates speech into English text. It accepts audio as input and returns text; it is not a text-to-speech model.
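As a rough illustration of the two tasks described above, the sketch below wraps the transcription and translation calls of the official `openai` Python package. The helper names (`transcribe_file`, `translate_file`, `make_client`) are hypothetical and introduced only for this example; it assumes the package is installed and that `OPENAI_API_KEY` is set in the environment.

```python
from pathlib import Path


def transcribe_file(client, audio_path, model="whisper-1"):
    """Send one audio file for transcription and return the recognized text."""
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def translate_file(client, audio_path, model="whisper-1"):
    """Send one audio file for translation and return the English text."""
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text


def make_client():
    """Build an API client (hypothetical helper; assumes `pip install openai`)."""
    from openai import OpenAI  # imported lazily so the helpers above stay dependency-free

    return OpenAI()  # reads OPENAI_API_KEY from the environment
```

For example, `transcribe_file(make_client(), "meeting.mp3")` would return the transcript of a local file, where `meeting.mp3` is a placeholder name. Passing the client in as a parameter keeps the helpers easy to exercise with a stub.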

The context window is not specified in tokens. Input is audio; output is text (transcription or translation). Fine-tuning is not available for whisper-1 through the OpenAI API, though the open-source Whisper checkpoints can be fine-tuned independently.

Additional Information

Notes

Whisper is a transcription model that converts audio into text and can also translate it. According to the pricing information, it costs $0.006 per minute of audio.
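The per-minute rate translates into a simple cost estimate. The sketch below assumes usage is prorated linearly by duration, which is an assumption made for illustration; actual billing may round differently. The function name is hypothetical.

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted in the notes above


def estimate_cost_usd(duration_seconds, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate transcription cost, assuming linear proration by duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60.0 * price_per_minute


# A 90-minute recording: 90 * 0.006
print(f"${estimate_cost_usd(90 * 60):.2f}")  # prints $0.54
```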

Capabilities

Text

Input Pricing

$0.006/minute

Video

Input Pricing

$0.0001/second

Audio

Input Pricing

$0.006/minute

Generation Pricing

Not available

Transcription

Transcription Pricing

$0.006/minute

Text-to-Speech

Text-to-Speech Pricing

$0.01/1k characters

Embeddings

Embeddings Pricing

$0.0001/1k tokens
