OpenAI Official Docs

Whisper

whisper-1

active

Whisper

Whisper is a general-purpose speech recognition model. It takes audio as input and returns text, either as a transcription in the spoken language or as a translation into English.

The context window size is not specified. Input is audio; output is text (transcription or translation). Fine-tuning is not offered for the hosted whisper-1 model.
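As a minimal sketch of how a transcription or translation request might look from Python, the helper below wraps the audio endpoints of the official `openai` package (v1-style client). The helper name `transcribe` and its `client` and `translate` parameters are illustrative assumptions, not part of any documented API. The default path assumes the `openai` package is installed and `OPENAI_API_KEY` is set in the environment.

```python
from pathlib import Path


def transcribe(audio_path, client=None, translate=False):
    """Send an audio file to whisper-1 and return the resulting text.

    client: an openai.OpenAI instance. If omitted, one is created lazily,
        which requires the openai package and OPENAI_API_KEY to be set.
    translate: when True, call the translations endpoint instead of the
        transcriptions endpoint (the output is then English text).
    """
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        from openai import OpenAI
        client = OpenAI()

    endpoint = client.audio.translations if translate else client.audio.transcriptions
    with Path(audio_path).open("rb") as audio_file:
        result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text
```

A call such as `transcribe("meeting.mp3")` would return the transcript as a string, and `transcribe("interview_de.mp3", translate=True)` would return an English translation. Both file names are placeholders.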

Additional Information

Notes

Whisper transcribes audio into text and can translate it into English. It is billed at $0.006 per minute of audio.
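To make the per-minute rate concrete, here is a small sketch that estimates the cost of a job from its audio duration. It assumes billing is strictly proportional to duration at $0.006 per minute; any rounding or minimum charge applied by the provider is not modelled. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# USD per minute of audio, taken from the Notes above.
PRICE_PER_MINUTE = Decimal("0.006")


def estimate_cost(duration_seconds, price_per_minute=PRICE_PER_MINUTE):
    """Return the estimated USD cost of transcribing the given audio duration.

    Assumes cost scales linearly with duration (no rounding, no minimum).
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Go through str() so float inputs do not carry binary rounding noise.
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return minutes * price_per_minute
```

Under these assumptions, a 90-minute recording (`estimate_cost(90 * 60)`) comes to $0.54, and a 30-second clip to $0.003.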

Capabilities

Text

Input Pricing

$0.006/minute

Video

Input Pricing

$0.0001/second

Audio

Input Pricing

$0.006/minute

Generation Pricing

Not available

Transcription

Transcription Pricing

$0.006/minute

Text-to-Speech

Text-to-Speech Pricing

$0.01/1k characters

Embeddings

Embeddings Pricing

$0.0001/1k tokens
