Last updated: 16/04/2025

OpenAI Official Docs

GPT-4o Transcribe

gpt-4o-transcribe

active

A powerful speech-to-text model powered by GPT-4o, capable of transcribing and translating audio into text with high accuracy.

Supports a wide range of inputs and outputs, including Text, Image, Video, Audio, Transcription, and Text-to-Speech. Specialized for general-purpose transcription and translation tasks. Supports fine-tuning for custom applications.

Pricing:

  • Text tokens: $2.50 per 1M input tokens, $10.00 per 1M output tokens
  • Audio tokens: $6.00 per 1M input tokens
  • Estimated cost: $0.006 per minute
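The rates above can be turned into a quick cost check. The following is a minimal sketch, not an official calculator: the function names `token_cost` and `minute_estimate` are hypothetical, the rates are copied from the list above, and it assumes billing is linear in token count (or in audio duration for the per-minute estimate).

```python
from decimal import Decimal

# Rates in USD, copied from the pricing list above.
TEXT_INPUT_PER_MTOK = Decimal("2.50")
TEXT_OUTPUT_PER_MTOK = Decimal("10.00")
AUDIO_INPUT_PER_MTOK = Decimal("6.00")
ESTIMATE_PER_MINUTE = Decimal("0.006")

MTOK = Decimal(1_000_000)  # "1M tokens"


def token_cost(text_in=0, text_out=0, audio_in=0):
    """Cost in USD for the given token counts (hypothetical helper)."""
    total = (
        Decimal(text_in) * TEXT_INPUT_PER_MTOK
        + Decimal(text_out) * TEXT_OUTPUT_PER_MTOK
        + Decimal(audio_in) * AUDIO_INPUT_PER_MTOK
    )
    return total / MTOK


def minute_estimate(seconds):
    """Rough cost in USD from audio duration, using the per-minute estimate."""
    return Decimal(str(seconds)) / 60 * ESTIMATE_PER_MINUTE


if __name__ == "__main__":
    print("1M audio input tokens:", token_cost(audio_in=1_000_000))
    print("60-minute recording:", minute_estimate(3600))
```

Under these assumptions a 60-minute recording comes to about $0.36 by the per-minute estimate. `Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding noise.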

Additional Information

Notes

Pricing: $2.50 per 1M text tokens for input, $10.00 per 1M text tokens for output. For audio tokens: $6.00 per 1M tokens for input. Estimated cost: $0.006 per minute.

Capabilities

Text

Input Pricing

$2.50/MTok

Context: N/A

Output Pricing

$10.00/MTok

Video

Input Pricing

$0.006/second

Audio

Input Pricing

$0.006/minute

Generation Pricing

Not available

Transcription

Transcription Pricing

$0.006/minute

Text-to-Speech

Text-to-Speech Pricing

$0.01/1k characters

Embeddings

Embeddings Pricing

$0.0001/1k tokens

Fine-Tuning

Fine-Tuning Pricing

$25.00/MTok training
