Last updated: 16/04/2025

OpenAI Official Docs

GPT-4o Audio

gpt-4o-audio-preview

active

GPT-4o models that handle both text and audio inputs and outputs, making them a versatile choice for a wide range of applications.

Supports a 128,000-token context window. Handles text, image, video, audio, transcription, and text-to-speech inputs and outputs. Supports fine-tuning for custom applications.

Additional Information

Notes

This model can handle both text and audio inputs and outputs. Text and audio tokens are priced separately: text tokens cost $2.50 per 1M input tokens and $10.00 per 1M output tokens, while audio tokens cost $40.00 per 1M input tokens and $80.00 per 1M output tokens.
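As a rough illustration of how the four token rates in the note combine, here is a minimal Python sketch. The rates are copied from the note; the function name `estimate_cost`, the dictionary keys, and the example token counts are hypothetical and chosen only for illustration. Actual billing may differ.

```python
# Rates in USD per 1M tokens, copied from the pricing note.
RATES_PER_MTOK = {
    "text_input": 2.50,
    "text_output": 10.00,
    "audio_input": 40.00,
    "audio_output": 80.00,
}


def estimate_cost(token_counts):
    """Estimate USD cost for a dict of token counts keyed like RATES_PER_MTOK."""
    total = 0.0
    for kind, count in token_counts.items():
        # Each rate is per 1,000,000 tokens, so scale the count accordingly.
        total += count * RATES_PER_MTOK[kind] / 1_000_000
    return total


# Hypothetical request: a short text prompt plus some audio in and out.
example = {
    "text_input": 1_000,
    "text_output": 500,
    "audio_input": 2_000,
    "audio_output": 3_000,
}
print(f"${estimate_cost(example):.4f}")  # $0.3275
```

In this made-up request the audio tokens account for $0.32 of the $0.3275 total, which shows how quickly audio usage can dominate the cost at these rates.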

Model Timeline

Launch Date

12/17/2024

Capabilities

Text

Input Pricing

$2.50/MTok

Context: 128,000 tokens

Output Pricing

$10.00/MTok

Max tokens: 4,096
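The two limits listed above can be combined into a simple budget check. The sketch below assumes that "Max tokens" refers to the output limit and that prompt and output tokens share the same context window; neither assumption is confirmed by this listing. The helper names `max_prompt_tokens` and `fits` are hypothetical.

```python
CONTEXT_WINDOW = 128_000   # "Context: 128,000 tokens" from the listing
MAX_OUTPUT_TOKENS = 4_096  # "Max tokens: 4,096" from the listing


def max_prompt_tokens(reserved_output=MAX_OUTPUT_TOKENS):
    """Largest prompt that still leaves room for the reserved output.

    Assumes prompt and output tokens draw on one shared context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens, reserved_output=MAX_OUTPUT_TOKENS):
    """Return True if the prompt fits alongside the reserved output budget."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


print(max_prompt_tokens())  # 123904
print(fits(125_000))        # False
```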

Vision Capabilities

Video

Input Pricing

$0.0015/second

Audio

Input Pricing

$15.00/minute

Generation Pricing

$30.00/minute

Transcription

Transcription Pricing

$40.00/minute

Text-to-Speech

Text-to-Speech Pricing

$0.04/1k characters

Embeddings

Embeddings Pricing

$0.13/1k tokens

Fine-Tuning

Fine-Tuning Pricing

$25.00/MTok training
