Last updated: 16/04/2025

OpenAIOfficial Docs

GPT-4o Audio Preview

gpt-4o-audio-preview-2024-12-17

active

GPT-4o Audio Preview

The GPT-4o Audio Preview model is a powerful AI assistant capable of handling a wide range of inputs and outputs, including text, images, video, audio, transcription, and text-to-speech. This model is designed to provide an enhanced audio experience, making it a valuable tool for applications that require advanced audio capabilities.

Supports a wide token context window. Handles Text, Image, Video, Audio, Transcription, Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications.

Model Timeline

Launch Date

12/17/2024

Capabilities

Text

Input Pricing

$2.50/ MTok

Context: N/A tokens

Output Pricing

$10.00/ MTok

Audio

Input Pricing

$ 0.07 /minute

Generation Pricing

$0.13 /minute

Transcription

Transcription Pricing

$0.006/minute

Text-to-Speech

Text-to-Speech Pricing

$0.01/1k characters

Embeddings

Embeddings Pricing

$0.0001/1k tokens

Flatten your repo for AI in seconds

Flatten repos. Prompt faster. One click → one GPT-ready file

Free Online & Desktop