Last updated: 16/04/2025

OpenAIOfficial Docs

GPT-4o Audio Preview

gpt-4o-audio-preview-2024-10-01

active

GPT-4o Audio Preview

The GPT-4o Audio Preview model is a versatile AI capable of handling a wide range of audio-related tasks, including transcription, text-to-speech, and more. With its high-intelligence capabilities, this model is well-suited for a variety of applications that require advanced audio processing.

Supports a wide token context window. Handles Image, Video, Audio, Transcription, Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications.

Model Timeline

Launch Date

10/1/2024

Capabilities

Image

Input Pricing

85 tokens/image

Video

Input Pricing

$0.01/second

Audio

Input Pricing

$ 15.00 /minute

Generation Pricing

$30.00 /minute

Transcription

Transcription Pricing

$0.006/minute

Text-to-Speech

Text-to-Speech Pricing

$0.01/1k characters

Embeddings

Embeddings Pricing

$0.02/1k tokens

Fine-Tuning

Fine-Tuning Pricing

$25.00/MTok training

Flatten your repo for AI in seconds

Flatten repos. Prompt faster. One click → one GPT-ready file

Free Online & Desktop