Last updated: 16/04/2025

OpenAIOfficial Docs

GPT-4o mini Audio Preview

gpt-4o-mini-audio-preview-2024-12-17

active

GPT-4o mini Audio Preview

A smaller version of the GPT-4o model, capable of processing both text and audio inputs to generate text, audio, and other media outputs.

Supports a 128,000 token context window. Handles Text, Image, Video, Audio, Transcription, Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications.

Additional Information

Notes

This model can process both text and audio inputs and generate both text and audio outputs. Pricing is $0.15 per 1M tokens for text input, $0.60 per 1M tokens for text output, $10.00 per 1M tokens for audio input, and $20.00 per 1M tokens for audio output.

Model Timeline

Launch Date

12/17/2024

Capabilities

Text

Input Pricing

$0.50/ MTok

Context: 128,000 tokens

Output Pricing

$1.50/ MTok

Max tokens: 4,096

Vision Capabilities

Max resolution: 2048x2048
Max images per prompt: 10

Audio

Input Pricing

$ 10.00 /minute

Generation Pricing

$20.00 /minute

Transcription

Transcription Pricing

$0.003/minute

Text-to-Speech

Text-to-Speech Pricing

$0.01/1k characters

Embeddings

Embeddings Pricing

$0.02/1k tokens

Fine-Tuning

Fine-Tuning Pricing

$3.00/MTok training

Flatten your repo for AI in seconds

Flatten repos. Prompt faster. One click → one GPT-ready file

Free Online & Desktop