Last updated: 16/04/2025

Google Vertex AI (Official Docs)

Gemini 2.0 Flash

gemini-2.0-flash

active


Gemini 2.0 Flash is Google's latest multimodal AI model, offering enhanced capabilities for processing and generating text, images, audio, and video. With a 1M-token context window, it is designed for low-latency, high-performance applications and agentic experiences.

Supports a 1M-token context window. Handles text, image, video, audio, transcription, and text-to-speech inputs and outputs. Supports fine-tuning for custom applications and tool use for advanced automation. Can generate structured output.

Additional Information

Notes

Gemini 2.0 Flash is Google's newest multimodal model, with next-generation features and improved capabilities. It can process text, image, audio, and video inputs and generate text outputs. The model has a 1M-token context window and supports over 100 languages. It is designed for low latency and enhanced performance, making it suitable for agentic experiences. Pricing is token-based: $0.15 per 1M input tokens, $1.00 per 1M input audio tokens, and $0.60 per 1M output text tokens. Batch API pricing is available at a 50% discount.
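The token-based rates in the notes can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes the 50% batch discount applies uniformly to every token type, which the notes do not state explicitly.

```python
# Rates quoted in the notes, in USD per 1M tokens.
INPUT_TEXT_PER_MTOK = 0.15
INPUT_AUDIO_PER_MTOK = 1.00
OUTPUT_TEXT_PER_MTOK = 0.60
BATCH_DISCOUNT = 0.50  # assumption: applied to the whole request


def estimate_cost(text_in: int, audio_in: int, text_out: int,
                  batch: bool = False) -> float:
    """Return an estimated cost in USD for the given token counts."""
    cost = (
        text_in * INPUT_TEXT_PER_MTOK
        + audio_in * INPUT_AUDIO_PER_MTOK
        + text_out * OUTPUT_TEXT_PER_MTOK
    ) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost


if __name__ == "__main__":
    print(f"standard: ${estimate_cost(1_000_000, 0, 1_000_000):.4f}")
    print(f"batch:    ${estimate_cost(1_000_000, 0, 1_000_000, batch=True):.4f}")
```

With these rates, 1M text input tokens plus 1M output tokens comes to about $0.75, or about $0.375 under the assumed batch discount.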

Model Timeline

Launch Date

05/02/2025

Capabilities

Text

Input Pricing

$0.15/MTok

Context: 1,048,576 tokens

Output Pricing

$0.60/MTok

Max tokens: 1,048,576

Vision Capabilities

Max resolution: 1920x1080
Max images per prompt: 16

Image

Input Pricing

1290 tokens/image
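
Because image input is listed in tokens rather than dollars, a dollar figure has to be derived. The sketch below assumes image tokens are billed at the text input rate of $0.15/MTok, which this page does not confirm, and it enforces the listed limit of 16 images per prompt. The function name is hypothetical.

```python
TOKENS_PER_IMAGE = 1290        # from the Image pricing entry
INPUT_PER_MTOK = 0.15          # assumption: images billed at the text input rate
MAX_IMAGES_PER_PROMPT = 16     # from Vision Capabilities


def image_input_cost(n_images: int) -> tuple[int, float]:
    """Return (token count, estimated USD cost) for n_images in one prompt."""
    if not 0 <= n_images <= MAX_IMAGES_PER_PROMPT:
        raise ValueError(f"expected 0..{MAX_IMAGES_PER_PROMPT} images per prompt")
    tokens = n_images * TOKENS_PER_IMAGE
    return tokens, tokens * INPUT_PER_MTOK / 1_000_000


if __name__ == "__main__":
    tokens, cost = image_input_cost(MAX_IMAGES_PER_PROMPT)
    print(f"{tokens} tokens, about ${cost:.6f}")
```

Under that assumption, a full prompt of 16 images consumes 20,640 tokens of the context window.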

Video

Input Pricing

$0.000774/second
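
Per-second video pricing is easier to reason about once it is scaled to a clip length. This is a minimal sketch using the rate listed above; the function name is hypothetical, and it assumes billing is linear in duration with no rounding or minimum charge.

```python
from datetime import timedelta

VIDEO_INPUT_PER_SECOND = 0.000774  # USD, from the Video pricing entry


def video_input_cost(duration: timedelta) -> float:
    """Return an estimated USD cost for video input of the given duration."""
    seconds = duration.total_seconds()
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return seconds * VIDEO_INPUT_PER_SECOND


if __name__ == "__main__":
    for clip in (timedelta(minutes=1), timedelta(minutes=10), timedelta(hours=1)):
        print(f"{clip}: ${video_input_cost(clip):.4f}")
```

At this rate, one minute of video input costs roughly $0.046 and one hour roughly $2.79.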

Audio

Input Pricing

$1.50/minute

Generation Pricing

$6.00/minute

Transcription

Transcription Pricing

$1.00/minute

Embeddings

Embeddings Pricing

$0.000025/1k tokens

Fine-Tuning

Fine-Tuning Pricing

$3.00/MTok training

Additional Model Information

Tool Use

Yes

Structured Output

Yes

Reasoning

Yes
