Last updated: 16/04/2025

Google Vertex AIOfficial Docs

Gemini 2.0 Flash (Image Generation) Experimental

gemini-2.0-flash-exp-image-generation

active

Gemini 2.0 Flash (Image Generation) Experimental

Gemini 2.0 Flash is an experimental multimodal AI model from Google, specialized for image generation capabilities. It supports a large 1048576 token context window and can handle a variety of inputs and outputs including text, image, video, audio, transcription, and text-to-speech.

Supports a 1048576 token context window. Handles Text, Image, Video, Audio, Transcription, Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications. Supports tool use for advanced automation. Capable of generating structured output formats.

Additional Information

Notes

This is an experimental version of Gemini 2.0 Flash specifically for image generation capabilities. It appears to be part of Google's Gemini 2.0 Flash model family, which supports multimodal inputs and outputs.

Capabilities

Text

Input Pricing

$0.15/ MTok

Context: 1,048,576 tokens

Output Pricing

$0.60/ MTok

Max tokens: 1,048,576

Vision Capabilities

Max resolution: 1024x1024
Max images per prompt: 1

Image

Input Pricing

1290 tokens/image

Generation Pricing

Per image pricing not available

Video

Input Pricing

$0.000774/second

Audio

Input Pricing

$ 1.50 /minute

Generation Pricing

$7.20 /minute

Transcription

Transcription Pricing

$1.00/minute

Embeddings

Embeddings Pricing

$0.0002/1k tokens

Fine-Tuning

Fine-Tuning Pricing

$3.00/MTok training

Additional Model Information

Tool Use

Yes

Structured Output

Yes

Reasoning

No

Flatten your repo for AI in seconds

Flatten repos. Prompt faster. One click → one GPT-ready file

Free Online & Desktop