Last updated: 16/04/2025

X.AI Official Docs

Grok Vision Beta

Model ID: grok-vision-beta

Status: active


Grok Vision Beta is a multimodal model from xAI that can process and analyze a range of media, including text, images, video, and audio. Its 128,000-token context window lets it accept long, multi-part inputs in a single request, which makes it usable across a variety of applications.

Context window: 128,000 tokens. Input and output modalities: text, image, video, audio, transcription, and text-to-speech. Fine-tuning is supported for custom applications.

Capabilities

Text

Input pricing: $0.00 per 1K tokens
Context: 128,000 tokens

Output pricing: $0.00 per 1K tokens
Max output tokens: 4,096
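The text limits and per-1K-token prices above can be checked before a request is sent. The following is a minimal sketch, not part of any xAI SDK. The function names `check_token_budget` and `estimate_cost` are hypothetical. It assumes the 128,000-token context window covers prompt and completion tokens together, that "Max tokens" refers to output tokens, and that token counts have already been obtained elsewhere (no tokenizer is shown).

```python
# Limits as listed on this page for grok-vision-beta.
CONTEXT_WINDOW = 128_000    # assumption: prompt + completion share this budget
MAX_OUTPUT_TOKENS = 4_096   # assumption: "Max tokens" means output tokens

# Listed prices in USD per 1,000 tokens (both shown as $0.00 on this page).
INPUT_PRICE_PER_KTOK = 0.00
OUTPUT_PRICE_PER_KTOK = 0.00


def check_token_budget(prompt_tokens: int, max_output_tokens: int) -> None:
    """Raise ValueError if a request would exceed the listed token limits."""
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError(
            f"max_output_tokens {max_output_tokens} exceeds {MAX_OUTPUT_TOKENS}"
        )
    total = prompt_tokens + max_output_tokens
    if total > CONTEXT_WINDOW:
        raise ValueError(f"prompt + output = {total} exceeds {CONTEXT_WINDOW}")


def estimate_cost(
    prompt_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_PRICE_PER_KTOK,
    output_price: float = OUTPUT_PRICE_PER_KTOK,
) -> float:
    """Estimate request cost in USD from per-1K-token prices."""
    return prompt_tokens / 1000 * input_price + output_tokens / 1000 * output_price


if __name__ == "__main__":
    check_token_budget(120_000, 4_096)  # 124,096 tokens: within the window
    # Hypothetical prices, only to show the arithmetic: 10 * 0.5 + 2 * 1.5
    print(estimate_cost(10_000, 2_000, input_price=0.5, output_price=1.5))  # 8.0
```

With the listed $0.00 prices, `estimate_cost` returns 0.0 for any request. Passing prices as parameters keeps the helper usable if the listed rates change.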

Vision Capabilities

Max resolution: 1080p
Max images per prompt: 5
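The two vision limits can be checked client-side in the same way. This is a minimal sketch with hypothetical helper names (`fits_1080p`, `validate_images`). It assumes "1080p" means an image may be at most 1920 px on its long side and 1080 px on its short side. The page does not define this, so the actual rule may differ.

```python
# Limits as listed on this page.
MAX_IMAGES_PER_PROMPT = 5
# Assumption: "1080p" read as <= 1920 px long side and <= 1080 px short side.
MAX_LONG_SIDE = 1920
MAX_SHORT_SIDE = 1080


def fits_1080p(width: int, height: int) -> bool:
    """Return True if the image fits within 1080p in either orientation."""
    long_side, short_side = max(width, height), min(width, height)
    return long_side <= MAX_LONG_SIDE and short_side <= MAX_SHORT_SIDE


def validate_images(sizes: list[tuple[int, int]]) -> list[str]:
    """Return a list of problems; an empty list means all listed limits are met."""
    problems = []
    if len(sizes) > MAX_IMAGES_PER_PROMPT:
        problems.append(
            f"{len(sizes)} images exceeds the limit of {MAX_IMAGES_PER_PROMPT}"
        )
    for i, (w, h) in enumerate(sizes):
        if not fits_1080p(w, h):
            problems.append(f"image {i} is {w}x{h}, larger than 1080p")
    return problems


if __name__ == "__main__":
    print(validate_images([(1920, 1080), (4000, 3000)]))
    # ['image 1 is 4000x3000, larger than 1080p']
```

Returning a list of problems rather than raising on the first one lets a caller report every oversized image at once before resizing or dropping them.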
