Last updated: 16/04/2025

MistralOfficial Docs

Pixtral 12B

pixtral-12b-latest

active

Pixtral 12B

Pixtral 12B is a frontier-class multimodal model from Mistral Research, capable of understanding and generating text, images, video, audio, and more. With a 131K token context window, it excels at a wide range of tasks including text generation, image captioning, video understanding, and audio transcription.

Supports Text, Image, Video, Audio, Transcription, Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications. Supports tool use for advanced automation. Capable of generating structured output formats.

Pixtral 12B is available under the Mistral Research License and can be accessed via the Mistral API. Learn more in the Pixtral 12B blog post.

Model Timeline

Launch Date

9/1/2024

Capabilities

Text

Input Pricing

$0.70/ MTok

Context: 131,072 tokens

Output Pricing

$0.70/ MTok

Max tokens: 4,096

Vision Capabilities

Max resolution: 2048x2048
Max images per prompt: 16

Image

Input Pricing

85 tokens/image

Embeddings

Embeddings Pricing

$0.10/1k tokens

Additional Model Information

Tool Use

Yes

Structured Output

Yes

Reasoning

Yes

Flatten your repo for AI in seconds

Flatten repos. Prompt faster. One click → one GPT-ready file

Free Online & Desktop