Last updated: 17/03/2025

Mistral

Mistral OCR

Model ID: mistral-ocr-2503

Status: active

## Mistral OCR

Introducing the world's best document understanding API. Mistral OCR handles text, image, video, and audio inputs and provides transcription and text-to-speech capabilities. It supports a 32,768-token context window and fine-tuning for custom applications.

Model Timeline

Launch Date: 3/1/2025

Capabilities

Text

Input Pricing: $0.70/MTok
Context: 32,768 tokens

Output Pricing: $0.70/MTok
Max output tokens: 4,096
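As a rough illustration of how the text rates listed above translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_text_cost` is hypothetical, and the sketch assumes billing is strictly linear in token count with no minimum charge or rounding, which the listing does not confirm.

```python
from decimal import Decimal

# Rates and limit copied from the listing above (USD per million tokens).
INPUT_USD_PER_MTOK = Decimal("0.70")
OUTPUT_USD_PER_MTOK = Decimal("0.70")
MAX_OUTPUT_TOKENS = 4_096


def estimate_text_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one text request (hypothetical helper).

    Assumes purely linear per-token billing; actual invoicing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens exceeds the listed maximum of 4,096")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens
    print(estimate_text_cost(10_000, 2_000))
```

`Decimal` is used rather than `float` so that small per-token amounts do not pick up binary rounding noise when summed across many requests.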

Vision Capabilities

Max resolution: 4096x4096
Max images per prompt: 10
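A client could check a batch of images against these two limits before sending a request. The sketch below is one possible pre-flight check; `check_image_batch` is a hypothetical helper, and it assumes the 4096x4096 limit applies to width and height independently, which the listing does not spell out.

```python
# Limits copied from the listing above.
MAX_WIDTH = 4096
MAX_HEIGHT = 4096
MAX_IMAGES_PER_PROMPT = 10


def check_image_batch(sizes: list[tuple[int, int]]) -> list[str]:
    """Return a list of problems found; an empty list means the batch fits.

    `sizes` holds one (width, height) pair in pixels per image.
    Assumes the resolution limit is applied per dimension.
    """
    problems = []
    if len(sizes) > MAX_IMAGES_PER_PROMPT:
        problems.append(
            f"{len(sizes)} images exceeds the limit of {MAX_IMAGES_PER_PROMPT}"
        )
    for index, (width, height) in enumerate(sizes):
        if width > MAX_WIDTH or height > MAX_HEIGHT:
            problems.append(
                f"image {index} is {width}x{height}, "
                f"larger than {MAX_WIDTH}x{MAX_HEIGHT}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_image_batch([(1024, 768), (5000, 3000)]):
        print(problem)
```

Returning every problem at once, instead of raising on the first, lets a caller report all oversized images in a single pass.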

Image

Input Pricing: 85 tokens/image
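To get a feel for what 85 tokens per image means in dollars, the following sketch converts an image count into tokens and then into an estimated cost. It assumes image tokens are billed at the listed text input rate of $0.70 per million tokens; the listing does not state this explicitly, and `estimate_image_cost` is a hypothetical name.

```python
from decimal import Decimal

TOKENS_PER_IMAGE = 85                 # from the listing above
INPUT_USD_PER_MTOK = Decimal("0.70")  # assumed: text input rate also applies to image tokens


def estimate_image_cost(image_count: int) -> Decimal:
    """Estimate the USD input cost of `image_count` images (hypothetical helper)."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    tokens = image_count * TOKENS_PER_IMAGE
    return tokens * INPUT_USD_PER_MTOK / Decimal(1_000_000)


if __name__ == "__main__":
    # 10 images -> 850 tokens
    print(estimate_image_cost(10))
```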

Embeddings

Embeddings Pricing: $0.0001/1k tokens
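The embeddings rate is quoted per 1k tokens while the text rates are quoted per million tokens (MTok), so a unit conversion helps when comparing them. A minimal sketch of that arithmetic follows, assuming linear billing; the helper names are hypothetical.

```python
from decimal import Decimal

EMBEDDINGS_USD_PER_1K = Decimal("0.0001")  # from the listing above


def embeddings_usd_per_mtok() -> Decimal:
    """Convert the per-1k-token rate to a per-million-token rate."""
    return EMBEDDINGS_USD_PER_1K * 1_000


def estimate_embeddings_cost(tokens: int) -> Decimal:
    """Estimate the USD cost of embedding `tokens` tokens (hypothetical helper)."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens * EMBEDDINGS_USD_PER_1K / Decimal(1_000)


if __name__ == "__main__":
    print(embeddings_usd_per_mtok())          # rate in USD per MTok
    print(estimate_embeddings_cost(250_000))  # cost for 250k tokens
```

Under these assumptions the listed rate works out to $0.10 per million tokens.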