Last updated: 17/03/2025

Cohere (Official Docs)

C4AI Aya Vision 32B

c4ai-aya-vision-32b

active

## C4AI Aya Vision 32B

A multimodal vision-language model based on the Aya research model and listed as part of the Aya Expanse model family. It accepts text and image inputs and generates text, with a 16,384-token context window. Supports fine-tuning for custom applications.

Additional Information

Notes

The model has a context length of 16,384 tokens. Cohere's pricing information lists Aya Expanse models (8B and 32B) on the API at $0.50 per 1M input tokens and $1.50 per 1M output tokens; no separate token rate is given here for Aya Vision itself.
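As a rough illustration of the arithmetic behind those per-million-token rates, the sketch below estimates the cost of a single request. It assumes the Aya Expanse rates quoted above also apply to this model, which the listing does not confirm; the function and constant names are hypothetical and chosen only for this example.

```python
from decimal import Decimal

# Rates quoted in the notes for Aya Expanse models (assumed, not confirmed,
# to apply to c4ai-aya-vision-32b as well).
INPUT_USD_PER_MTOK = Decimal("0.50")
OUTPUT_USD_PER_MTOK = Decimal("1.50")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_chat_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MTOK * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) / TOKENS_PER_MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # 12,000 prompt tokens and 800 generated tokens.
    cost = estimate_chat_cost(12_000, 800)
    print(f"Estimated cost: ${cost:.4f}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without binary rounding noise.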

Capabilities

Text

Input Pricing

$- / KTok (no price listed)

Context: 16,384 tokens

Output Pricing

$- / KTok (no price listed)
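The 16,384-token context figure can be used to budget a request before sending it. The sketch below assumes, as is typical for chat models but not stated in this listing, that prompt tokens and generated tokens share the same window; the helper name is hypothetical, and the prompt token count is expected to come from whatever tokenizer or usage report you already have.

```python
# Context window listed for c4ai-aya-vision-32b.
CONTEXT_WINDOW_TOKENS = 16_384


def max_output_budget(prompt_tokens: int,
                      context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens remain for generation (hypothetical helper).

    Assumes prompt and output tokens share one window; returns 0 when the
    prompt alone already fills or exceeds it.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    remaining = max_output_budget(12_000)
    print(f"Tokens left for the response: {remaining}")
```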

Vision Capabilities

Image

Input Pricing

$0.0001/image

Embeddings

Embeddings Pricing

$0.0001/1K tokens

Fine-Tuning

Fine-Tuning Pricing

$3.00/MTok training
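To show how the per-image and training rates listed on this page translate into totals, here is a minimal sketch. It assumes images are billed at a flat per-image rate and that fine-tuning is billed on dataset tokens multiplied by the number of epochs; neither billing rule is spelled out in the listing, and the function names are hypothetical.

```python
from decimal import Decimal

# Rates taken from this listing.
IMAGE_USD_PER_IMAGE = Decimal("0.0001")
TRAINING_USD_PER_MTOK = Decimal("3.00")
TOKENS_PER_MTOK = Decimal(1_000_000)


def image_input_cost(num_images: int) -> Decimal:
    """Estimated USD cost of sending num_images images (flat rate assumed)."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return IMAGE_USD_PER_IMAGE * num_images


def fine_tuning_cost(dataset_tokens: int, epochs: int = 1) -> Decimal:
    """Estimated USD training cost, assuming every epoch is billed in full."""
    if dataset_tokens < 0 or epochs < 1:
        raise ValueError("dataset_tokens must be >= 0 and epochs >= 1")
    trained_tokens = Decimal(dataset_tokens) * epochs
    return trained_tokens / TOKENS_PER_MTOK * TRAINING_USD_PER_MTOK


if __name__ == "__main__":
    print(f"250 images: ${image_input_cost(250):.4f}")
    print(f"2M tokens x 3 epochs: ${fine_tuning_cost(2_000_000, epochs=3):.2f}")
```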