Last updated: 16/04/2025

GroqOfficial Docs

Llama 3.1 8B Instant

llama-3.1-8b-instant

active

Llama 3.1 8B Instant

Meta's Llama 3.1 8B Instant model offers high-speed inference at 750 tokens per second, making it a powerful and efficient choice for a wide range of natural language processing tasks. With a 128K token context window, this model can handle long-form content and complex inputs.

Supports a 128K token context window. Handles Text, Image, Video, Audio, Transcription, Text-to-Speech inputs and outputs. Supports fine-tuning for custom applications.

Capabilities

Text

Input Pricing

$0.05/ MTok

Context: 131,072 tokens

Output Pricing

$0.08/ MTok

Max tokens: 8,192

Image

Input Pricing

6400 tokens/image

Text-to-Speech

Text-to-Speech Pricing

$0.05/1k characters

Embeddings

Embeddings Pricing

$0.08/1k tokens

Additional Model Information

Tool Use

No

Structured Output

No

Reasoning

Yes

Flatten your repo for AI in seconds

Flatten repos. Prompt faster. One click → one GPT-ready file

Free Online & Desktop