Meta: Llama 3.2 11B Vision Instruct

Released Sep 25, 2024

meta-llama/llama-3.2-11b-vision-instruct

text+image→text

Set alert

Try it on:

OpenRouter

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Best overall rank

Across all subsets · overall

Total arena votes

Across 0 (subset, category) entries

Providers on OR

None yet

First seen

Awaiting first snapshot

Trend

No snapshot history yet - trends appear after the first daily refresh.

Pricing

Per token

Input (per 1M): $0.245
Output (per 1M): $0.245

Capabilities

Modality

text+image->text

Context

131K tokens

Tokenizer

Llama3

Knowledge cutoff

2023-12-31

Max output

16K tokens

Moderation

Unmoderated

Hugging Face

meta-llama/Llama-3.2-11B-Vision-Instruct

Supported parameters

frequency_penalty

logit_bias

max_tokens

min_p

presence_penalty

repetition_penalty

response_format

seed

stop

temperature

top_k

top_p

Providers

No provider endpoint data yet.

Per-category ranks

No leaderboard entries for this model.