Qwen: Qwen3 VL 30B A3B Instruct

Released Oct 6, 2025

text+image→text

Try it on:

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

Best overall rank

Across all subsets · overall

Total arena votes

Across 0 (subset, category) entries

Providers on OR

None yet

First seen

Awaiting first snapshot

Trend

No snapshot history yet - trends appear after the first daily refresh.

Pricing

Per token

Input (per 1M): $0.130
Output (per 1M): $0.520

Capabilities

Modality

text+image->text

Context

262K tokens

Tokenizer

Qwen3

Knowledge cutoff

2025-03-31

Max output

33K tokens

Moderation

Unmoderated

Hugging Face

Qwen/Qwen3-VL-30B-A3B-Instruct

Supported parameters

frequency_penalty

logit_bias

logprobs

max_tokens

min_p

presence_penalty

repetition_penalty

response_format

seed

stop

structured_outputs

temperature

tool_choice

tools

top_k

top_logprobs

top_p

Providers

No provider endpoint data yet.

Per-category ranks

No leaderboard entries for this model.