Qwen: Qwen3 VL 8B Instruct

Released Oct 14, 2025

image+text→text

Try it on:

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

Best overall rank

Across all subsets · overall

Total arena votes

Across 0 (subset, category) entries

Providers on OR

None yet

First seen

Awaiting first snapshot

Trend

No snapshot history yet - trends appear after the first daily refresh.

Pricing

Per token

Input (per 1M): $0.117
Output (per 1M): $0.455

Capabilities

Modality

text+image->text

Context

256K tokens

Tokenizer

Qwen3

Max output

33K tokens

Moderation

Unmoderated

Hugging Face

Qwen/Qwen3-VL-8B-Instruct

Supported parameters

frequency_penalty

logit_bias

logprobs

max_tokens

presence_penalty

repetition_penalty

response_format

seed

stop

structured_outputs

temperature

tool_choice

tools

top_k

top_logprobs

top_p

Providers

No provider endpoint data yet.

Per-category ranks

No leaderboard entries for this model.