StepFun: Step 3.7 Flash

Released May 28, 2026

text+image+video→text

Try it on:

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...

Best overall rank

Across all subsets · overall

Total arena votes

Across 0 (subset, category) entries

Providers on OR

None yet

First seen

Awaiting first snapshot

Trend

No snapshot history yet - trends appear after the first daily refresh.

Pricing

Per token

Input (per 1M): $0.200
Output (per 1M): $1.15
Cache Read (per 1M): $0.040

Capabilities

Modality

text+image+video->text

Context

256K tokens

Tokenizer

Other

Max output

256K tokens

Moderation

Unmoderated

Hugging Face

stepfun-ai/Step-3.7-Flash

Supported parameters

frequency_penalty

include_reasoning

logit_bias

logprobs

max_tokens

min_p

presence_penalty

reasoning

reasoning_effort

repetition_penalty

response_format

seed

stop

structured_outputs

temperature

tool_choice

tools

top_k

top_logprobs

top_p

Providers

No provider endpoint data yet.

Per-category ranks

No leaderboard entries for this model.