SvelteBench Visualization

Top Models Leaderboard

Average pass@1 scores
Rank Model Score
1 claude-sonnet-4-20250514 (Anthropic) 90.0%
2 claude-sonnet-4-5 (Anthropic) 90.0%
3 claude-opus-4-1-20250805 (Anthropic) 88.9%
4 moonshotai/kimi-k2-0905 (OpenRouter) 88.9%
5 glm-4.5 (Z.ai) 86.7%
6 kimi-k2-turbo-preview (Moonshot AI) 85.6%
7 qwen/qwen3-max (OpenRouter) 85.6%
8 x-ai/grok-4 (OpenRouter) 84.4%
9 gemini-2.5-pro (Google) 83.3%
10 qwen/qwen3-coder-plus (OpenRouter) 83.3%
11 x-ai/grok-code-fast-1 (OpenRouter) 83.3%
12 openrouter/sonoma-sky-alpha (OpenRouter) 81.1%
13 gpt-5 (OpenAI) 81.1%
14 x-ai/grok-4-fast:free (OpenRouter) 80.0%
15 bytedance/seed-oss-36b-instruct (OpenRouter) 78.9%
16 @preset/alibaba-qwen3-vl-235b-a22b-instruct (OpenRouter) 77.8%
17 openrouter/sonoma-dusk-alpha (OpenRouter) 77.8%
18 deepseek/deepseek-v3.1-terminus (OpenRouter) 76.7%
19 gemini-2.5-flash (Google) 74.4%
20 qwen/qwen3-coder (OpenRouter) 73.3%
21 deepseek/deepseek-chat-v3.1 (OpenRouter) 70.0%
22 gemini-2.5-flash-lite-preview-09-2025 (Google) 67.8%
23 qwen/qwen3-235b-a22b-thinking-2507 (OpenRouter) 67.8%
24 gemini-2.5-flash-preview-09-2025 (Google) 66.7%
25 gemini-2.5-flash-lite (Google) 60.0%
26 qwen/qwen-plus-2025-07-28 (OpenRouter) 57.8%
27 gpt-5-codex (OpenAI) 57.8%
28 openai/gpt-5-codex (OpenRouter) 54.4%
29 meituan/longcat-flash-chat (OpenRouter) 52.2%
30 qwen/qwen3-next-80b-a3b-thinking (OpenRouter) 48.9%
31 glm-4-32b-0414-128k (Z.ai) 44.4%
32 qwen/qwen3-coder-flash (OpenRouter) 36.7%
33 qwen/qwen3-30b-a3b (OpenRouter) 26.7%
34 qwen/qwen3-next-80b-a3b-instruct (OpenRouter) 25.6%
35 o3 (OpenAI) 22.2%
36 gpt-5-mini (OpenAI) 20.0%
37 gpt-5-nano (OpenAI) 18.9%
38 o4-mini (OpenAI) 15.6%
39 opengvlab/internvl3-78b (OpenRouter) 15.6%
40 qwen/qwq-32b (OpenRouter) 14.4%
41 alibaba/tongyi-deepresearch-30b-a3b (OpenRouter) 12.2%
42 qwen/qwen-turbo (OpenRouter) 8.9%
43 arcee-ai/afm-4.5b (OpenRouter) 5.6%
44 nvidia/nemotron-nano-9b-v2 (OpenRouter) 0.0%
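
For readers unfamiliar with the metric: pass@k is commonly computed with the unbiased estimator 1 - C(n - c, k) / C(n, k), where n is the number of samples generated and c the number that passed. The page does not state which formula SvelteBench uses, so the following is only a sketch under that assumption; the function name `pass_at_k` is illustrative. With n = 10 it does reproduce the pattern seen in the per-test tables (for example, 2/10 passing samples gives pass@1 of 20% and pass@10 of 100%).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n = samples generated, c = samples that passed, k = attempts allowed.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb(n - c, k) is 0 when k > n - c, so with k = n a single
    # passing sample is enough to give pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# A row reported as 2/10 passing samples:
print(f"pass@1  = {pass_at_k(10, 2, 1):.0%}")
print(f"pass@10 = {pass_at_k(10, 2, 10):.0%}")
```

Under this estimator, pass@1 reduces to c / n, which is why the pass@1 column tracks the "Passing Samples" fraction directly.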

Note: Certain OpenAI thinking models (o3, o4) and gpt-5 do not support temperature adjustments (only the default value of 1 is supported). Models whose names contain a "-reasoning-" suffix followed by an effort level (e.g., gpt-5-2025-08-07-reasoning-medium) use the specified reasoning effort setting.
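
The rule in the note can be sketched as a small helper that builds request options for a model name. This is a hypothetical illustration, not SvelteBench's actual code: the function `build_request_options`, the prefix list, and the option keys are all assumptions made for the example.

```python
from typing import Optional

# Assumption: model-name prefixes taken from the note above; the real
# benchmark may identify these models differently.
FIXED_TEMPERATURE_PREFIXES = ("o3", "o4", "gpt-5")
REASONING_MARKER = "-reasoning-"


def build_request_options(model: str, temperature: Optional[float] = None) -> dict:
    """Return hypothetical request options for a benchmark model name."""
    options: dict = {"model": model}

    # "gpt-5-2025-08-07-reasoning-medium" -> base model + effort "medium".
    if REASONING_MARKER in model:
        base_model, effort = model.split(REASONING_MARKER, 1)
        options["model"] = base_model
        options["reasoning_effort"] = effort

    # Only send a temperature to models that accept one; the others
    # keep their default of 1.
    if temperature is not None and not options["model"].startswith(FIXED_TEMPERATURE_PREFIXES):
        options["temperature"] = temperature

    return options


print(build_request_options("gpt-5-2025-08-07-reasoning-medium", temperature=0.2))
print(build_request_options("claude-sonnet-4-5", temperature=0.2))
```

Dropping the temperature silently, rather than raising an error, keeps one configuration usable across all models; whether the benchmark behaves this way is not stated on the page.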

Errata: The "inspect" test has known correctness issues but is retained in the benchmark suite to maintain consistency and fairness in scoring across all evaluated models.

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
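
As a worked example, the unweighted mean of the nine pass@1 values in the table above is 88.9%, which matches a score that appears on the leaderboard. This suggests, though the page does not confirm it, that each leaderboard score is the plain average of a model's per-test pass@1 values, with the "inspect" test included. The page does not say which model this table belongs to. A minimal sketch, with the helper name `average_score` chosen for illustration:

```python
from typing import Mapping

# pass@1 values (in percent) copied from the nine rows of the table above.
pass_at_1 = {
    "counter": 100, "derived": 100, "derived-by": 100,
    "each": 100, "effect": 100, "hello-world": 100,
    "inspect": 0, "props": 100, "snippets": 100,
}


def average_score(per_test: Mapping[str, float]) -> float:
    """Unweighted mean of per-test pass@1 percentages (assumed scoring rule)."""
    return sum(per_test.values()) / len(per_test)


score = average_score(pass_at_1)
print(f"{score:.1f}%")  # 800 / 9, rounded to one decimal place
```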
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 9
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 1
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 80% 100% 8/10 4
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 9
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 2
derived 80% 100% 8/10 4
derived-by 100% 100% 10/10 0
each 10% 100% 1/10 9
effect 80% 100% 8/10 4
hello-world 100% 100% 10/10 0
inspect 70% 100% 7/10 3
props 20% 100% 2/10 8
snippets 0% 0% 0/10 14
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 90% 100% 9/10 3
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 20% 100% 2/10 8
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 80% 100% 8/10 2
each 30% 100% 3/10 7
effect 70% 100% 7/10 3
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 90% 100% 9/10 1
snippets 30% 100% 3/10 7
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 100% 100% 10/10 0
snippets 50% 100% 5/10 5
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 90% 100% 9/10 1
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 80% 100% 8/10 2
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 80% 100% 8/10 3
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 60% 100% 6/10 4
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 15
props 80% 100% 8/10 2
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 2
derived 50% 100% 5/10 6
derived-by 50% 100% 5/10 5
each 70% 100% 7/10 3
effect 50% 100% 5/10 5
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 13
props 60% 100% 6/10 4
snippets 60% 100% 6/10 4
Test pass@1 pass@10 Passing Samples Errors
counter 40% 100% 4/10 6
derived 0% 0% 0/10 11
derived-by 0% 0% 0/10 10
each 30% 100% 3/10 8
effect 20% 100% 2/10 8
hello-world 90% 100% 9/10 1
inspect 0% 0% 0/10 13
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 50% 100% 5/10 11
derived 10% 100% 1/10 11
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 12
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 9
props 0% 0% 0/10 10
snippets 0% 0% 0/10 18
Test pass@1 pass@10 Passing Samples Errors
counter 10% 100% 1/10 9
derived 0% 0% 0/10 15
derived-by 0% 0% 0/10 10
each 60% 100% 6/10 4
effect 0% 0% 0/10 12
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 9
props 20% 100% 2/10 8
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 10% 100% 1/10 9
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 10% 100% 1/10 10
effect 20% 100% 2/10 9
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 10% 100% 1/10 9
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 90% 100% 9/10 1
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 30% 100% 3/10 7
effect 0% 0% 0/10 10
hello-world 80% 100% 8/10 2
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 22
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 11
each 0% 0% 0/10 13
effect 0% 0% 0/10 10
hello-world 50% 100% 5/10 6
inspect 0% 0% 0/10 16
props 0% 0% 0/10 13
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 90% 100% 9/10 1
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 90% 100% 9/10 1
effect 40% 100% 4/10 8
hello-world 100% 100% 10/10 0
inspect 100% 100% 10/10 0
props 90% 100% 9/10 1
snippets 0% 0% 0/10 14
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 20% 100% 2/10 10
derived-by 90% 100% 9/10 1
each 40% 100% 4/10 11
effect 90% 100% 9/10 2
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 100% 100% 10/10 0
snippets 60% 100% 6/10 4
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 80% 100% 8/10 3
derived-by 90% 100% 9/10 3
each 40% 100% 4/10 12
effect 100% 100% 10/10 0
hello-world 90% 100% 9/10 1
inspect 30% 100% 3/10 7
props 90% 100% 9/10 1
snippets 70% 100% 7/10 3
Test pass@1 pass@10 Passing Samples Errors
counter 20% 100% 2/10 8
derived 100% 100% 10/10 0
derived-by 10% 100% 1/10 9
each 10% 100% 1/10 9
effect 50% 100% 5/10 5
hello-world 100% 100% 10/10 0
inspect 70% 100% 7/10 3
props 90% 100% 9/10 1
snippets 20% 100% 2/10 8
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 90% 100% 9/10 1
each 80% 100% 8/10 3
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 40% 100% 4/10 6
props 90% 100% 9/10 1
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 10
hello-world 0% 0% 0/10 10
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 2
derived 60% 100% 6/10 5
derived-by 20% 100% 2/10 8
each 70% 100% 7/10 3
effect 70% 100% 7/10 3
hello-world 90% 100% 9/10 1
inspect 0% 0% 0/10 13
props 50% 100% 5/10 11
snippets 50% 100% 5/10 5
Test pass@1 pass@10 Passing Samples Errors
counter 40% 100% 4/10 6
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 10% 100% 1/10 9
hello-world 90% 100% 9/10 1
inspect 0% 0% 0/10 13
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 30% 100% 3/10 7
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 12
props 100% 100% 10/10 0
snippets 60% 100% 6/10 4
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 70% 100% 7/10 5
each 100% 100% 10/10 0
effect 70% 100% 7/10 6
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 29
props 80% 100% 8/10 2
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 80% 100% 8/10 2
derived-by 70% 100% 7/10 5
each 90% 100% 9/10 1
effect 50% 100% 5/10 5
hello-world 80% 100% 8/10 2
inspect 40% 100% 4/10 6
props 10% 100% 1/10 12
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 10% 100% 1/10 10
hello-world 70% 100% 7/10 3
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 5
derived 60% 100% 6/10 5
derived-by 80% 100% 8/10 2
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 70% 100% 7/10 3
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 60% 100% 6/10 7
derived 20% 100% 2/10 8
derived-by 10% 100% 1/10 9
each 50% 100% 5/10 5
effect 0% 0% 0/10 11
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 13
props 0% 0% 0/10 10
snippets 0% 0% 0/10 16
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 30% 100% 3/10 15
each 100% 100% 10/10 0
effect 90% 100% 9/10 1
hello-world 100% 100% 10/10 0
inspect 50% 100% 5/10 17
props 90% 100% 9/10 1
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 0% 0% 0/10 10
derived-by 40% 100% 4/10 6
each 100% 100% 10/10 0
effect 0% 0% 0/10 10
hello-world 80% 100% 8/10 3
inspect 0% 0% 0/10 10
props 10% 100% 1/10 9
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 70% 100% 7/10 5
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 80% 100% 8/10 5
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 100% 100% 10/10 0
props 70% 100% 7/10 3
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 50% 100% 5/10 5
derived 60% 100% 6/10 4
derived-by 10% 100% 1/10 13
each 0% 0% 0/10 10
effect 0% 0% 0/10 14
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 10% 100% 1/10 9
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 40% 100% 4/10 18
derived 100% 100% 10/10 0
derived-by 60% 100% 6/10 9
each 80% 100% 8/10 2
effect 20% 100% 2/10 13
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 10% 100% 1/10 9
snippets 10% 100% 1/10 11
Test pass@1 pass@10 Passing Samples Errors
counter 40% 100% 4/10 9
derived 0% 0% 0/10 13
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 11
effect 0% 0% 0/10 13
hello-world 90% 100% 9/10 1
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 70% 100% 7/10 9
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 80% 100% 8/10 3
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 16
props 100% 100% 10/10 0
snippets 50% 100% 5/10 5
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 80% 100% 8/10 4
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 90% 100% 9/10 1
inspect 30% 100% 3/10 10
props 60% 100% 6/10 4
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 60% 100% 6/10 6
each 0% 0% 0/10 10
effect 10% 100% 1/10 12
hello-world 90% 100% 9/10 1
inspect 20% 100% 2/10 8
props 20% 100% 2/10 8
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 80% 100% 8/10 4
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 40% 100% 4/10 6
props 100% 100% 10/10 0
snippets 60% 100% 6/10 4