SvelteBench Visualization

Top Models Leaderboard

Average pass@1 scores
Rank Model Score
1 claude-sonnet-4-5 (Anthropic) 93.3%
2 kimi-k2-thinking-turbo (Moonshot) 92.2%
3 moonshotai/kimi-k2-0905 (OpenRouter) 90.0%
4 claude-sonnet-4-20250514 (Anthropic) 90.0%
5 kimi-k2-thinking (Moonshot) 88.9%
6 qwen/qwen3-max (OpenRouter) 88.9%
7 deepseek/deepseek-v3.2-exp (OpenRouter) 86.7%
8 gemini-2.5-pro (Google) 85.6%
9 z-ai/glm-4.6 (OpenRouter) 84.4%
10 claude-haiku-4-5-20251001 (Anthropic) 84.4%
11 gpt-5 (OpenAI) 83.3%
12 minimax/minimax-m2 (OpenRouter) 75.6%
13 openrouter/polaris-alpha (OpenRouter) 73.3%
14 qwen/qwen3-vl-8b-instruct (OpenRouter) 11.1%
15 deepcogito/cogito-v2-preview-llama-405b (OpenRouter) 10.0%

Note: Certain OpenAI thinking models (o3, o4) and gpt-5 do not support temperature adjustment; only the default value of 1 is supported. Models with "-reasoning-" in their name (e.g., gpt-5-2025-08-07-reasoning-medium) use the specified reasoning effort setting.

Errata: The "inspect" test has known correctness issues but is retained in the benchmark suite to maintain consistency and fairness in scoring across all evaluated models.

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 60% 100% 6/10 16
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
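
The pass@1 and pass@10 columns are consistent with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that passed. The source does not state how these columns are computed, so the sketch below is only an illustration under that assumption, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Assumption: the benchmark uses the standard unbiased estimator
    1 - C(n - c, k) / C(n, k). This is not confirmed by the source.
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k samples must
        # contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# The "inspect" row above: 6 of 10 samples passed.
print(f"{pass_at_k(10, 6, 1):.0%}")   # pass@1
print(f"{pass_at_k(10, 6, 10):.0%}")  # pass@10

# The "snippets" row above: 0 of 10 samples passed.
print(f"{pass_at_k(10, 0, 1):.0%}")
print(f"{pass_at_k(10, 0, 10):.0%}")
```

With ten samples per test, this reduces to pass@1 = c/n, and pass@10 is 100% as soon as a single sample passes, which is the pattern seen in every table on this page.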

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 40% 100% 4/10 6
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 100% 100% 10/10 0
snippets 70% 100% 7/10 3

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 90% 100% 9/10 1
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 9
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 1
derived-by 90% 100% 9/10 3
each 100% 100% 10/10 0
effect 90% 100% 9/10 1
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 25
props 100% 100% 10/10 0
snippets 80% 100% 8/10 4

Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 11
hello-world 90% 100% 9/10 1
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 12

Test pass@1 pass@10 Passing Samples Errors
counter 90% 100% 9/10 1
derived 80% 100% 8/10 2
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 90% 100% 9/10 3

Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 8
derived 80% 100% 8/10 4
derived-by 100% 100% 10/10 0
each 80% 100% 8/10 3
effect 100% 100% 10/10 0
hello-world 100% 100% 9/9 0
inspect 30% 100% 3/10 10
props 70% 100% 7/10 6
snippets 40% 100% 4/10 8

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 40% 100% 4/10 6
props 80% 100% 8/10 3
snippets 100% 100% 10/10 0

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 1
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 13
props 80% 100% 8/10 5
snippets 0% 0% 0/10 10

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 9/9 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 90% 100% 9/10 1
props 100% 100% 10/10 0
snippets 10% 100% 1/10 9

Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 10
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 40% 100% 4/10 12
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
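
Each leaderboard score appears to be the unweighted mean of a model's nine per-test pass@1 values. The source does not spell this out, so the sketch below only checks the arithmetic under that assumption, using the pass@1 column of the table directly above. The helper name `average_pass_at_1` is hypothetical.

```python
def average_pass_at_1(per_test_percent: dict[str, float]) -> float:
    """Unweighted mean of per-test pass@1 percentages, rounded to one decimal.

    Assumption: every test carries equal weight in the leaderboard score.
    """
    values = list(per_test_percent.values())
    return round(sum(values) / len(values), 1)


# pass@1 values copied from the table directly above.
table = {
    "counter": 100,
    "derived": 40,
    "derived-by": 100,
    "each": 100,
    "effect": 100,
    "hello-world": 100,
    "inspect": 30,
    "props": 100,
    "snippets": 90,
}

print(f"{average_pass_at_1(table)}%")  # 760 / 9, rounded to one decimal
```

The result, 84.4%, is a score that appears on the leaderboard. Because every test carries equal weight under this assumption, the "inspect" test mentioned in the errata moves a model's score by up to 100/9, or about 11.1 percentage points.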