SvelteBench Visualization

Top Models Leaderboard

Average pass@1 scores
Rank Model Score
1 anthropic/claude-opus-4.5 (OpenRouter) 100.0%
2 gpt-5.2 (OpenAI) 97.8%
3 google/gemini-3-flash-preview (OpenRouter) 96.7%
4 gemini-3-flash-preview (Google) 94.4%
5 claude-sonnet-4-5 (Anthropic) 93.3%
6 xiaomi/mimo-v2-flash:free (OpenRouter) 93.3%
7 kimi-k2-thinking-turbo (Moonshot) 92.2%
8 openrouter/sherlock-dash-alpha (OpenRouter) 92.2%
9 deepseek/deepseek-v3.2 (OpenRouter) 91.1%
10 google/gemini-3-pro-preview (OpenRouter) 91.1%
11 moonshotai/kimi-k2-0905 (OpenRouter) 90.0%
12 claude-sonnet-4-20250514 (Anthropic) 90.0%
13 kimi-k2-thinking (Moonshot) 88.9%
14 gpt-5-chat-latest (OpenAI) 88.9%
15 kwaipilot/kat-coder-pro:free (OpenRouter) 88.9%
16 mistralai/mistral-large-2512 (OpenRouter) 88.9%
17 qwen/qwen3-max (OpenRouter) 88.9%
18 deepseek/deepseek-v3.2-speciale (OpenRouter) 87.8%
19 deepseek/deepseek-v3.2-exp (OpenRouter) 86.7%
20 gemini-2.5-pro (Google) 85.6%
21 gpt-5 (OpenAI) 85.6%
22 z-ai/glm-4.6 (OpenRouter) 84.4%
23 claude-haiku-4-5-20251001 (Anthropic) 84.4%
24 gpt-5.1-chat-latest (OpenAI) 83.3%
25 openai/gpt-5.1-chat (OpenRouter) 83.3%
26 openai/gpt-5.1 (OpenRouter) 80.0%
27 openrouter/sherlock-think-alpha (OpenRouter) 80.0%
28 x-ai/grok-4.1-fast (OpenRouter) 80.0%
29 minimax/minimax-m2 (OpenRouter) 75.6%
30 openrouter/polaris-alpha (OpenRouter) 73.3%
31 gpt-5.1-codex-max (OpenAI) 67.8%
32 moonshotai/kimi-linear-48b-a3b-instruct (OpenRouter) 66.7%
33 amazon/nova-2-lite-v1:free (OpenRouter) 56.7%
34 prime-intellect/intellect-3 (OpenRouter) 55.6%
35 mistralai/devstral-2512:free (OpenRouter) 50.0%
36 gpt-5-codex (OpenAI) 47.8%
37 openai/gpt-5.1-codex (OpenRouter) 45.6%
38 essentialai/rnj-1-instruct (OpenRouter) 30.0%
39 gpt-5-mini (OpenAI) 23.3%
40 mistralai/ministral-3b-2512 (OpenRouter) 22.2%
41 nvidia/nemotron-3-nano-30b-a3b:free (OpenRouter) 18.9%
42 openai/gpt-5.1-codex-mini (OpenRouter) 16.7%
43 gpt-5-nano (OpenAI) 13.3%
44 qwen/qwen3-vl-8b-instruct (OpenRouter) 11.1%
45 deepcogito/cogito-v2-preview-llama-405b (OpenRouter) 10.0%
46 mistralai/ministral-8b-2512 (OpenRouter) 6.7%
47 mistralai/ministral-14b-2512 (OpenRouter) 3.3%
48 allenai/olmo-3.1-32b-think:free (OpenRouter) 0.0%
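
The leaderboard scores appear consistent with a plain average of the nine per-test pass@1 values in the per-model tables further down. The page does not state the aggregation method, so the sketch below is an assumption, and the function name is hypothetical. It reproduces a score of 88.9 from one of the result tables, where eight tests pass at 100% and "snippets" at 0%.

```python
def average_pass_at_1(per_test_pass_at_1: dict[str, float]) -> float:
    """Mean of the per-test pass@1 percentages, rounded to one decimal place.

    Assumption: every test carries equal weight in the leaderboard score.
    """
    values = list(per_test_pass_at_1.values())
    return round(sum(values) / len(values), 1)


# Per-test pass@1 percentages copied from one of the result tables on this page.
results = {
    "counter": 100, "derived": 100, "derived-by": 100,
    "each": 100, "effect": 100, "hello-world": 100,
    "inspect": 100, "props": 100, "snippets": 0,
}

print(average_pass_at_1(results))  # 88.9
```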

Note: Certain OpenAI thinking models (o3, o4) and gpt-5 do not support temperature adjustments; only the default value of 1 is supported. Models with a "-reasoning-" suffix (e.g., gpt-5-2025-08-07-reasoning-medium) use the specified reasoning effort setting.
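
As an illustration of the note above, the following sketch shows one way a runner could split a "-reasoning-" suffix off a model name and leave the temperature unset for models that only accept the default. It is not SvelteBench's actual code: the function name, the option keys, the regular expression, and the prefix list are all assumptions made for this example.

```python
import re

# Assumed naming convention: "<base-model>-reasoning-<effort>".
_REASONING_SUFFIX = re.compile(r"^(?P<base>.+)-reasoning-(?P<effort>[a-z]+)$")

# Assumed prefixes for models that only accept the default temperature of 1.
FIXED_TEMPERATURE_PREFIXES = ("o3", "o4", "gpt-5")


def build_request_options(model_name: str, temperature: float) -> dict:
    """Return hypothetical request options derived from a model name."""
    match = _REASONING_SUFFIX.match(model_name)
    base = match.group("base") if match else model_name
    options = {"model": base}
    if match:
        # The suffix carries the reasoning effort, e.g. "medium".
        options["reasoning_effort"] = match.group("effort")
    if not base.startswith(FIXED_TEMPERATURE_PREFIXES):
        # Only send a temperature to models assumed to accept one.
        options["temperature"] = temperature
    return options


print(build_request_options("gpt-5-2025-08-07-reasoning-medium", 0.7))
# {'model': 'gpt-5-2025-08-07', 'reasoning_effort': 'medium'}
print(build_request_options("claude-sonnet-4-5", 0.7))
# {'model': 'claude-sonnet-4-5', 'temperature': 0.7}
```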

Errata: The "inspect" test has known correctness issues but is retained in the benchmark suite to maintain consistency and fairness in scoring across all evaluated models.

Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 60% 100% 6/10 16
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 40% 100% 4/10 6
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 100% 100% 10/10 0
snippets 70% 100% 7/10 3
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 1
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 60% 100% 6/10 4
props 100% 100% 9/9 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 90% 100% 9/10 1
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 9
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 1
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 18
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 100% 100% 10/10 0
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 60% 100% 6/10 10
derived 40% 100% 4/10 6
derived-by 30% 100% 3/10 7
each 70% 100% 7/10 3
effect 20% 100% 2/10 9
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 40% 100% 4/10 6
snippets 70% 100% 7/10 3
Test pass@1 pass@10 Passing Samples Errors
counter 50% 100% 5/10 5
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 50% 100% 5/10 5
effect 0% 0% 0/10 11
hello-world 90% 100% 9/10 1
inspect 20% 100% 2/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 20% 100% 2/10 8
derived 0% 0% 0/10 13
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 11
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 12
props 0% 0% 0/10 10
snippets 0% 0% 0/10 18
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 70% 100% 7/10 6
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 80% 100% 8/10 5
props 90% 100% 9/10 1
snippets 20% 100% 2/10 8
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 70% 100% 7/10 3
derived-by 40% 100% 4/10 8
each 90% 100% 9/10 1
effect 90% 100% 9/10 2
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 16
props 60% 100% 6/10 4
snippets 60% 100% 6/10 4
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 2
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 90% 100% 9/10 1
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 10
hello-world 0% 0% 0/10 10
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 70% 100% 7/10 3
derived 80% 100% 8/10 2
derived-by 70% 100% 7/10 3
each 70% 100% 7/10 4
effect 60% 100% 6/10 4
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 30% 100% 3/10 7
snippets 0% 0% 0/10 12
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 100% 100% 10/10 0
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 11
hello-world 90% 100% 9/10 1
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 12
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
Test pass@1 pass@10 Passing Samples Errors
counter 90% 100% 9/10 1
derived 80% 100% 8/10 2
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 90% 100% 9/10 3
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 70% 100% 7/10 3
Test pass@1 pass@10 Passing Samples Errors
counter 40% 100% 4/10 9
derived 50% 100% 5/10 7
derived-by 40% 100% 4/10 8
each 20% 100% 2/10 8
effect 10% 100% 1/10 10
hello-world 60% 100% 6/10 6
inspect 30% 100% 3/10 7
props 0% 0% 0/10 13
snippets 20% 100% 2/10 14
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 90% 100% 9/10 1
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 80% 100% 8/10 2
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 20% 100% 2/10 8
props 100% 100% 10/10 0
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 90% 100% 9/10 1
props 70% 100% 7/10 3
snippets 40% 100% 4/10 8
Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 8
derived 80% 100% 8/10 4
derived-by 100% 100% 10/10 0
each 80% 100% 8/10 3
effect 100% 100% 10/10 0
hello-world 100% 100% 9/9 0
inspect 30% 100% 3/10 10
props 70% 100% 7/10 6
snippets 40% 100% 4/10 8
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 0% 0% 0/10 10
effect 100% 100% 10/10 0
hello-world 40% 100% 4/10 6
inspect 10% 100% 1/10 9
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 30% 100% 3/10 7
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 10
hello-world 0% 0% 0/10 10
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 12
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 14
each 0% 0% 0/10 10
effect 0% 0% 0/10 11
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 15
props 0% 0% 0/10 22
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 10
hello-world 60% 100% 6/10 5
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 100% 100% 10/10 0
props 100% 100% 10/10 0
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 40% 100% 4/10 6
props 80% 100% 8/10 3
snippets 100% 100% 10/10 0
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 70% 100% 7/10 3
each 30% 100% 3/10 7
effect 50% 100% 5/10 6
hello-world 90% 100% 9/10 1
inspect 60% 100% 6/10 4
props 60% 100% 6/10 5
snippets 40% 100% 4/10 7
Test pass@1 pass@10 Passing Samples Errors
counter 60% 100% 6/10 4
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 20
each 20% 100% 2/10 8
effect 0% 0% 0/10 20
hello-world 80% 100% 8/10 2
inspect 10% 100% 1/10 20
props 0% 0% 0/10 11
snippets 0% 0% 0/10 22
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 80% 100% 8/10 2
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 80% 100% 8/10 2
inspect 30% 100% 3/10 15
props 90% 100% 9/10 1
snippets 40% 100% 4/10 6
Test pass@1 pass@10 Passing Samples Errors
counter 90% 100% 9/10 1
derived 100% 100% 10/10 0
derived-by 80% 100% 8/10 4
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 80% 100% 8/10 5
props 90% 100% 9/10 1
snippets 10% 100% 1/10 9
Test pass@1 pass@10 Passing Samples Errors
counter 80% 100% 8/10 2
derived 30% 100% 3/10 7
derived-by 10% 100% 1/10 9
each 70% 100% 7/10 3
effect 20% 100% 2/10 8
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 50% 100% 5/10 10
snippets 50% 100% 5/10 5
Test pass@1 pass@10 Passing Samples Errors
counter 30% 100% 3/10 7
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 14
each 50% 100% 5/10 6
effect 20% 100% 2/10 12
hello-world 50% 100% 5/10 5
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 90% 100% 9/10 1
derived-by 90% 100% 9/10 1
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 13
props 80% 100% 8/10 5
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 80% 100% 8/10 8
props 100% 100% 10/10 0
snippets 50% 100% 5/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 70% 100% 7/10 7
each 100% 100% 10/10 0
effect 90% 100% 9/10 2
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 34
props 100% 100% 10/10 0
snippets 60% 100% 6/10 4
Test pass@1 pass@10 Passing Samples Errors
counter 70% 100% 7/10 6
derived 90% 100% 9/10 1
derived-by 40% 100% 4/10 10
each 100% 100% 10/10 0
effect 20% 100% 2/10 13
hello-world 100% 100% 10/10 0
inspect 10% 100% 1/10 27
props 50% 100% 5/10 5
snippets 20% 100% 2/10 14
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 9/9 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 90% 100% 9/10 1
props 100% 100% 10/10 0
snippets 10% 100% 1/10 9
Test pass@1 pass@10 Passing Samples Errors
counter 0% 0% 0/10 10
derived 0% 0% 0/10 10
derived-by 0% 0% 0/10 10
each 0% 0% 0/10 10
effect 0% 0% 0/10 10
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 10
props 0% 0% 0/10 10
snippets 0% 0% 0/10 10
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 60% 100% 6/10 7
hello-world 100% 100% 10/10 0
inspect 0% 0% 0/10 31
props 90% 100% 9/10 2
snippets 70% 100% 7/10 3
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 100% 100% 10/10 0
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 70% 100% 7/10 3
props 100% 100% 10/10 0
snippets 70% 100% 7/10 7
Test pass@1 pass@10 Passing Samples Errors
counter 100% 100% 10/10 0
derived 40% 100% 4/10 12
derived-by 100% 100% 10/10 0
each 100% 100% 10/10 0
effect 100% 100% 10/10 0
hello-world 100% 100% 10/10 0
inspect 30% 100% 3/10 7
props 100% 100% 10/10 0
snippets 90% 100% 9/10 1
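
The pass@1 and pass@10 columns in the tables above match what the widely used unbiased pass@k estimator gives for n generated samples, c of them passing: pass@k = 1 - C(n - c, k) / C(n, k). The page does not say which formula the benchmark uses, so treat the sketch below as an assumption. With n = 10 it reduces to c/n for k = 1, and for k = 10 it yields 100% whenever at least one sample passes.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for n samples, c of which pass.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random draw of k samples contains no passing sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# A row such as "inspect 60% 100% 6/10": 6 of 10 samples passed.
print(round(pass_at_k(10, 6, 1) * 100))   # 60
print(round(pass_at_k(10, 6, 10) * 100))  # 100

# A row such as "snippets 0% 0% 0/10": no sample passed.
print(round(pass_at_k(10, 0, 10) * 100))  # 0
```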