SvelteBench Visualization

Note: Certain OpenAI thinking models (o3, o4) and gpt-5 do not support temperature adjustment; only the default value of 1 is accepted. Models with a "-reasoning-" suffix (e.g., gpt-5-2025-08-07-reasoning-medium) run with the specified reasoning effort setting.
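
The naming convention in the note can be illustrated with a minimal sketch. It assumes the effort is whatever follows "-reasoning-" and that the text before it is the base model name; the helper names, the option keys ("reasoning_effort", "temperature"), and the prefix matching are hypothetical and are not taken from the benchmark's code or from any provider's API.

```python
from typing import Dict, Optional, Tuple, Union

REASONING_MARKER = "-reasoning-"
# Model families the note lists as not accepting a custom temperature.
# Matching them by name prefix is an assumption made for this sketch.
NO_TEMPERATURE_PREFIXES = ("o3", "o4", "gpt-5")


def split_reasoning_suffix(model: str) -> Tuple[str, Optional[str]]:
    """Split 'name-reasoning-effort' into (name, effort); effort is None if absent."""
    base, sep, effort = model.partition(REASONING_MARKER)
    if sep and effort:
        return base, effort
    return model, None


def request_options(model: str, temperature: float) -> Dict[str, Union[str, float]]:
    """Build a hypothetical options dict, leaving out temperature where it is unsupported."""
    base, effort = split_reasoning_suffix(model)
    options: Dict[str, Union[str, float]] = {"model": base}
    if effort is not None:
        options["reasoning_effort"] = effort
    if not base.startswith(NO_TEMPERATURE_PREFIXES):
        options["temperature"] = temperature
    return options


if __name__ == "__main__":
    print(request_options("gpt-5-2025-08-07-reasoning-medium", 0.7))
```

Leaving the temperature key out entirely, rather than sending 1, lets the provider's default apply; that is a design choice of the sketch, not something the note specifies.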

Errata: The "inspect" test has known correctness issues but is retained in the benchmark suite to maintain consistency and fairness in scoring across all evaluated models.

Test         pass@1  pass@10  Passing Samples  Errors
counter      100%    100%     10/10            0
derived      100%    100%     10/10            0
derived-by   100%    100%     10/10            0
each         100%    100%     10/10            0
effect       100%    100%     10/10            0
hello-world  100%    100%     10/10            0
inspect      0%      0%       0/10             10
props        60%     100%     6/10             4
snippets     0%      0%       0/10             10

Test         pass@1  pass@10  Passing Samples  Errors
counter      100%    100%     10/10            0
derived      100%    100%     10/10            0
derived-by   100%    100%     10/10            0
each         0%      0%       0/10             10
effect       100%    100%     10/10            0
hello-world  100%    100%     10/10            0
inspect      0%      0%       0/10             10
props        10%     100%     1/10             9
snippets     0%      0%       0/10             10

Test         pass@1  pass@10  Passing Samples  Errors
counter      100%    100%     10/10            0
derived      100%    100%     10/10            0
derived-by   60%     100%     6/10             4
each         90%     100%     9/10             1
effect       70%     100%     7/10             6
hello-world  100%    100%     10/10            0
inspect      0%      0%       0/10             13
props        100%    100%     10/10            0
snippets     10%     100%     1/10             9

Test         pass@1  pass@10  Passing Samples  Errors
counter      100%    100%     10/10            0
derived      100%    100%     10/10            0
derived-by   100%    100%     10/10            0
each         100%    100%     10/10            0
effect       100%    100%     10/10            0
hello-world  100%    100%     10/10            0
inspect      0%      0%       0/10             13
props        100%    100%     10/10            0
snippets     50%     100%     5/10             5

Test         pass@1  pass@10  Passing Samples  Errors
counter      0%      0%       0/10             13
derived      0%      0%       0/10             14
derived-by   40%     100%     4/10             14
each         0%      0%       0/10             11
effect       20%     100%     2/10             14
hello-world  100%    100%     10/10            0
inspect      0%      0%       0/10             13
props        0%      0%       0/10             10
snippets     0%      0%       0/10             14

Test         pass@1  pass@10  Passing Samples  Errors
counter      30%     100%     3/10             13
derived      0%      0%       0/10             11
derived-by   10%     100%     1/10             9
each         10%     100%     1/10             9
effect       0%      0%       0/10             17
hello-world  90%     100%     9/10             1
inspect      0%      0%       0/10             10
props        0%      0%       0/10             10
snippets     0%      0%       0/10             10

Test         pass@1  pass@10  Passing Samples  Errors
counter      10%     100%     1/10             12
derived      0%      0%       0/10             12
derived-by   0%      0%       0/10             10
each         20%     100%     2/10             8
effect       0%      0%       0/10             11
hello-world  90%     100%     9/10             1
inspect      0%      0%       0/10             13
props        0%      0%       0/10             10
snippets     0%      0%       0/10             10
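
The pass@1 and pass@10 columns are consistent with the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that pass. Whether the benchmark computes its figures exactly this way is an assumption; the sketch below only shows that the formula reproduces the pattern in the tables. With n = 10, pass@1 reduces to c/10, and pass@10 is 100% as soon as one sample passes. The function name pass_at_k is illustrative, and the Errors column plays no part in the calculation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: total samples generated, c: samples that passed, k: draw size.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # so any draw of k samples must then contain a passing one.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Passing-sample counts in the form used by the "Passing Samples" column.
    for passing in (0, 1, 6, 10):
        p1 = pass_at_k(10, passing, 1)
        p10 = pass_at_k(10, passing, 10)
        print(f"{passing}/10 passing: pass@1={p1:.0%} pass@10={p10:.0%}")
```

This explains why a row such as 1/10 shows a pass@1 of 10% alongside a pass@10 of 100%: pass@10 over ten samples only asks whether any sample passed.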