SvelteBench Visualization

Note: OpenAI thinking models (o3, o4) do not support temperature adjustments. o1-pro models use the "medium" reasoning effort setting.

OpenAI

gpt-4.1-2025-04-14

Test         Status       pass@1   pass@10   Passing Samples   Errors
counter      ⚠️ PARTIAL   0.7000   1.0000    7/10              3
derived      ❌ FAIL      0.0000   0.0000    0/10              10
derived-by   ❌ FAIL      0.0000   0.0000    0/10              10
each         ⚠️ PARTIAL   0.2000   1.0000    2/10              9
effect       ⚠️ PARTIAL   0.1000   1.0000    1/10              10
hello-world  ✅ PASS      1.0000   1.0000    10/10             0
inspect      ❌ FAIL      0.0000   0.0000    0/10              19
props        ❌ FAIL      0.0000   0.0000    0/10              10
snippets     ❌ FAIL      0.0000   0.0000    0/10              10

gpt-4.1-mini-2025-04-14

Test         Status       pass@1   pass@10   Passing Samples   Errors
counter      ❌ FAIL      0.0000   0.0000    0/10              13
derived      ❌ FAIL      0.0000   0.0000    0/10              13
derived-by   ❌ FAIL      0.0000   0.0000    0/10              12
each         ⚠️ PARTIAL   0.1000   1.0000    1/10              10
effect       ❌ FAIL      0.0000   0.0000    0/10              16
hello-world  ❌ FAIL      0.0000   0.0000    0/10              10
inspect      ❌ FAIL      0.0000   0.0000    0/10              28
props        ❌ FAIL      0.0000   0.0000    0/10              10
snippets     ❌ FAIL      0.0000   0.0000    0/10              10

gpt-4.1-nano-2025-04-14

Test         Status       pass@1   pass@10   Passing Samples   Errors
counter      ❌ FAIL      0.0000   0.0000    0/10              14
derived      ❌ FAIL      0.0000   0.0000    0/10              13
derived-by   ❌ FAIL      0.0000   0.0000    0/10              10
each         ❌ FAIL      0.0000   0.0000    0/10              10
effect       ❌ FAIL      0.0000   0.0000    0/10              13
hello-world  ✅ PASS      1.0000   1.0000    10/10             0
inspect      ❌ FAIL      0.0000   0.0000    0/10              10
props        ❌ FAIL      0.0000   0.0000    0/10              10
snippets     ❌ FAIL      0.0000   0.0000    0/10              10
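
The pass@1 and pass@10 columns in the tables above are consistent with the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that passed. Whether SvelteBench computes its scores exactly this way is an assumption; the sketch below, with a hypothetical helper named pass_at_k, only shows how the reported values can be reproduced from the Passing Samples column.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Assumed formula (not confirmed by the page): 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # math.comb returns 0 when k > n - c, so any k larger than the number
    # of failing samples yields an estimate of exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # "counter" row for gpt-4.1-2025-04-14: 7 of 10 samples passed.
    print(f"pass@1  = {pass_at_k(10, 7, 1):.4f}")
    print(f"pass@10 = {pass_at_k(10, 7, 10):.4f}")
```

With 10 samples per test, pass@10 can only be 0 or 1 under this formula: it is 1 as soon as a single sample passes, which is why every PARTIAL row shows 1.0000 in that column.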