SvelteBench Visualization

Note: OpenAI thinking models (o3, o4) do not support temperature adjustments. o1-pro models are run with the "medium" reasoning effort setting.

Anthropic

claude-sonnet-4-20250514

Test         Status   pass@1  pass@10  Passing Samples  Errors
counter      ✅ PASS  1.0000  1.0000   10/10            0
derived      ✅ PASS  1.0000  1.0000   10/10            0
derived-by   ✅ PASS  1.0000  1.0000   10/10            0
each         ✅ PASS  1.0000  1.0000   10/10            0
effect       ✅ PASS  1.0000  1.0000   10/10            0
hello-world  ✅ PASS  1.0000  1.0000   10/10            0
inspect      ❌ FAIL  0.0000  0.0000   0/10             10
props        ✅ PASS  1.0000  1.0000   10/10            0
snippets     ✅ PASS  1.0000  1.0000   10/10            0
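
The page does not say how pass@1 and pass@10 are computed. The values in the table are consistent with the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that passed. The sketch below assumes that formula; the function name `pass_at_k` is hypothetical and not taken from SvelteBench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    Assumption: the standard unbiased estimator, where k samples are
    drawn without replacement from n generated samples, c of which passed.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: any draw of k must include a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Rows from the table above: 10 samples per test.
print(f"counter pass@1:  {pass_at_k(10, 10, 1):.4f}")
print(f"counter pass@10: {pass_at_k(10, 10, 10):.4f}")
print(f"inspect pass@1:  {pass_at_k(10, 0, 1):.4f}")
print(f"inspect pass@10: {pass_at_k(10, 0, 10):.4f}")
```

Under this assumption, a test with 10/10 passing samples scores 1.0000 on both metrics and a test with 0/10 scores 0.0000 on both, which matches the rows for counter and inspect.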