SvelteBench Visualization

Note: OpenAI thinking models (o3, o4) do not support temperature adjustment. o1-pro models use the "medium" reasoning-effort setting.

OpenRouter

meta-llama/llama-4-scout

Test         Status       pass@1  pass@10  Passing Samples  Errors
counter      ⚠️ PARTIAL   0.9000  1.0000   9/10             1
derived      ⚠️ PARTIAL   0.2000  1.0000   2/10             8
derived-by   ❌ FAIL      0.0000  0.0000   0/10             11
each         ❌ FAIL      0.0000  0.0000   0/10             11
effect       ❌ FAIL      0.0000  0.0000   0/10             10
hello-world  ⚠️ PARTIAL   0.3000  1.0000   3/10             11
inspect      ❌ FAIL      0.0000  0.0000   0/10             10
props        ❌ FAIL      0.0000  0.0000   0/10             10
snippets     ❌ FAIL      0.0000  0.0000   0/10             10
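
The page does not say how pass@1 and pass@10 are computed. The values in the table are consistent with the commonly used unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number that passed. The Python sketch below is an illustration under that assumption, not SvelteBench's own code; the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k).

    n: total samples generated for a test
    c: samples that passed
    k: number of attempts considered
    Assumption: this is the estimator behind the table; the page does not confirm it.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0
    # whenever fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Passing-sample counts copied from the table (n = 10 samples per test).
passing = {"counter": 9, "derived": 2, "derived-by": 0, "hello-world": 3}

for test, c in passing.items():
    p1 = pass_at_k(10, c, 1)
    p10 = pass_at_k(10, c, 10)
    print(f"{test:12s} pass@1={p1:.4f} pass@10={p10:.4f}")
```

With n = 10 and k = 10, the estimate is 1.0 as soon as a single sample passes, which matches the PARTIAL rows showing pass@10 of 1.0000 even when only 2/10 or 3/10 samples passed.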