Note: OpenAI thinking models (o3, o4) do not support temperature adjustments.

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
props | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 |
snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
props | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 |
snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 |
each | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 |
effect | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 6 |
hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 |
props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
snippets | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 |

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 |
props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
snippets | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 |

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 |
derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 |
derived-by | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 14 |
each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 |
effect | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 14 |
hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 |
props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 |

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 13 |
derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 |
derived-by | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 |
each | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 |
effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 17 |
hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |

Test | Status | pass@1 | pass@10 | Passing Samples | Errors |
---|---|---|---|---|---|
counter | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 12 |
derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 12 |
derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
each | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 |
effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 |
hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 |
props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 |
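
Each table reports pass@1 and pass@10 over n = 10 samples per test. The page does not say which estimator produced these figures. The sketch below assumes the common unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where c is the number of passing samples. Under that assumption, pass@1 reduces to c / n. With k = n = 10, pass@10 is 1.0 whenever at least one sample passed, which matches every row above. The function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: samples generated per test, c: samples that passed, k: sample budget.
    Assumption: this is the metric behind the tables; the page does not say so.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example passing-sample counts taken from the tables, with n = 10.
for test, c in [("counter", 10), ("props", 6), ("snippets", 1), ("inspect", 0)]:
    print(f"{test}: pass@1={pass_at_k(10, c, 1):.4f} pass@10={pass_at_k(10, c, 10):.4f}")
```

If this is the metric in use, pass@10 at n = 10 only separates "never passed" from "passed at least once." That would explain why every ⚠️ PARTIAL row shows pass@10 = 1.0000, and why pass@1 is the more informative column here.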