Note: OpenAI thinking models (o3, o4) do not support temperature adjustment. o1-pro models use the "medium" reasoning effort setting.

Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
---|---|---|---|---|---|---|
counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
snippets | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
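
The pass@1 and pass@10 columns can be recomputed from the Passing Samples column. The sketch below assumes the benchmark uses the common unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), with n total samples and c passing samples; the page does not state which estimator it uses, and the name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Assumption: the unbiased estimator 1 - C(n - c, k) / C(n, k).
    The results page does not confirm that this is the formula it uses.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every draw of k contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# (test name, passing samples, total samples), copied from the table above.
rows = [("counter", 10, 10), ("inspect", 0, 10)]

for name, passed, total in rows:
    p1 = pass_at_k(total, passed, 1)
    p10 = pass_at_k(total, passed, 10)
    print(f"{name}: pass@1={p1:.4f} pass@10={p10:.4f}")
```

Under this assumption, 10/10 passing samples gives 1.0000 in both columns and 0/10 gives 0.0000 in both, which matches the rows for `counter` and `inspect`.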