| Rank | Model | Score |
|---|---|---|
| 1 | claude-opus-4-1-20250805 (Anthropic) | |
| 2 | claude-opus-4-20250514 (Anthropic) | |
| 3 | claude-sonnet-4-20250514 (Anthropic) | |
| 4 | x-ai/grok-4 (OpenRouter) | |
| 5 | moonshotai/kimi-k2 (OpenRouter) | |

Note: Certain OpenAI thinking models (o3, o4) and gpt-5 do not support temperature adjustment; only the default value of 1 is supported. Models with a "-reasoning-" suffix (e.g., gpt-5-2025-08-07-reasoning-medium) use the specified reasoning effort setting.

Errata: The "inspect" test has known correctness issues but is retained in the benchmark suite so that scoring stays consistent and fair across all evaluated models.
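The per-test tables report pass@1 and pass@10 alongside the number of passing samples. The reported values are consistent with the commonly used unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number that passed. The sketch below is an illustration under that assumption, not the benchmark's actual code; the function name `pass_at_k` and the clamping of k to n (for runs with fewer than k samples) are hypothetical choices.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Assumed formula: 1 - C(n - c, k) / C(n, k).
    Assumption: if fewer than k samples exist, k is clamped to n.
    """
    if n < 1 or not 0 <= c <= n or k < 1:
        raise ValueError("require n >= 1, 0 <= c <= n, k >= 1")
    k = min(k, n)
    # math.comb returns 0 when k > n - c, so any passing sample
    # yields 1.0 once k covers every sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# A row with 6 of 10 passing samples:
print(f"{pass_at_k(10, 6, 1):.4f}")   # 0.6000
print(f"{pass_at_k(10, 6, 10):.4f}")  # 1.0000
# A row with no passing samples:
print(f"{pass_at_k(10, 0, 10):.4f}")  # 0.0000
```

With 10 samples per test, pass@1 reduces to the fraction of passing samples, and pass@10 is 1 whenever at least one sample passes, which matches the PARTIAL rows in the tables.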
| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 6 | |
| effect | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 4 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
| derived | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 10 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| effect | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 4 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| props | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 7 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| each | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| effect | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 6 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 4 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| snippets | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 2 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 6 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| each | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 6 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 9 | |
| effect | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 10 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 19 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 12 | |
| each | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| hello-world | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 28 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |
| derived-by | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 14 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| effect | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 14 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| derived-by | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 5 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 25 | |
| props | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 4 | |
| snippets | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 3 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
| derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 6 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 6 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 12 | |
| each | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |
| effect | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 12 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 8 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 24 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 3 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 1/1 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/1 | 1 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| derived | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 11 | |
| derived-by | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |
| each | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| effect | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 11 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 13 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| derived-by | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| each | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 17 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 12 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 12 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 22 | |
| derived | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 6 | |
| derived-by | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| each | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 10 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 6 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 19 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| derived | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 11 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 11 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| each | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 4 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 22 | |
| props | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 4 | |
| derived-by | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| each | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 19 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 6 | |
| derived | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 13 | |
| derived-by | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 13 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 7 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| props | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |
| each | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 8 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 22 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 4 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |
| derived-by | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 7 | |
| each | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 6 | |
| effect | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 19 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 15 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |
| each | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 5 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 12 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 22 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 25 | |
| props | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| derived-by | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| each | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| effect | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 31 | |
| props | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 20 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| derived | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 3 | |
| derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 3 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| props | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 11 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 14 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 7 | |
| derived | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 8 | |
| derived-by | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 9 | |
| each | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| effect | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 11 | |
| hello-world | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 5 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 22 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| derived-by | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 12 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 21 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 2 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 37 | |
| props | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| snippets | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| derived-by | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 9 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 10 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 31 | |
| props | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 7 | |
| each | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| effect | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 6 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 40 | |
| props | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| snippets | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 11 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 3 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 4 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 12 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 25 | |
| props | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 7 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| derived | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 9 | |
| derived-by | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 4 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 25 | |
| props | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 5 | |
| snippets | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 3 | |
| derived | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 4 | |
| derived-by | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 3 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 7 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 13 | |
| props | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 7 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived-by | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 12 | |
| each | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| effect | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 12 | |
| hello-world | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 25 | |
| props | ⚠️ PARTIAL | 0.3000 | 1.0000 | 3/10 | 10 | |
| snippets | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 6 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 2 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 5 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 5 | |
| derived-by | ⚠️ PARTIAL | 0.8000 | 1.0000 | 8/10 | 2 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.6000 | 1.0000 | 6/10 | 4 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |
| snippets | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 8 | |
| derived-by | ⚠️ PARTIAL | 0.9000 | 1.0000 | 9/10 | 1 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ⚠️ PARTIAL | 0.4000 | 1.0000 | 4/10 | 7 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 10 | |
| props | ⚠️ PARTIAL | 0.1000 | 1.0000 | 1/10 | 9 | |
| snippets | ⚠️ PARTIAL | 0.2000 | 1.0000 | 2/10 | 8 | |

| Test | Status | pass@1 | pass@10 | Passing Samples | Errors | Actions |
|---|---|---|---|---|---|---|
| counter | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| derived | ⚠️ PARTIAL | 0.5000 | 1.0000 | 5/10 | 10 | |
| derived-by | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| each | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| effect | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| hello-world | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| inspect | ❌ FAIL | 0.0000 | 0.0000 | 0/10 | 16 | |
| props | ✅ PASS | 1.0000 | 1.0000 | 10/10 | 0 | |
| snippets | ⚠️ PARTIAL | 0.7000 | 1.0000 | 7/10 | 3 | |
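
The pass@1 and pass@10 columns are consistent with the commonly used unbiased pass@k estimator applied to the "Passing Samples" counts, with n = 10 samples per test. The sketch below is an assumption about how the figures could be derived, not the benchmark's confirmed implementation; the function name `pass_at_k` and its parameters are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k).

    n: total samples generated for a test (10 in the tables above)
    c: number of samples that passed
    k: number of samples drawn
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb(n - c, k) is 0 when fewer than k samples failed,
    # so the estimate is 1.0 as soon as any draw of k must include a pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Rows of the form "6/10" passing samples, assuming n = 10
print(f"{pass_at_k(10, 6, 1):.4f}")   # pass@1
print(f"{pass_at_k(10, 6, 10):.4f}")  # pass@10
```

Under this assumption, pass@1 reduces to c/n, and with k = n = 10 the pass@10 value is 1.0000 whenever at least one sample passes and 0.0000 otherwise, which matches the PARTIAL and FAIL rows above.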