Category Weight: 1.0x
Long Reasoning
Multi-step logic puzzles, extended chain-of-thought, and complex analytical reasoning tasks requiring sustained coherence over many steps.
Best Score: 0.0
Avg Score: 0.0
Tests: 3
Performance Over Time: All Models
Model Rankings
Test Breakdown
Multi-step Logic Puzzle
Complex optimization with 8+ constraints across multiple variables
Claude Opus 4.8: 77.8
Grok: 70.1
Claude Sonnet 4.6: 68.9
GPT-5.5: 66.6
Legal Reasoning Chain
Contract dispute analysis requiring multi-party obligation tracking
Claude Opus 4.8: 77.8
Grok: 70.1
Claude Sonnet 4.6: 68.9
GPT-5.5: 66.6
Mathematical Proof
Prove divisibility properties using induction and modular arithmetic
Claude Opus 4.8: 77.8
Grok: 70.1
Claude Sonnet 4.6: 68.9
GPT-5.5: 66.6