Back to Dashboard
CategoryWeight: 1.0x
Instruction Following
Measures how closely the model adheres to explicit constraints in the prompt, including formatting, language, and output structure.
Best Score
0.0Avg Score
0.0Tests
3Performance Over Time — All Models
Model Rankings
Test Breakdown
Structured Output Compliance
Produce JSON matching an exact schema with no extra fields
Claude Opus 4.8
100.0GPT-5.5
100.0Grok
100.0Claude Sonnet 4.6
96.7Constraint Adherence
Follow explicit constraints like max line length and naming conventions
Claude Opus 4.8
100.0GPT-5.5
100.0Grok
100.0Claude Sonnet 4.6
96.7Multi-step Instruction Chain
Execute a 6-step instruction sequence without skipping or reordering
Claude Opus 4.8
100.0GPT-5.5
100.0Grok
100.0Claude Sonnet 4.6
96.7