Category Weight: 1.0x

Code Thoroughness

Evaluates completeness of generated code: edge case handling, input validation, error paths, and test coverage.

Best Score: 0.0
Avg Score: 0.0
Tests: 3


Model Rankings

Rank  Model              Category score  vs. best    Tokens  Total
1     Grok               92.0            best        115.2k  115.2k
2     GPT-5.5            85.0            -7.0 pts    13.8k   13.8k
3     Claude Sonnet 4.6  84.0            -8.0 pts    9.0k    9.0k
4     Claude Opus 4.8    74.0            -18.0 pts   22.1k   22.1k
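The "pts" figures in the rankings appear to be each model's category score minus the leader's 92.0 (85.0 - 92.0 = -7.0, and so on). The short Python sketch below reproduces that arithmetic, assuming this is how the dashboard derives the column; `point_gaps` is a hypothetical helper, not part of any dashboard API.

```python
# Category scores as listed in the Model Rankings table.
scores = {
    "Grok": 92.0,
    "GPT-5.5": 85.0,
    "Claude Sonnet 4.6": 84.0,
    "Claude Opus 4.8": 74.0,
}


def point_gaps(scores: dict[str, float]) -> dict[str, float]:
    """Return each model's gap to the best score (0.0 for the leader).

    Assumption: the dashboard's "pts" column is score minus best score.
    """
    best = max(scores.values())
    # Round to one decimal place to match the dashboard's display precision.
    return {model: round(score - best, 1) for model, score in scores.items()}


# Print the gaps from smallest (leader) to largest.
for model, gap in sorted(point_gaps(scores).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {gap:+.1f} pts")
```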

Test Breakdown

Edge Case Coverage

Generate code that handles null, empty, Unicode, and overflow inputs.

Model              Score
Grok               92.0
GPT-5.5            85.0
Claude Sonnet 4.6  84.0
Claude Opus 4.8    74.0
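The benchmark's actual prompts are not shown on this page, so as a hypothetical illustration of the kind of task this test describes, here is a Python sketch of a parser for a signed 32-bit integer that deals explicitly with each named input class. The name `parse_int32` and the 32-bit range are illustrative assumptions: null input and empty input are rejected, Unicode compatibility forms are folded with NFKC while non-ASCII digits are refused, and out-of-range values raise `OverflowError`.

```python
import unicodedata
from typing import Optional

# Hypothetical target range: a signed 32-bit integer.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def parse_int32(text: Optional[str]) -> int:
    """Parse a signed 32-bit decimal integer, rejecting bad input explicitly."""
    if text is None:  # null input
        raise ValueError("input is None")
    # NFKC folds compatibility forms, e.g. full-width digits (U+FF11...) to ASCII.
    normalized = unicodedata.normalize("NFKC", text).strip()
    if not normalized:  # empty or whitespace-only input
        raise ValueError("input is empty")
    sign, digits = "", normalized
    if digits[0] in "+-":
        sign, digits = digits[0], digits[1:]
    # str.isdigit() also accepts digits from other scripts, so require ASCII.
    if not digits or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a decimal integer: {text!r}")
    value = int(sign + digits)
    if not INT32_MIN <= value <= INT32_MAX:  # overflow
        raise OverflowError(f"{value} is outside the signed 32-bit range")
    return value
```

Using distinct exception types for malformed input (`ValueError`) and out-of-range input (`OverflowError`) lets a caller tell the two failure classes apart.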

Error Path Completeness

Ensure every failure mode has proper error handling and logging.

Model              Score
Grok               92.0
GPT-5.5            85.0
Claude Sonnet 4.6  84.0
Claude Opus 4.8    74.0
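As a hypothetical illustration of what "every failure mode handled and logged" can mean in practice (not a task taken from the benchmark), the Python sketch below reads a `port` setting from a JSON file. Each distinct failure is logged and surfaced as a single domain exception: a missing file, an unreadable or non-UTF-8 file, malformed JSON, a missing key, and an invalid value. `ConfigError`, `load_port`, and the config layout are all assumptions made for the example.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Raised when the configuration cannot be loaded or is invalid."""


def load_port(path: Path) -> int:
    """Read the integer "port" setting from a JSON config file."""
    try:
        raw = path.read_text(encoding="utf-8")
    except FileNotFoundError:  # must precede OSError, its base class
        logger.error("config file not found: %s", path)
        raise ConfigError(f"missing config file: {path}") from None
    except (OSError, UnicodeDecodeError) as exc:  # permissions, I/O, bad encoding
        logger.error("cannot read %s: %s", path, exc)
        raise ConfigError(f"unreadable config file: {path}") from exc
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.error("malformed JSON in %s at line %d: %s", path, exc.lineno, exc.msg)
        raise ConfigError(f"malformed config file: {path}") from exc
    if not isinstance(data, dict) or "port" not in data:
        logger.error("config %s has no 'port' key", path)
        raise ConfigError("missing 'port' setting")
    port = data["port"]
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        logger.error("invalid port %r in %s", port, path)
        raise ConfigError(f"invalid port: {port!r}")
    return port
```

Chaining with `from exc` keeps the low-level cause available to whoever catches `ConfigError`, while the log line records which failure path was taken.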

Test Suite Completeness

Generate tests covering the happy path, edge cases, and integration scenarios.

Model              Score
Grok               92.0
GPT-5.5            85.0
Claude Sonnet 4.6  84.0
Claude Opus 4.8    74.0
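To make the three coverage levels concrete, here is a hypothetical example (not drawn from the benchmark) of a small Python module with a `unittest` suite grouped the same way: happy-path tests, edge-case tests (empty input, case folding, non-ASCII words, invalid arguments), and one integration-style test that goes through real file I/O. The functions `word_counts` and `top_words_in_file` are invented for illustration.

```python
import collections
import re
import tempfile
import unittest
from pathlib import Path


def word_counts(text: str) -> collections.Counter:
    """Count lowercase words (runs of word characters or apostrophes)."""
    return collections.Counter(re.findall(r"[\w']+", text.lower()))


def top_words_in_file(path: Path, n: int) -> list[tuple[str, int]]:
    """Read a UTF-8 file and return its n most common words."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return word_counts(path.read_text(encoding="utf-8")).most_common(n)


class WordCountTests(unittest.TestCase):
    # Happy path: ordinary input produces the expected counts.
    def test_counts_simple_sentence(self):
        self.assertEqual(word_counts("the cat and the hat")["the"], 2)

    # Edge cases: empty input, case folding, non-ASCII words, invalid n.
    def test_empty_text(self):
        self.assertEqual(word_counts(""), collections.Counter())

    def test_case_is_folded(self):
        self.assertEqual(word_counts("Go go GO")["go"], 3)

    def test_unicode_words(self):
        self.assertEqual(word_counts("café Café")["café"], 2)

    def test_negative_n_rejected(self):
        with self.assertRaises(ValueError):
            top_words_in_file(Path("unused.txt"), -1)

    # Integration: real file I/O through the public entry point.
    def test_top_words_from_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "sample.txt"
            path.write_text("b a b c b a", encoding="utf-8")
            self.assertEqual(top_words_in_file(path, 2), [("b", 3), ("a", 2)])
```

Saved as, for example, `test_words.py`, the suite runs with `python -m unittest test_words`.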