Category Weight: 1.0x
Bug Introduction Rate
Measures how often the model introduces new bugs while writing or modifying code. A lower rate is better; the rate is inverted for scoring, so a higher score means fewer introduced bugs.
Best Score: 0.0
Avg Score: 0.0
Tests: 3
Performance Over Time: All Models
Model Rankings
Test Breakdown
Refactor Without Regression
Refactor a function without introducing new failures in existing tests
Claude Opus 4.8: 100.0
Claude Sonnet 4.6: 97.3
Grok: 97.0
GPT-5.5: 96.3
Merge Conflict Resolution
Resolve merge conflicts without introducing semantic errors
Claude Opus 4.8: 100.0
Claude Sonnet 4.6: 97.3
Grok: 97.0
GPT-5.5: 96.3
Dependency Upgrade Safety
Upgrade a dependency and adapt code without breaking changes
Claude Opus 4.8: 100.0
Claude Sonnet 4.6: 97.3
Grok: 97.0
GPT-5.5: 96.3