Category Weight: 1.0x

Bug Fixes

Identify and fix bugs in existing codebases, including race conditions, off-by-one errors, and logic flaws.

Best Score: 0.0
Avg Score: 0.0
Tests: 3

[Chart: Performance Over Time, All Models]

Model Rankings

1. Claude Sonnet 4.6
Category score: 96.7 (BEST)
Tokens: 9.2k
Total: 9.2k

2. Grok
Category score: 96.3 (-0.4 pts)
Tokens: 58.0k
Total: 58.0k

3. GPT-5.5
Category score: 93.3 (-3.4 pts)
Tokens: 37.6k
Total: 37.6k

4. Claude Opus 4.8
Category score: 90.0 (-6.7 pts)
Tokens: 4.3k
Total: 4.3k

Test Breakdown

Off-by-One Boundary Fix

Fix pagination logic that skips the last page of results

Claude Sonnet 4.6: 96.7
Grok: 96.3
GPT-5.5: 93.3
Claude Opus 4.8: 90.0
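For illustration, the kind of pagination bug this test describes often comes from using floor division to count pages, which drops a final partial page. The sketch below is a minimal example under that assumption; the function names (page_count_buggy, page_count, get_page) are hypothetical and are not taken from the benchmark's actual test code.

```python
def page_count_buggy(total_items: int, page_size: int) -> int:
    # Floor division ignores a final partial page,
    # so 25 items at 10 per page reports only 2 pages.
    return total_items // page_size


def page_count(total_items: int, page_size: int) -> int:
    # Ceiling division counts the final partial page as well.
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_items + page_size - 1) // page_size


def get_page(items: list, page: int, page_size: int) -> list:
    # Pages are 1-indexed; slicing clamps safely on the last page.
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

With 25 items and a page size of 10, the buggy count never requests page 3, so the last five results are never shown.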

Race Condition Detection

Find and fix a subtle race condition in async queue processing

Claude Sonnet 4.6: 96.7
Grok: 96.3
GPT-5.5: 93.3
Claude Opus 4.8: 90.0
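One common shape of a race condition in async queue processing is a check-then-act gap: a worker checks shared state, awaits, and only then updates that state, so another worker can pass the same check in between. The sketch below assumes that pattern; process_queue and its fixed flag are hypothetical names for illustration, not the benchmark's actual code.

```python
import asyncio


async def process_queue(job_ids, workers=4, fixed=True):
    queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)

    seen = set()      # job ids already claimed by some worker
    processed = []    # job ids that were actually processed

    async def worker():
        while True:
            try:
                job_id = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            if job_id in seen:
                continue  # duplicate job, skip it
            if fixed:
                # Fix: claim the job before the first await, so no
                # other worker can pass the check above for this id.
                seen.add(job_id)
            await asyncio.sleep(0)  # stand-in for I/O; yields control
            if not fixed:
                # Bug: the claim happens after the await, so duplicates
                # that arrived in the meantime were not skipped.
                seen.add(job_id)
            processed.append(job_id)

    await asyncio.gather(*(worker() for _ in range(workers)))
    return processed
```

Calling asyncio.run(process_queue(["a", "a", "b", "b"], fixed=False)) processes duplicates, while the fixed variant processes each id once. An asyncio.Lock around the check and claim would be an alternative fix when the claim itself needs to await.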

Memory Leak Fix

Identify and patch a memory leak caused by unclosed event listeners

Claude Sonnet 4.6: 96.7
Grok: 96.3
GPT-5.5: 93.3
Claude Opus 4.8: 90.0
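A listener leak of the kind this test describes typically arises when an object registers a callback on a long-lived emitter and never removes it, so the emitter keeps the object alive. The sketch below is a minimal illustration under that assumption; EventEmitter and Widget are hypothetical classes written for this example, not a real library API or the benchmark's code.

```python
class EventEmitter:
    """Tiny hypothetical emitter that holds strong references to callbacks."""

    def __init__(self):
        self._listeners = {}

    def on(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def off(self, event, callback):
        callbacks = self._listeners.get(event, [])
        if callback in callbacks:
            callbacks.remove(callback)

    def emit(self, event):
        # Iterate over a copy so listeners may unsubscribe while handling.
        for callback in list(self._listeners.get(event, [])):
            callback()

    def listener_count(self, event):
        return len(self._listeners.get(event, []))


class Widget:
    def __init__(self, emitter):
        self._emitter = emitter
        self.updates = 0
        # The emitter now references this widget through the bound method.
        emitter.on("update", self._on_update)

    def _on_update(self):
        self.updates += 1

    def close(self):
        # Fix: unregister on teardown so the emitter drops its reference.
        self._emitter.off("update", self._on_update)
```

Without the call to close(), every discarded Widget stays reachable from the emitter's listener list, and the list grows for as long as the emitter lives.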