OpenAI

GPT-5.5

Comprehensive benchmark performance across 11 evaluation categories

Composite Score

0.0/100

Rank

#3

Token Benchmark

36.8

Lower burn, higher score

Total Tokens

368.2k

~13.2k/test

Category Breakdown

3 tests · 0.8x weight
3 tests · 1.0x weight
3 tests · 1.2x weight
3 tests · 1.1x weight
93.3
3 tests · 1.2x weight
3 tests · 0.9x weight
3 tests · 1.0x weight
3 tests · 1.1x weight
1 test · 0.9x weight
3 tests · 1.0x weight
28 tests · 1.0x weight
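The breakdown lists a weight for each of the 11 categories (the weights sum to 11.2) but does not say how they are combined into the composite score. A minimal sketch, assuming the composite is a weight-normalized mean of 0 to 100 category scores; the function name `composite_score` is hypothetical, and the example scores are illustrative placeholders, not this model's results.

```python
from __future__ import annotations

# Weights as listed in the Category Breakdown, in page order.
CATEGORY_WEIGHTS = [0.8, 1.0, 1.2, 1.1, 1.2, 0.9, 1.0, 1.1, 0.9, 1.0, 1.0]


def composite_score(scores: list[float], weights: list[float]) -> float:
    """Assumed formula: sum(w_i * s_i) / sum(w_i), scores on a 0-100 scale."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical per-category scores, for illustration only.
    example_scores = [90.0] * len(CATEGORY_WEIGHTS)
    print(round(composite_score(example_scores, CATEGORY_WEIGHTS), 1))  # 90.0
    # A heavier weight pulls the composite toward that category's score.
    print(composite_score([100.0, 0.0], [1.0, 3.0]))  # 25.0
```

Normalizing by the weight sum keeps the composite on the same 0 to 100 scale as the category scores, so a weight above 1.0 raises a category's influence without inflating the total.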

Individual Test Results

Token Efficiency is computed from every successful task in the run. The model with the lowest average token burn receives 100, and heavier token usage is penalized proportionally.
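One reading of "penalized proportionally" is inverse proportionality: score = 100 × (lowest average tokens per test in the run) / (this model's average tokens per test). The sketch below assumes that formula, and also assumes the Token Benchmark value of 36.8 shown at the top of the page is this Token Efficiency score. Under both assumptions, 368.2k tokens over 28 tests gives about 13.15k tokens per test, which implies the run leader averaged roughly 4.8k tokens per test (approximate, since the totals are rounded). The function names are hypothetical.

```python
def token_efficiency(avg_tokens: float, best_avg_tokens: float) -> float:
    """Assumed formula: 100 * best / avg, so the leanest model scores 100."""
    if avg_tokens <= 0 or best_avg_tokens <= 0:
        raise ValueError("token averages must be positive")
    return 100.0 * best_avg_tokens / avg_tokens


def implied_best_avg(score: float, avg_tokens: float) -> float:
    """Invert token_efficiency to recover the leader's average tokens per test."""
    return score * avg_tokens / 100.0


if __name__ == "__main__":
    total_tokens = 368_200  # reported total (rounded on the page)
    tests = 28              # number of tests in the run
    avg = total_tokens / tests
    print(round(avg))  # 13150
    best = implied_best_avg(36.8, avg)
    print(round(best))  # 4839
    print(round(token_efficiency(avg, best), 1))  # 36.8
```

A model that uses twice the leader's average tokens would score 50 under this assumed formula.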

Avg Tokens/Test

13.2k

Mathematical Proof
12.7k tok · 33.6s · score 79.9
Legal Reasoning Chain
12.6k tok · 20.1s · score 67.0
Multi-step Logic Puzzle
12.8k tok · 27.4s · score 53.0
Graph Algorithm Implementation
12.7k tok · 28.2s · score 93.0
REST API Design
12.2k tok · 19.0s · score 100.0
Concurrent Data Pipeline
12.5k tok · 23.5s · score 97.0
Race Condition Detection
12.3k tok · 20.9s · score 100.0
Off-by-One Boundary Fix
12.7k tok · 25.7s · score 80.0
Memory Leak Fix
12.6k tok · 22.9s · score 100.0
OAuth2 Integration
12.9k tok · 26.4s · score 97.0
Webhook System
12.7k tok · 25.9s · score 98.0
Search Autocomplete
12.8k tok · 28.5s · score 98.0
Test Suite Completeness
13.8k tok · 43.9s · score 85.0
Refactor Without Regression
15.1k tok · 1m 5s · score 94.0
Merge Conflict Resolution
13.3k tok · 27.1s · score 98.0
Dependency Upgrade Safety
14.5k tok · 52.9s · score 97.0
SQL Injection Prevention
12.6k tok · 27.0s · score 86.8
XSS Mitigation
13.6k tok · 39.1s · score 100.0
Secret Management
12.9k tok · 29.5s · score 86.0
Structured Output Compliance
12.0k tok · 18.3s · score 100.0
Constraint Adherence
11.8k tok · 14.0s · score 100.0
Multi-step Instruction Chain
11.8k tok · 15.3s · score 100.0
Idiomatic Python
13.4k tok · 41.2s · score 96.0
Clean Architecture Patterns
13.7k tok · 44.4s · score 95.0
TypeScript Best Practices
14.4k tok · 54.9s · score 88.0
Algorithm Complexity
12.8k tok · 26.1s · score 98.0
Memory-efficient Processing
14.2k tok · 51.5s · score 94.0
Query Optimization
16.9k tok · 1m 41s · score 84.0

Outage History

1
error · ongoing

Started Jun 5, 10:00 PM · checks affected