xAI

Grok 4.5

Comprehensive benchmark performance across 11 evaluation categories

Composite Score

0.0/100

Rank

Token Benchmark

22.6

Lower burn, higher score

Total Tokens

1.0M

~34.2k/test

Category Radar

Historical Composite Score

Category Breakdown

#	Category	Score	Tests	Weight
1	Instruction Following	100.0	3	1.0x
2	Code Quality	98.3	3	1.0x
3	Coding Tasks	98.0	3	1.0x
4	Feature Implementation	95.7	3	1.0x
5	Bug Introduction Rate	95.7	3	1.0x
6	Performance & Efficiency	94.0	3	1.0x
7	Bug Fixes	93.3	3	1.0x
8	Code Thoroughness	91.8	3	1.0x
9	Security Awareness	85.2	3	1.0x
10	Long Reasoning	67.7	3	1.0x
11	Token Efficiency	22.6	30	1.0x
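The page does not say how the composite score is derived from the category rows. A minimal sketch, assuming it is a weighted mean of the category scores using the Weight column (the `weighted_mean` helper is hypothetical, not part of the benchmark):

```python
# Category rows copied from the table above: (name, score, weight).
categories = [
    ("Instruction Following", 100.0, 1.0),
    ("Code Quality", 98.3, 1.0),
    ("Coding Tasks", 98.0, 1.0),
    ("Feature Implementation", 95.7, 1.0),
    ("Bug Introduction Rate", 95.7, 1.0),
    ("Performance & Efficiency", 94.0, 1.0),
    ("Bug Fixes", 93.3, 1.0),
    ("Code Thoroughness", 91.8, 1.0),
    ("Security Awareness", 85.2, 1.0),
    ("Long Reasoning", 67.7, 1.0),
    ("Token Efficiency", 22.6, 1.0),
]


def weighted_mean(rows):
    """Assumed aggregation: sum(score * weight) / sum(weight)."""
    total_weight = sum(weight for _, _, weight in rows)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(score * weight for _, score, weight in rows) / total_weight


composite = weighted_mean(categories)
print(f"{composite:.1f}")  # about 85.7 under this assumption
```

With every weight at 1.0x this reduces to a plain average, so the low Token Efficiency score pulls the result down as much as any other category would.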

Individual Test Results

Token Efficiency · 22.6

Token Efficiency is computed from every successful task in the run. The model with the lowest average token burn receives 100, and heavier token usage is penalized proportionally.

Avg Tokens/Test

34.2k

Total Tokens

1.0M
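The scoring rule described above can be sketched as follows. The exact penalty curve is not given on the page, so this assumes "penalized proportionally" means the score is the ratio of the lowest average to a model's own average. The function name, model names, and token figures are hypothetical and only illustrate the shape of the calculation:

```python
def token_efficiency(avg_tokens: float, best_avg_tokens: float) -> float:
    """Assumed rule: the lowest average burn scores 100, others scale inversely."""
    if avg_tokens <= 0 or best_avg_tokens <= 0:
        raise ValueError("token averages must be positive")
    return round(100.0 * min(1.0, best_avg_tokens / avg_tokens), 1)


# Hypothetical leaderboard: average tokens per successful test.
averages = {"model_a": 10_000, "model_b": 20_000, "model_c": 40_000}
best = min(averages.values())
scores = {name: token_efficiency(avg, best) for name, avg in averages.items()}
print(scores)  # {'model_a': 100.0, 'model_b': 50.0, 'model_c': 25.0}
```

Under this assumption a model's score depends on the best performer in the same run, so it can change even when the model's own token usage stays the same.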

Long Reasoning · 67.7

Mathematical Proof

16.2k tok · 38.5s · 81.2

Legal Reasoning Chain

20.9k tok · 55.4s · 74.0

Multi-step Logic Puzzle

69.1k tok · 3m 8s · 47.8

Coding Tasks · 98.0

Graph Algorithm Implementation

23.4k tok · 33.0s · 97.0

REST API Design

19.3k tok · 21.1s · 100.0

Concurrent Data Pipeline

19.7k tok · 32.4s · 97.0

Bug Fixes · 93.3

Off-by-One Boundary Fix

18.4k tok · 25.9s · 80.0

Race Condition Detection

20.4k tok · 34.6s · 100.0

Memory Leak Fix

26.3k tok · 45.8s · 100.0

Feature Implementation · 95.7

OAuth2 Integration

17.3k tok · 19.9s · 89.0

Search Autocomplete

30.2k tok · 55.6s · 98.0

Webhook System

16.2k tok · 14.6s · 100.0

Code Thoroughness · 91.8

Edge Case Coverage

71.1k tok · 59.5s · 95.0

Test Suite Completeness

73.6k tok · 1m 51s · 92.0

Error Path Completeness

96.9k tok · 3m 2s · 88.4

Bug Introduction Rate · 95.7

Refactor Without Regression

30.7k tok · 43.9s · 94.0

Dependency Upgrade Safety

59.6k tok · 2m 45s · 96.0

Merge Conflict Resolution

28.1k tok · 1m 7s · 97.0

Security Awareness · 85.2

SQL Injection Prevention

31.7k tok · 32.0s · 84.4

Secret Management

50.2k tok · 1m 25s · 87.2

XSS Mitigation

23.0k tok · 1m 14s · 84.0

Instruction Following · 100.0

Structured Output Compliance

16.5k tok · 21.2s · 100.0

Multi-step Instruction Chain

16.8k tok · 19.7s · 100.0

Constraint Adherence

15.0k tok · 9.7s · 100.0

Code Quality · 98.3

Idiomatic Python

37.4k tok · 47.5s · 99.0

Clean Architecture Patterns

31.7k tok · 58.6s · 100.0

TypeScript Best Practices

38.6k tok · 59.3s · 96.0

Performance & Efficiency · 94.0

Memory-efficient Processing

50.9k tok · 2m 54s · 98.0

Query Optimization

34.3k tok · 1m 30s · 84.0

Algorithm Complexity

21.4k tok · 50.2s · 100.0

Regression History

Overall Composite · minor · resolved

Score dropped 3.3% from 86.8 to 83.9

Detected Jun 29, 2026 · Resolved Jun 29, 2026

Code Thoroughness · minor

Score dropped 3.8% from 92.9 to 89.4

Detected Jun 28, 2026

Feature Implementation · minor · resolved

Score dropped 3.2% from 97.8 to 94.7

Detected Jun 24, 2026 · Resolved Jun 24, 2026

Long Reasoning · moderate

Score dropped 5.4% from 73.8 to 69.8

Detected Jun 21, 2026

Bug Fixes · minor · resolved

Score dropped 3.3% from 96.4 to 93.3

Detected Jun 20, 2026 · Resolved Jun 20, 2026

Coding Tasks · moderate

Score dropped 8.3% from 93.5 to 85.7

Detected Jun 19, 2026

Code Thoroughness · moderate · resolved

Score dropped 6.0% from 93.3 to 87.7

Detected Jun 18, 2026 · Resolved Jun 25, 2026

Feature Implementation · minor · resolved

Score dropped 4.0% from 98.2 to 94.3

Detected Jun 17, 2026 · Resolved Jun 17, 2026

Security Awareness · minor

Score dropped 3.3% from 93.3 to 90.2

Detected Jun 15, 2026

Overall Composite · moderate · resolved

Score dropped 5.5% from 87.2 to 82.4

Detected Jun 14, 2026 · Resolved Jun 14, 2026

Coding Tasks · major · resolved

Score dropped 31.6% from 98.4 to 67.3

Detected Jun 14, 2026 · Resolved Jun 14, 2026

Code Quality · major · resolved

Score dropped 30.7% from 97.1 to 67.3

Detected Jun 14, 2026 · Resolved Jun 14, 2026

Token Efficiency · major

Score dropped 78.5% from 61.5 to 13.2

Detected Jun 6, 2026
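
The drop percentages in the entries above are consistent with a change measured relative to the earlier score, not a difference in points. A minimal sketch of that arithmetic, assuming one-decimal rounding (the function name is hypothetical, and the page's minor/moderate/major thresholds are not stated, so they are not modeled here):

```python
def relative_change_pct(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old; negative means a drop."""
    if old == 0:
        raise ValueError("old score must be non-zero")
    return round((new - old) / old * 100.0, 1)


# Score pairs taken from the regression entries above.
print(relative_change_pct(86.8, 83.9))  # -3.3
print(relative_change_pct(98.4, 67.3))  # -31.6
print(relative_change_pct(61.5, 13.2))  # -78.5
```

Because the inputs shown on the page are already rounded, a recomputed value can differ from the displayed percentage by a tenth of a point.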

Outage History

timeout

Started Jul 14, 6:01 PM · Ended Jul 14, 6:30 PM · checks affected

timeout

Started Jul 14, 11:31 AM · Ended Jul 14, 12:00 PM · checks affected

error

Started Jul 8, 6:00 PM · Ended Jul 9, 5:30 PM · checks affected

error

Started Jul 6, 6:00 AM · Ended Jul 6, 6:30 AM · checks affected

error

Started Jul 2, 6:30 PM · Ended Jul 2, 7:00 PM · checks affected

error

Started Jul 1, 2:00 PM · Ended Jul 1, 2:30 PM · checks affected

error

Started Jul 1, 12:30 PM · Ended Jul 1, 1:00 PM · checks affected

error

Started Jul 1, 12:30 AM · Ended Jul 1, 1:30 AM · checks affected

error

Started Jun 30, 8:00 PM · Ended Jun 30, 8:30 PM · checks affected

error

Started Jun 30, 5:00 PM · Ended Jun 30, 5:30 PM · checks affected

error

Started Jun 30, 9:30 AM · Ended Jun 30, 10:00 AM · checks affected

timeout

Started Jun 30, 4:31 AM · Ended Jun 30, 5:00 AM · checks affected

error

Started Jun 29, 9:30 PM · Ended Jun 29, 10:00 PM · checks affected

error

Started Jun 26, 11:00 PM · Ended Jun 26, 11:30 PM · checks affected

error

Started Jun 26, 12:00 PM · Ended Jun 26, 12:30 PM · checks affected

error

Started Jun 26, 12:30 AM · Ended Jun 26, 1:00 AM · checks affected

error

Started Jun 24, 6:00 PM · Ended Jun 24, 6:30 PM · checks affected

error

Started Jun 23, 11:00 PM · Ended Jun 23, 11:30 PM · checks affected

timeout

Started Jun 22, 1:31 AM · Ended Jun 22, 2:00 AM · checks affected

error

Started Jun 21, 6:00 PM · Ended Jun 21, 6:30 PM · checks affected

error

Started Jun 19, 6:00 PM · Ended Jun 19, 6:30 PM · checks affected

error

Started Jun 17, 9:00 PM · Ended Jun 17, 9:30 PM · checks affected

error

Started Jun 17, 8:30 AM · Ended Jun 17, 9:00 AM · checks affected