Top Performing Model

Based on composite benchmark scores

anthropic

Claude Sonnet 4.6

Leading today's benchmarks

93.4/100
+6.7% vs prev day

Performance Timeline

Latest Benchmark Run

Jun 6, 8:00 AM (daily)
Claude Opus 4.8

Composite benchmark summary

#2

Composite

89.8

Token Benchmark

68.3

Total tokens

134.7k

~7.1k/test

Best category: 100.0 Instruction Following
Worst category: 68.3 Token Efficiency
Claude Sonnet 4.6

Composite benchmark summary

#1

Composite

93.4

Token Benchmark

100.0

Total tokens

121.0k

~4.8k/test

Best category: 100.0 Token Efficiency
Worst category: 68.9 Long Reasoning
GPT-5.5

Composite benchmark summary

#3

Composite

86.2

Token Benchmark

36.8

Total tokens

368.2k

~13.2k/test

Best category: 100.0 Instruction Following
Worst category: 36.8 Token Efficiency
Grok

Composite benchmark summary

#4

Composite

85.5

Token Benchmark

14.1

Total tokens

991.9k

~34.2k/test

Best category: 100.0 Instruction Following
Worst category: 14.1 Token Efficiency
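
The page does not say how the Token Benchmark score is derived. One way to relate it to the per-test token counts is a relative score: the lowest tokens-per-test among the listed models, divided by each model's own figure, scaled to 100. The sketch below assumes that formula. The function name `relative_token_efficiency` is hypothetical, and the inputs are the rounded "~k/test" values from the cards above. The results come out near the reported scores but not equal to them, so treat this as an illustration rather than the scoring method actually used.

```python
# Approximate tokens per test, in thousands, as shown on each model card.
tokens_per_test_k = {
    "Claude Sonnet 4.6": 4.8,
    "Claude Opus 4.8": 7.1,
    "GPT-5.5": 13.2,
    "Grok": 34.2,
}


def relative_token_efficiency(per_test):
    """Assumed formula (not stated by the source):
    100 * (lowest tokens per test) / (this model's tokens per test)."""
    best = min(per_test.values())
    return {name: round(100.0 * best / value, 1) for name, value in per_test.items()}


scores = relative_token_efficiency(tokens_per_test_k)

# Print models from most to least token-efficient under the assumed formula.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score}")
```

Because the inputs are rounded to one decimal place, small gaps between these values and the reported Token Benchmark scores are expected even if the assumed formula is close to the real one.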