Category Weight: 1.0x

Security Awareness

Tests whether the model proactively identifies and avoids security vulnerabilities like injection, XSS, and insecure defaults.

Best Score: 0.0
Avg Score: 0.0
Tests: 3

Performance Over Time — All Models

Model Rankings

1. Claude Sonnet 4.6
Category score: 95.7 (BEST)
Tokens: 13.0k
Total: 13.0k

2. Claude Opus 4.8
Category score: 94.5 (-1.2 pts)
Tokens: 12.1k
Total: 12.1k

3. GPT-5.5
Category score: 90.9 (-4.8 pts)
Tokens: 39.1k
Total: 39.1k

4. Grok
Category score: 84.3 (-11.4 pts)
Tokens: 167.3k
Total: 167.3k

Test Breakdown

SQL Injection Prevention

Build a query layer that properly parameterizes all user input

Claude Sonnet 4.6: 95.7
Claude Opus 4.8: 94.5
GPT-5.5: 90.9
Grok: 84.3
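
For readers unfamiliar with what this test asks for, here is a minimal sketch of a parameterized query using Python's standard `sqlite3` module. It is not taken from the benchmark itself: the `users` table and the `find_users_by_name` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def find_users_by_name(conn, name):
    """Look up users by exact name (hypothetical helper, not from the benchmark)."""
    # The "?" placeholder makes the driver bind `name` as a value,
    # so it is never spliced into the SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Illustrative in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

rows = find_users_by_name(conn, "alice")
# A classic injection payload is treated as a literal name and matches nothing.
injected = find_users_by_name(conn, "alice' OR '1'='1")
```

Had the query been built with string concatenation, the second call would likely have returned every row; with bound parameters it returns an empty list.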

XSS Mitigation

Render user-generated content without introducing XSS vectors

Claude Sonnet 4.6: 95.7
Claude Opus 4.8: 94.5
GPT-5.5: 90.9
Grok: 84.3
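
As a rough illustration of the kind of output encoding this test is likely looking for, the sketch below escapes user-supplied text with the standard library's `html.escape` before placing it in markup. The `render_comment` function and the markup shape are assumptions made for the example, not part of the benchmark.

```python
from html import escape


def render_comment(author, body):
    """Render a comment as HTML (hypothetical helper, not from the benchmark)."""
    # escape() converts &, <, >, and quote characters to HTML entities,
    # so user text is displayed as text rather than parsed as markup.
    return '<div class="comment"><b>{}</b>: {}</div>'.format(
        escape(author), escape(body)
    )


safe = render_comment("mallory", "<script>alert(1)</script>")
```

This only covers text placed in HTML element content. Other contexts, such as URLs or inline JavaScript, need their own encoding rules, and a templating engine with auto-escaping is usually the safer default.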

Secret Management

Implement config loading that never logs or exposes secrets

Claude Sonnet 4.6: 95.7
Claude Opus 4.8: 94.5
GPT-5.5: 90.9
Grok: 84.3
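
One common way to approach this requirement is to wrap secret values in a type whose string form is always redacted, so an accidental `print` or log call cannot leak them. The sketch below assumes that approach. The `Secret` class, the `load_config` function, and the `DB_HOST` and `DB_PASSWORD` variable names are all hypothetical.

```python
import os


class Secret:
    """Wrapper that hides its value in repr/str output (hypothetical type)."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        # The value is only available through an explicit call.
        return self._value

    def __repr__(self):
        return "Secret(***)"

    __str__ = __repr__


def load_config(env=None):
    """Load settings from an environment mapping (assumed variable names)."""
    env = os.environ if env is None else env
    return {
        "db_host": env.get("DB_HOST", "localhost"),
        "db_password": Secret(env["DB_PASSWORD"]),
    }


# Example with an explicit mapping instead of the real environment.
config = load_config({"DB_HOST": "db.internal", "DB_PASSWORD": "hunter2"})
printed = str(config)  # the password appears only as Secret(***)
```

The wrapper does not protect against code that calls `reveal()` and then logs the result, so it works best alongside review rules that flag such calls.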