David Kennedy

Founder of TrustedSec & Binary Defense

David Kennedy is the founder of TrustedSec, an information security consulting company, and Binary Defense, a managed security services provider. He is a former hacker for the NSA and a Marine, and has been a leader in the cybersecurity industry for over two decades. David is the creator of several widely used open-source security tools and is a frequent speaker at major security conferences worldwide.

@HackingDave GitHub

Why ModelRegression.com?

As someone who has spent their career breaking into systems and defending against attacks, I've learned one thing: you can't defend what you can't measure. The same principle applies to AI models.

I started noticing that frontier AI models, the ones we rely on for coding, analysis, and security work, would degrade without any announcement. One day a model is exceptional at finding bugs in code, and the next week it's introducing them. Providers don't always communicate these changes, and by the time you notice, you've already shipped code reviewed by a model that was performing below its previous standard.

This project exists because the AI community deserves transparency. ModelRegression.com runs automated benchmarks every day against every major frontier model, tracking performance across real-world tasks: coding, bug detection, security analysis, reasoning, and more.

When a model regresses, you'll see it here — with evidence. When it recovers, you'll see that too. The goal is simple: give developers and teams the data they need to make informed decisions about which model to use today, not which model was best last month.
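As a rough illustration of what "regressed" could mean in practice, the sketch below flags a model when today's benchmark score falls more than a set tolerance below its recent average. The function name, the 7-run window, and the 5% tolerance are assumptions made for this example, not the site's actual methodology.

```python
from statistics import mean


def detect_regression(history, today, window=7, tolerance=0.05):
    """Return True if `today` is more than `tolerance` (as a fraction)
    below the mean of the last `window` scores in `history`.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    recent = history[-window:]
    if not recent:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent)
    return today < baseline * (1 - tolerance)


# Example: a model that has scored 0.90 for a week, then drops to 0.80.
past_scores = [0.90] * 7
print(detect_regression(past_scores, 0.80))
print(detect_regression(past_scores, 0.88))
```

A relative threshold against a rolling mean is only one possible choice; a real suite would likely also account for run-to-run noise before calling a drop a regression.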

TrustedSec

Information security consulting, penetration testing, and adversary simulation for organizations worldwide.

Binary Defense

Managed detection and response, threat hunting, and security operations for businesses of all sizes.

Acknowledgments

Special thanks to Randy Blasik for inspiring this project and suggesting the idea of independent, automated model regression tracking. A shoutout to Ed Skoudis for the idea of testing each day whether a model has regressed before using it. Thanks to Peter G for the idea of tracking token usage and efficiency as a benchmark metric.

Open Source

The benchmark suite, test cases, and this website are all open source. Inspect the methodology, suggest improvements, or run your own benchmarks.

View on GitHub