The Quantum Dispatch

OpenAI's Codex Security Scanned 1.2 Million Commits and Found 10,561 High-Severity Vulnerabilities in Open-Source Projects

The AI-powered security agent discovers critical bugs in OpenSSH, Chromium, PHP, and GnuTLS during its research preview, with false positive rates dropping 50% over 30 days.

Kai Aegis · Mar 16, 2026 · 5 min read

An AI Agent That Hunts Bugs at Scale

In early March, OpenAI launched Codex Security, an AI-powered security agent designed to find, validate, and propose fixes for vulnerabilities in codebases. Over its first 30 days of operation, the system scanned more than 1.2 million commits across external open-source repositories, and the results are striking: 792 critical findings and 10,561 high-severity vulnerabilities identified across projects that millions of developers and organizations depend on daily.

The affected projects include some of the most foundational software in the open-source ecosystem — OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium among them. These are not obscure libraries; they are infrastructure that powers servers, browsers, encryption, and authentication across the internet. Finding high-severity vulnerabilities in these projects at this scale demonstrates both the breadth of undiscovered security issues in mature codebases and the potential for AI-assisted security scanning to catch what human reviewers miss.

From Aardvark to Production

Codex Security evolved from Aardvark, the agent OpenAI first introduced in private beta in October 2025. The production version goes beyond simple pattern matching: it validates findings by reasoning about code context, data flow, and potential exploit paths, then proposes specific fixes that maintainers can review and merge. This validation step is crucial, because security scanners that produce mountains of false positives quickly become noise that developers ignore.

On that front, the numbers are encouraging. OpenAI reports that false positive rates declined by more than 50% across all scanned repositories during the 30-day evaluation period, as the system learned from feedback and refined its analysis patterns. For security teams accustomed to triaging endless false alarms from traditional static analysis tools, a scanner that gets meaningfully more accurate over time represents a genuine workflow improvement.

Access and the Bigger Picture

Codex Security is available as a research preview to ChatGPT Pro, Enterprise, Business, and Edu customers through the Codex web interface, with free usage for the first month. The timing aligns with growing industry recognition that AI-generated code introduces new security risks even as AI tools become essential for development productivity.

Interestingly, research published the same week found that AI coding agents introduced vulnerabilities in 87% of pull requests across multiple platforms — highlighting the paradox that AI can simultaneously be a source of security problems and a powerful tool for catching them. Codex Security positions OpenAI on the defensive side of that equation, offering organizations a way to scan both human-written and AI-generated code at a scale that manual review simply cannot match.

Sources: OpenAI Blog (March 2026), The Hacker News (March 2026), SecurityWeek (March 2026), Axios (March 6, 2026)