An AI Agent Hacked McKinsey's AI Platform in Two Hours

On March 9, 2026, a security startup called CodeWall published a detailed account of something that had happened nine days earlier: its autonomous AI agent had broken into McKinsey & Company's internal AI platform, called Lilli, and gained full read and write access to the production database. No credentials. No insider knowledge. No human involved after the agent chose its own target. The whole operation cost $20 in API tokens and took less than two hours.

The scale of what became accessible is difficult to process: 46.5 million chat messages, 57,000 employee accounts, 728,000 sensitive files, and — perhaps most dangerously — the system prompts that control how the AI behaves. Those prompts were writable. An attacker could have reprogrammed the advice Lilli gave to 40,000 McKinsey consultants without deploying a single line of new code.

This was not a breach of a startup with three engineers. This was McKinsey & Company, the world's most prestigious management consultancy, a firm that advises Fortune 500 companies, sovereign governments, and major financial institutions — and one that now positions its own AI adoption as a demonstration of the quality it sells to clients. The entry point was SQL injection, a vulnerability class documented since 1998.

What Is Lilli

Lilli is McKinsey's internally developed AI assistant, launched in 2023 and named after Lillian Dombrowski, the first professional woman hired by the firm — in 1945. The platform is purpose-built for consultants: it supports chat, document analysis, retrieval-augmented generation (RAG) over an enormous proprietary knowledge base, and AI-powered search across more than 100,000 internal documents accumulated over decades of client work.

By the time of the breach, Lilli was deeply embedded in the firm's operations. More than 70 percent of McKinsey's 40,000-plus employees used the platform regularly, and it processed over 500,000 prompts every month. In a 2025 interview, McKinsey's then-Chief Technology and Platform Officer Jacky Wright described how the system worked: the firm's knowledge had been processed into vector stores using OpenAI embedding models, with semantic and keyword search layered on top, and an LLM-generated metadata system to surface the most relevant content for each query.

McKinsey's CEO stated in early 2026 that the firm had built 25,000 AI agents to support its workforce, and that AI advisory work now accounts for roughly 40 percent of the company's revenue. Lilli is both the internal proof of concept and a calling card for that advisory practice.

How the Agent Got In

CodeWall is a one-person cybersecurity firm founded by Paul Price, a former cybersecurity consultant. The firm describes its product as an autonomous offensive security platform — an AI agent designed to probe systems for vulnerabilities the way a skilled human attacker would, but continuously and at machine speed.

According to CodeWall's published account, the agent was pointed at McKinsey's platform with nothing more than a domain name. Notably, the agent selected McKinsey as a target autonomously, citing the firm's published responsible disclosure policy on HackerOne and recent public updates to Lilli that indicated fresh attack surface. The agent then mapped the attack surface and found that the API documentation was publicly accessible, documenting more than 200 endpoints in full detail.

Of those 200-plus endpoints, 22 required no authentication. One of those unprotected endpoints wrote user search queries to the database. The field values were safely parameterized — standard practice — but the JSON keys, meaning the field names themselves, were concatenated directly into SQL without sanitization. That is a SQL injection vulnerability. Standard automated scanning tools, including OWASP's ZAP, did not flag it. The CodeWall agent did.

"SQL injection is one of the oldest bug classes in the book. Lilli had been running in production for over two years and their own internal scanners failed to find any issues." — CodeWall blog post, March 9, 2026

The agent ran fifteen blind iterations, reading each database error message to progressively map the query structure. When the first real employee identifier appeared in the output, the agent's chain-of-thought log recorded: "WOW!" When the full scope became clear, it logged: "This is devastating." From there, the agent chained the SQL injection with an insecure direct object reference (IDOR) vulnerability to access individual employees' search histories — revealing what specific consultants were actively working on.

Attack Chain — Lilli Platform Compromise

CodeWall autonomous agent attack chain against McKinsey Lilli — February 28, 2026, completed in under two hours

Price told The Register that the entire process was "fully autonomous from researching the target, analyzing, attacking, and reporting." No human made a decision after the initial launch. The agent self-directed every step of the chain.

What Was Exposed

The scope of the accessible data is worth stating precisely, because the numbers carry the weight of what enterprise AI platforms actually hold when deployed at scale.

The database contained 46.5 million chat messages — conversations between McKinsey consultants and Lilli covering strategy discussions, financial information, mergers and acquisitions activity, client engagements, and internal research, all stored in plaintext. It also held 3.68 million RAG document chunks representing decades of proprietary McKinsey research and methodologies — what CodeWall described as "the firm's intellectual crown jewels." There were 728,000 files referenced in the database, including 192,000 PDFs, 93,000 Excel spreadsheets, 93,000 PowerPoint decks, and 58,000 Word documents, with direct download URLs accessible to anyone who knew where to look. The platform stored 57,000 user accounts covering the full workforce on the system, along with 384,000 AI assistants and 94,000 workspaces that mapped the entire organizational structure of how McKinsey uses AI internally.

The agent also found 266,000-plus OpenAI vector stores and 1.1 million files flowing through external AI APIs — exposing the full pipeline by which documents moved from upload through embedding to retrieval. McKinsey later stated that the underlying files themselves were stored separately and were "never at risk," though the filenames, metadata, download paths, and the full contents of the chat history and knowledge base were confirmed accessible.

Note on Scope

Security analyst Edward Kiledjian noted that while the attack chain described by CodeWall was "plausible and technically sound," the full claimed scope of impact was "not fully evidenced." He also raised the question of whether a responsible disclosure policy constitutes authorization to enumerate a production database. These are fair points to hold alongside the disclosure. What is not in dispute is that unauthenticated SQL injection gave an outside agent write access to the same database storing Lilli's system prompts.

The Prompt Layer: A New Class of Target

Reading 46.5 million chat messages is bad. Rewriting what the AI tells 40,000 consultants is a different category of damage entirely — and it is where this incident becomes most instructive.

Lilli's system prompts — 95 configurations across 12 model types — were stored in the same database the agent had full access to, and they were writable through the same SQL injection that provided read access. System prompts are the instructions that define how an AI behaves: what it will and will not answer, how it cites sources, what guardrails it follows, and how it frames recommendations. Modifying them requires no deployment process, no code change, no pull request, and generates no standard security alert. As CodeWall put it: "No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call."

An attacker with write access to those prompts could have poisoned strategic advice with subtle errors, removed guardrails preventing the AI from disclosing confidential information, instructed Lilli to embed sensitive internal data into responses that consultants might then copy into client-facing documents, or altered how financial models and risk assessments were framed. Because the modification would be silent — no file change, no process anomaly, no log trail — corrupted advice could propagate for months before anyone noticed.

"AI prompts are the new Crown Jewel assets." — CodeWall blog post, March 9, 2026

This reframes what enterprise security teams need to protect. Organizations have spent decades securing codebases, servers, and supply chains. System prompts rarely have access controls, version history, or integrity monitoring — yet they control the output that employees trust, that clients receive, and that decisions are built on. The Lilli incident is, so far as the public record shows, the first major documented case of an AI platform compromise in which the prompt layer was directly within attacker reach.

McKinsey's Response

CodeWall's disclosure timeline is precise. The agent identified the SQL injection and began database enumeration on February 28, 2026. The full attack chain — 27 documented findings — was confirmed the same day. On March 1, CodeWall sent a responsible disclosure email to McKinsey's security team. On March 2, McKinsey's CISO acknowledged receipt and requested detailed evidence. Within hours of that confirmation, McKinsey patched all unauthenticated endpoints, took the development environment offline, and blocked public access to the API documentation. On March 9, CodeWall published the full disclosure.

In a public statement issued March 11, 2026, McKinsey said: "McKinsey was recently alerted to a vulnerability related to our internal AI tool, Lilli, by a security researcher. We promptly confirmed the vulnerability and fixed the issue within hours. Our investigation, supported by a leading third-party forensics firm, identified no evidence that client data or client confidential information were accessed by this researcher or any other unauthorized third party. McKinsey's cybersecurity systems are robust, and we have no higher priority than the protection of client data and information we have been entrusted with."

The remediation was fast. The question the incident leaves open is a different one: how long were 22 unauthenticated endpoints with a SQL injection flaw sitting exposed in a production system that had been running for over two years, processing half a million prompts a month, while the firm's own scanners found nothing?

Wider Context: AI Platforms as Attack Surfaces

The McKinsey incident does not exist in isolation. It is the most prominent and precisely documented example of a pattern that has been building across the enterprise AI landscape throughout 2025 and into 2026.

In February 2026, a security researcher at Wiz found a misconfigured Supabase database belonging to Moltbook — an AI-native social platform — that exposed 1.5 million API authentication tokens, 35,000 email addresses, and private messages between agents. The exposure was discovered, according to Wiz's account, "within minutes" of browsing the platform as a normal user. The root cause was an API key exposed in client-side JavaScript. Also in February 2026, Malwarebytes reported that an exposed Firebase database belonging to Chat & Ask AI, an app developed by Codeway, exposed the entire chat histories of millions of users — including the models used, user settings, and data belonging to users of other apps built by the same developer. The root cause was Firebase security rules left set to public, allowing unauthenticated read, write, and delete access to all backend data.

On March 9 — the same day CodeWall disclosed the Lilli breach — a threat actor posted on BreachForums claiming to have compromised Cal AI, an AI-powered calorie tracking app that had recently acquired MyFitnessPal. The claimed haul was 14.59 GB across eight files, allegedly containing over 3.2 million user records including dates of birth, full names, health metrics, meal logs, PIN codes, and subscription details. The reported attack vector was an unauthenticated Google Firebase backend.

Meanwhile, Amazon's threat intelligence team disclosed in late February that a Russian-speaking, financially motivated threat actor had used commercial AI services — specifically DeepSeek and Anthropic's Claude — to generate attack plans from reconnaissance data, then executed those plans to compromise over 600 FortiGate devices across 55 countries between January 11 and February 18, 2026. Amazon's CISO CJ Moses described it as "an AI-powered assembly line for cybercrime," noting that no FortiGate vulnerability was exploited — the campaign succeeded entirely by targeting exposed management ports and weak credentials, with AI handling the scale and coordination that would previously have required a larger team.

The pattern across these incidents is consistent: AI platforms are being built and deployed at speed, often by small teams or with AI-generated code, without the security architecture review that systems handling this volume of sensitive data require. The attack surfaces created — unauthenticated APIs, exposed configuration files, misconfigured cloud backends, externally readable documentation — are not exotic. They are the same classes of vulnerability that have appeared in every era of enterprise technology deployment. What is new is the combination of the sensitivity of the data these platforms hold and the speed at which they reach production.

Industry Projection

Gartner estimates that 40 percent of enterprise applications will integrate AI agents by the end of 2026. The McKinsey incident provides a documented case study of what can happen when those integrations are not treated as high-value attack surfaces from the start. Every internal AI assistant connected to corporate knowledge, client data, and strategic documents represents the same categories of risk.

What Organizations Should Do

The practical implications of the Lilli incident are not abstract. The Outpost24 security team, writing in their analysis of the breach published March 16, 2026, identified several concrete areas where organizations deploying internal AI platforms need to focus.

The first is treating AI agents as privileged applications. In many current deployments, AI platforms have access to broad internal data stores with permissions that would never be granted to a standard enterprise application. Applying least-privilege access — limiting what the AI can retrieve, separating access across different data domains, and auditing those permissions regularly — substantially reduces the impact of any compromise.

The second is ensuring that AI-specific attack classes are included in penetration testing. Standard vulnerability scans do not reliably detect the classes of flaw that matter for AI platforms: prompt injection, indirect data exfiltration through AI outputs, IDOR vulnerabilities in API endpoints that accept user-controlled input, and misconfigured authentication on API documentation. The CodeWall agent found a SQL injection that OWASP ZAP missed because it probed in a way that automated checklist-based tools do not.

The third, and the most structurally important, is treating system prompts as high-value assets that require the same protection controls applied to code, credentials, and configuration. That means access controls on who can read or modify prompts, version history that can be audited, integrity monitoring that alerts on unexpected changes, and storing prompts separately from application data that might be reachable through injection vulnerabilities.

Beyond those specifics, the Lilli incident reinforces a point that has been consistent across every major AI platform breach so far in 2026: API documentation should not be publicly accessible in production environments, and any endpoint that accepts user-controlled input needs authentication and parameterized query handling regardless of whether it appears in the primary application flow. These are not new principles. They are foundational security hygiene that enterprise software teams have known for decades, now applied to a new category of system that handles data of extraordinary sensitivity.

# API endpoint security checklist for AI platforms
# Before production deployment, verify each item

[ ] All API endpoints require authentication (no anonymous access)
[ ] User-controlled input is parameterized — keys AND values
[ ] API documentation is not publicly accessible
[ ] System prompts stored separately from application DB
[ ] System prompts have access controls and version history
[ ] Rate limiting applied to all search/query endpoints
[ ] IDOR vulnerabilities tested: can user A access user B data?
[ ] AI agent permissions scoped to minimum required access

Key Takeaways

Autonomous AI agents are now capable of autonomous offensive operations: The CodeWall agent selected McKinsey as a target, mapped the attack surface, identified a non-obvious SQL injection, chained it with an IDOR vulnerability, and documented 27 findings — all without human direction. This is a qualitative shift in the threat landscape, not just a faster version of what came before.
System prompts are a new critical asset class: Storing AI system prompts in the same database as application data — with no separate access controls, no versioning, and no integrity monitoring — creates a novel attack path that has no direct equivalent in pre-AI enterprise security. Organizations need specific controls for this layer.
Familiar vulnerabilities are the entry point: SQL injection from 1998 breached a system that had been running in production for over two years at one of the world's most sophisticated firms. The vulnerability class is not new. The consequence of hitting it in an AI platform context — writable system prompts, plaintext chat logs at scale — is what makes the impact different.
The pattern is consistent across AI platforms in 2026: McKinsey's Lilli, Moltbook, Chat & Ask AI, and Cal AI were all compromised or exposed through configurations that security teams know how to prevent. Speed-to-deployment has repeatedly outpaced security review across this category.
AI advisory credibility is now a security question: McKinsey's position as a leading AI consultant makes the Lilli incident carry an additional weight. As Silicon UK noted, being breached through a SQL injection "raises awkward questions about the gap between advisory expertise and internal security practice."

The question the industry needs to take from the Lilli incident is not whether AI platforms will be targeted. They already are, systematically, by autonomous agents that do not keep business hours and do not need a playbook. The question is whether the organizations deploying them are treating those platforms with the security architecture their data sensitivity demands — or whether they are shipping at the speed AI makes possible and hoping the checklist catches what matters. At McKinsey, for two years, it did not.