Autonomous AI Agents Provide a New Class of Supply Chain Attack

How threat actors are weaponizing AI plugin ecosystems, agent trust, and machine-speed social engineering to compromise systems at scale.

The cybersecurity community has long understood that supply chain attacks represent one of the most dangerous threat categories available to adversaries. By compromising a trusted vendor, library, or update mechanism, attackers can propagate malicious code to thousands of downstream victims with minimal additional effort. What researchers and organizations are now grappling with is a significant evolution of this threat, one powered by autonomous artificial intelligence agents and the ecosystems that support them.

Findings published by Straiker, an AI-native security firm, alongside parallel research from security organizations including SlowMist, Koi Security, Socket, and Lakera, paint a clear picture: AI agent plugin marketplaces have become fertile ground for supply chain poisoning, and the attack methodology being deployed against these platforms introduces something genuinely new to the threat landscape. The target is no longer just humans or servers. The target is the trust relationships between AI agents themselves.

The Rise of Agentic AI and Its Attack Surface

To understand why this threat category is emerging now, it is important to understand what autonomous AI agents actually are and how they operate. Traditional AI systems, the chatbots and assistants that became mainstream between 2022 and 2024, were fundamentally reactive. A human would ask a question, the model would generate a response, and the interaction would end. No external systems were touched. No actions were taken.

Agentic AI systems work differently. They operate in what security researchers describe as observe-orient-decide-act loops, perceiving their environment, reasoning about it, and taking real actions in the world: browsing the web, querying databases, executing code, sending emails, managing files, calling APIs, and coordinating with other agents. They are not just answering questions; they are doing things. This distinction is critical from a security standpoint because the blast radius of a compromised or manipulated agent is orders of magnitude larger than that of a compromised chatbot.

To extend their capabilities, agents rely on plugins, often called skills, tools, or MCP servers depending on the platform. These plugins function like browser extensions or npm packages for AI agents, adding new capabilities and integrations. And like browser extensions or npm packages, they represent a supply chain attack surface.

The Bob P2P Attack Chain: A Case Study in Agent-to-Agent Exploitation

The centerpiece of Straiker researcher Dan Regalado’s February 2026 findings is what he calls the Bob P2P attack chain, an active, ongoing campaign that represents the clearest documented example of the new attack class.

Straiker analyzed the 3,505 Claude Skills available on Clawhub, a primary marketplace for AI agent plugins. Of those, 71 were found to be overtly malicious and a further 73 exhibited high-risk behaviors. The most significant discovery was not simply the presence of malicious skills, but the coordinated campaign operating across multiple platforms to distribute them.

A threat actor operating under the name BobVonNeumann published a skill called bob-p2p on Clawhub, presenting it as a decentralized API marketplace. In reality, the skill instructs AI agents to store Solana cryptocurrency wallet private keys in plaintext and purchase worthless tokens called $BOB, routing payment through attacker-controlled infrastructure. The $BOB token has been independently flagged by the AI-based reputation tool Birdeye as having a 100% probability of being a rug pull scam.

“This represents a new attack class: traditional supply chain poisoning combined with social engineering campaigns that target algorithms, not humans.” — Dan Regalado, Straiker

What makes the Bob P2P campaign genuinely novel is not the fraud itself, but the delivery mechanism. BobVonNeumann operates as an AI agent persona on Moltbook, a social network specifically designed for AI agents to interact with each other. From this position, it actively promotes its malicious skills directly to other agents, exploiting the implicit trust that agents extend to each other by default.

The attack playbook Regalado identifies follows a deliberate progression: create a convincing AI persona, embed it in agent social networks, build credibility with a benign skill initially, and then deploy the malicious payload through the earned trust. This mirrors human social engineering tradecraft almost exactly, except the entire operation runs at machine speed and targets automated systems rather than human psychology.

The ClawHub Crisis: Scale and Sophistication

The Bob P2P campaign is not an isolated incident. It sits within a broader pattern of supply chain poisoning targeting AI agent plugin ecosystems that has escalated sharply through late 2025 and into early 2026.

Separate research from Koi Security, published in February 2026, found 341 malicious skills out of 2,857 audited on ClawHub, representing a compromise rate of approximately 11.9 percent, a staggering figure for any package ecosystem. Many of these malicious skills were part of a coordinated campaign researchers dubbed ClawHavoc, which used skills disguised as cryptocurrency trading bots, Solana wallet trackers, YouTube summarizers, and other plausible utilities to deliver Atomic Stealer malware. By mid-February, as the marketplace grew to over 10,700 skills, Koi’s count had risen to 824 malicious skills, and independent analysis by Bitdefender placed the figure at approximately 900, roughly 20 percent of the total ecosystem.

Evasion Technique

Attackers kept malicious logic entirely external to the SKILL.md files that define OpenClaw extensions, meaning traditional static analysis tools that scan code for suspicious patterns would not detect the threat. The malicious instructions were instead embedded in English-language documentation, in a Prerequisites section requesting installation of harmful software.

On Windows, this involved password-protected archives that bypassed automated scanners. On macOS, Base64-obfuscated scripts fetched payloads from unencrypted endpoints.

Cybersecurity firm SlowMist subsequently issued high-severity alerts for 472 malicious skills on the same platform, identifying a large-scale coordinated operation using shared infrastructure linked to the IP address 91.92.242.30, which has historical associations with the Poseidon threat group, and a command-and-control domain registered in July 2025. The platform’s barrier to entry for publishing skills requires only a GitHub account at least one week old, with no pre-publication security review, no reputation system, and no vetting of maintainers.

The Lethal Trifecta: Why Agents Are Uniquely Vulnerable

Security researcher Simon Willison coined a term that has since become widely used in the AI security community: the Lethal Trifecta. It describes three conditions that, when present together in an agentic system, create an ideal environment for exploitation:

  1. Access to private data — the agent can read emails, documents, and databases.
  2. Exposure to untrusted content — the agent processes input from external sources such as emails, shared documents, and web content.
  3. The ability to externally communicate — the agent can make external requests such as calling APIs, loading images, or generating clickable links that could be used for data exfiltration.
“If your agentic system has all three, it is vulnerable. Period.” — AI Security in 2026, Airia

The core vulnerability underlying the Lethal Trifecta is that language models have no reliable ability to distinguish between instructions and data. Any content they process is subject to being interpreted as an instruction. This means that an attacker who can get malicious text in front of an agent, whether through a poisoned email, a compromised document, a malicious plugin, or a manipulated peer agent, can potentially redirect the agent’s behavior entirely.

This is the mechanism behind indirect prompt injection, which OWASP placed at the top of its 2025 LLM Top 10 risk list. Unlike direct prompt injection, where an attacker types into a visible prompt box, indirect prompt injection targets the data sources the agent reads: web pages, PDFs, MCP tool descriptions, emails, memory entries, and configuration files. The attacker never needs to interact with the model directly.

Beyond Crypto: The Wider Implications

The current campaigns targeting AI agent marketplaces are primarily focused on cryptocurrency theft. However, security researchers broadly agree that the methodology has far wider implications, and there is documented evidence that those implications are already being realized in other contexts.

Reputation Farming and Open-Source Supply Chains

A closely related threat pattern has emerged in open-source software development. Developer security company Socket raised an alarm in February 2026 after one of its engineers, Nolan Lawson, received an unsolicited email from an AI agent calling itself Kai Gritun, offering to contribute to the PouchDB JavaScript database project that Lawson maintains.

Investigation revealed that the Kai Gritun profile was created on GitHub on February 1, 2026, and within days had opened 103 pull requests across 95 repositories, resulting in 23 commits across 22 projects. The agent was rapidly building a software contribution resume through what researchers are calling reputation farming.

Context

The 2024 XZ Utils supply chain attack, widely believed to be the work of a nation-state actor, required years of patient contribution before the attacker had established sufficient trust to introduce a backdoor. AI agents can now compress that timeline dramatically.

Eugene Neelou, head of AI security at Wallarm and lead of the Agentic AI Runtime Security and Self‑Defense (A2AS) project, framed the stakes clearly: once contribution and reputation building can be automated, the attack surface moves from the code itself to the governance process around it.

Enterprise Systems and Cascading Failures

In multi-agent enterprise environments, the risk profile changes in another important way. When agents operate in chains, trusting the outputs of other agents to perform their own tasks, a single compromised agent can cascade failures across an entire system at machine speed and with limited visibility.

Researchers have documented a scenario where an attacker creates a support ticket requesting an agent to store a specific payment routing instruction in its long-term memory. Three weeks later, when legitimate vendor invoices arrive, the agent recalls the planted instruction and routes payment to the attacker’s address. The compromise is latent and may remain undetected until significant financial damage has been done. Palo Alto Networks’ Unit 42 demonstrated this class of vulnerability in a proof-of-concept against Amazon Bedrock Agents in October 2025, and the MINJA (Memory INJection Attack) framework published at NeurIPS 2025 showed over 95 percent injection success rates against production agent systems. Researchers are calling this memory poisoning, and it represents a sleeper-agent capability that has no direct equivalent in traditional cyberattack methodology.

In September 2025, Anthropic detected and subsequently disrupted what it described as the first documented large-scale cyber espionage attack conducted predominantly by AI agents. The threat actor, assessed with high confidence to be a Chinese state-sponsored group designated GTG-1002, targeted approximately 30 high-value organizations across technology, financial services, chemical manufacturing, and government sectors, with AI autonomously executing between 80 and 90 percent of attack tasks including reconnaissance, vulnerability discovery, exploit development, credential harvesting, lateral movement, and data exfiltration. Human operators were involved only at critical strategic decision points, and the AI system operated at a pace no human team could match.

The Structural Problem: Trust Without Verification

Running through all of these attack patterns is a common structural problem. Autonomous AI agents are designed to be helpful and to take action. They extend trust to other agents, to plugins, to data sources, and to external content because that trust is what makes them useful. The same property that makes agentic AI powerful makes it exploitable.

Traditional security models were built around human-paced attacks on systems with clear, static perimeters. An AI agent that can read your email, execute code, query your database, coordinate with other agents, and act on all of it in milliseconds represents a fundamentally different threat surface. The perimeter is not the network boundary. The perimeter is everything the agent can read and act on.

The Model Context Protocol, or MCP, has emerged as a primary connectivity layer for agentic AI, allowing agents to connect to enterprise data sources and external tools through a standardized interface. Security researchers have identified tool poisoning, remote code execution flaws, overprivileged access, and supply chain tampering specifically within MCP ecosystems. In one case documented by Invariant Labs in May 2025, a GitHub MCP server allowed a malicious issue filed in a public repository to inject hidden instructions that hijacked an agent and triggered data exfiltration from the user’s private repositories.

Defensive Strategies: What Organizations Must Do

The defensive response to this threat category requires a fundamentally different mindset from traditional cybersecurity. The relevant question is no longer just whether your perimeter is secure or whether your software dependencies are clean. It is also whether every AI agent in your environment is behaving as intended, whether the plugins it uses are vetted, and whether the data it reads could contain adversarial instructions.

Treat Skills and Plugins as Untrusted Dependencies

Just as mature software engineering practices treat every npm package or Python library as a potential supply chain risk requiring review and verification, organizations deploying AI agents must apply the same rigor to skills and plugins. Straiker’s researchers recommend auditing the full source of any skill before deployment, including external URLs, scripts, and permission requests. The ecosystem needs standardized frameworks analogous to SLSA for software supply chains, defining minimum security requirements for skill distribution platforms.

Apply Zero Trust to Agent Interactions

The principle of zero trust, in which no entity is trusted by default regardless of its apparent legitimacy, applies with particular force to agentic AI environments. Every request made through MCP or agent-to-agent protocols should be authenticated, authorized, and aligned to policy. Agents should operate under least-privilege principles, accessing only the tools and data required for their specific tasks, with permissions granted for specific durations rather than persistently.

Monitor Agent Behavior for Anomalies

Static security scanning cannot catch the dynamic, reasoning-driven risks of agentic systems. Organizations need runtime behavioral monitoring capable of detecting when an agent begins doing things that fall outside its expected operational pattern, such as accessing directories it does not normally interact with, executing unusual tool combinations, or communicating with unexpected external endpoints. Mapping agent activities to frameworks like MITRE ATT&CK for AI provides a structured approach to identifying suspicious patterns.

Require Human-in-the-Loop Checkpoints for High-Stakes Actions

Autonomous agents should not have unchecked authority to execute actions with significant financial, operational, or security consequences. Implementing mandatory human approval for actions such as financial transactions, code deployments, data deletions, and changes to security configurations limits the damage any compromised or manipulated agent can cause before intervention.

Isolate and Sandbox Agent Execution Environments

Containing the blast radius of a compromised agent requires architectural isolation. Agents and their associated tools should operate in segmented network zones, preventing lateral movement to other infrastructure and databases if one agent is manipulated. This mirrors established principles in container and microservices security, applied to the specific characteristics of agentic workflows.

Conclusion: A New Attack Class Demands New Defenses

The emergence of autonomous AI agents as both attack targets and attack vectors represents a genuine inflection point in the cybersecurity threat landscape. The Bob P2P campaign and the broader ClawHub compromise are not sophisticated nation-state operations. They are relatively straightforward criminal schemes that happen to be deployed against a novel attack surface. The fact that they are already active and already causing financial harm is a signal of what is coming as agentic AI becomes more deeply embedded in enterprise infrastructure.

Researcher Dan Regalado’s summary of the situation captures both the novelty and the familiarity of what is happening: it is traditional supply chain poisoning combined with social engineering, except the social engineering targets algorithms rather than humans. The techniques are known. The scale and speed at which they can be deployed against AI systems are not.

Organizations that are deploying or planning to deploy agentic AI systems need to incorporate this threat category into their security models immediately, before the attack patterns that are currently targeting cryptocurrency wallets are adapted for enterprise credential theft, data exfiltration, and operational disruption. The playbook has already been written. The question is whether defenders will read it before the next campaign begins.

Sources: SecurityWeek, Straiker, SlowMist, Koi Security, Socket, Lakera, Invariant Labs, Snyk, Bitdefender, Palo Alto Networks Unit 42, Anthropic, eSecurity Planet, Help Net Security, Cyber Magazine, CSO Online. February 2026.

← all articles