|

Zero‑Click Prompt Injection Hits AI Agents: Inside Zenity’s Black Hat “AgentFlayer” Research—and How to Defend

What if your AI agent leaked API keys, triaged a fake customer request, and pivoted into your CRM—without anyone clicking a thing? That’s the unsettling reality researchers from Zenity demonstrated at Black Hat USA: zero‑click and one‑click prompt injection chains that silently hijack popular enterprise AI tools, from ChatGPT and Microsoft Copilot Studio to Cursor with Jira, Salesforce Einstein, Google Gemini, and Microsoft Copilot.

If you connect language models to files, drives, tickets, email, or code repos, your attack surface is expanding fast. The tools that make AI agents useful—connectors, automations, and “always-on” workflows—also open doors for adversaries to plant rogue instructions and siphon data at scale. That doesn’t mean you should slam the brakes on AI. It means you need to treat AI agents like real applications with real blast radius—and secure them accordingly.

In this article, I’ll break down what Zenity found, why zero‑click prompt injection is different, and how to build a layered defense that keeps agents helpful without leaving them gullible.

The headline: AI agents can be hijacked without user interaction

At Black Hat, Zenity unveiled “AgentFlayer,” a set of exploit chains that abuse how agents read and act on contextual inputs. Their demos showed working paths to:

  • Exfiltrate sensitive data (like API keys and conversation history)
  • Impersonate users and manipulate workflows
  • Move laterally across enterprise systems through connected tools
  • Persist statefully in agent “memory,” even across sessions

The twist is scale and subtlety. Attackers don’t have to trick a person every time. They can seed malicious instructions where agents already look—documents, tickets, email inboxes, or linked knowledge sources—and let automation do the rest.

Here’s why that matters: The more you wire LLMs into your systems, the more untrusted text they consume. That text is code to an LLM. If you don’t set guardrails, your agent will cheerfully follow the wrong orders.

For background on LLM attack surface and risk, see the OWASP Top 10 for LLM Applications and Prompt Injection guidance from Microsoft: – OWASP Top 10 for Large Language Model ApplicationsMicrosoft—Prompt injection and prompt leaking

What Zenity showed at Black Hat (AgentFlayer)

Zenity’s research spanned popular enterprise AI tools and agent platforms. Highlights include:

  • ChatGPT with Connectors (e.g., Google Drive, GitHub, SharePoint)
  • Microsoft Copilot Studio (no-code business agents and workflows)
  • Cursor IDE integrated via Jira using the Model Context Protocol (MCP)
  • Salesforce Einstein, Google Gemini, and Microsoft Copilot

These weren’t theoretical musings. Zenity demonstrated working exploits with immediate effects. As Michael Bargury, Zenity’s CTO and co‑founder, put it: “We demonstrated memory persistence and how attackers can silently hijack AI agents to exfiltrate sensitive data, impersonate users, manipulate critical workflows, and move across enterprise systems, bypassing the human entirely.”

OpenAI and Microsoft deployed mitigations after disclosures, but the core lesson stands: you can’t “blacklist” your way out of prompt injection. Attackers iterate; classifiers and static rules alone won’t keep pace.

To understand the threat model, let’s unpack how these attacks work at a high level.

How zero‑click prompt injection works (without the gory details)

Prompt injection is simple to describe: you hide instructions in content the model will process. With AI agents, that content flows in from many places—files, emails, tickets, code comments, knowledge bases, even the web. If the agent has tools, it can follow those hidden instructions to fetch data, call APIs, or post responses. No user click required.

Let’s look at three common patterns Zenity highlighted.

1) Document-based attacks via connected drives (ChatGPT Connectors)

A classic use case: upload a file to ChatGPT and ask for a summary. In enterprise plans, Connectors can link chat to cloud storage like Google Drive or SharePoint. If an attacker sends a “document” that contains hidden instructions, the agent might:

  • Read the file
  • Traverse connected drives or repositories it’s allowed to access
  • Find secrets or sensitive data
  • Encode those results in seemingly benign output (for example, in an image link or formatted text)
  • Send that output back to the user—while silently exfiltrating data in the background

Vendors try to filter dangerous links or content, but blacklist approaches are fragile. Attackers discover safe‑looking channels or hosting providers the filters trust. Zenity reported that OpenAI implemented fixes to block the specific techniques they used.

Key takeaway: Any feature that allows an LLM to render content that triggers a network request (images, links, or embeds) can become a covert exfil path if you don’t constrain it.

Helpful context: – Azure Blob Storage overviewAzure Monitor Log Analytics

2) Email-triggered agent workflows (Microsoft Copilot Studio)

Copilot Studio lets teams build agents that react to incoming messages, search internal systems, and route tasks. Zenity reproduced a customer service workflow that auto‑processed emails and queried a CRM to determine the right support path. If an attacker learns the inbox address, they can send crafted content that:

  • Enumerates the agent’s tools and knowledge sources
  • Extracts customer records from the CRM
  • Emails those results elsewhere or posts them to a channel

Microsoft responded with targeted mitigations for the prompts Zenity used. Still, the underlying risk remains: natural‑language instructions are malleable. Attackers will rephrase them to evade filters.

For an overview of Copilot Studio’s capabilities (and why it’s so attractive to businesses), see Microsoft Copilot Studio.

3) Jira tickets as a supply chain for code assistants (Cursor + MCP)

Cursor, a popular AI code editor, integrates with Jira via the Model Context Protocol (MCP). Many organizations sync external systems (like Zendesk) into Jira. That means untrusted third parties can plant content in tickets your coding agent reads. Zenity showed how malicious ticket content could steer Cursor to reveal repository secrets like API keys or tokens.

The broader point: context supply chains are everywhere. Repos, tickets, commit messages, and PR descriptions all feed AI assistants. If you treat that input as trusted, you’re inviting trouble.

To learn more about the protocol behind these integrations, see the Model Context Protocol (MCP) specification. For a parallel example from another vendor’s ecosystem, see research around AI coding assistants parsing malicious prompts from code comments or PRs (summarized in GitLab security updates and blog posts).

Why zero‑click prompt injection is different—and dangerous

Traditional social engineering needs a person to act. Zero‑click exploits target agents that act for you. Here’s what changes:

  • Scale: Plant once, trigger many. One malicious document or ticket can impact multiple users or sessions.
  • Speed: Agents process inputs fast and often automatically. By the time a human looks, the damage is done.
  • Reach: Agents have broad access—drives, CRMs, repos, calendars. That’s an attacker’s dream.
  • Stealth: Exfiltration can hide in plain sight (e.g., a harmless‑looking response that triggers outbound requests).
  • Persistence: Some agents store internal “memory.” Rogue instructions can linger or be re‑triggered later.

This is why organizations must shift their mindset: AI agents aren’t just chat windows. They’re autonomous apps with privileges—and they need the same rigor you apply to any production system.

For broader frameworks on AI risk and defense, consult: – NIST AI Risk Management FrameworkMITRE ATLAS (Adversarial Threat Landscape for AI Systems)Google’s Secure AI Framework (SAIF)

Defense-in-depth: a practical playbook for securing AI agents

You can’t solve prompt injection with a single filter. You need layers that prevent, detect, and contain abuse. Use this as a starting checklist and adapt it to your stack.

1) Treat all agent inputs and outputs as untrusted

  • Sanitize and normalize inputs from documents, tickets, emails, and repos.
  • Strip or neuter active content where possible (links, embedded images, HTML).
  • Assume output may contain hidden instructions or exfil techniques. Scan and transform before rendering.

2) Control network egress from agents

  • Block direct outbound requests from model responses. Render images and links through a secure proxy that:
  • Removes parameters
  • Rewrites or blocks external URLs
  • Enforces allowlists for domains
  • Disable or constrain markdown image rendering if you don’t need it.
  • Log and alert on unusual egress patterns (e.g., unique domains or parameters).

3) Minimize privileges and scope

  • Give agents the least access required—specific folders, specific repositories, specific CRM fields.
  • Use read‑only scopes where possible; avoid “search everything” permissions.
  • Issue short‑lived, scoped credentials per task or per session. Rotate secrets frequently.
  • Keep secrets out of accessible contexts. Never store API keys in repos or broadly shared drives.

4) Add a review gate for side‑effectful actions

  • Switch agents to “review mode” for actions that send emails, modify tickets, or update records.
  • Require human-in-the-loop approval or policy checks before executing tool calls with side effects.
  • Batch operations for review; use templates to reduce free‑form model output.

5) Constrain tool use with declarative policies

  • Define explicit tool schemas and guardrails: what inputs are allowed, what outputs are acceptable, and what conditions must be met.
  • Reject tool calls that don’t match schema or that reference unexpected resources (e.g., files outside an allowlist).
  • Enforce rate limits and quotas per tool and per user.

6) Segment and sandbox agent memory

  • Separate system prompts from user content and retrieved data. Label and isolate them.
  • Avoid writing arbitrary user content into persistent memory. Use vetted summaries or embeddings instead.
  • Clear memory between contexts or sessions unless you can verify its integrity.

7) Harden connectors and knowledge sources

  • Curate indexes: only include vetted, internal, non-sensitive sources for retrieval.
  • Maintain allowlists for file types and repositories that agents can access.
  • Monitor and review new sources added by users. Default to “no access” until approved.

8) Scan for secrets and sensitive data at multiple layers

  • Run DLP and secret scanning on:
  • Files before ingestion
  • Repositories and ticket systems
  • Agent outputs prior to display or dispatch
  • Combine pattern-based detection (regex for API keys) with ML-based classifiers for PII and PHI.

9) Log everything—and actually look

  • Capture structured logs of prompts, tool calls, resources accessed, and outbound requests.
  • Send logs to your SIEM and set alerts for anomalies (e.g., sudden broad drive searches, unusual CRM queries).
  • Tag logs with user identity, session, and resource identifiers for forensics.

10) Red team your agents before attackers do

  • Build adversarial corpora: documents, tickets, and emails seeded with benign‑looking but malicious instructions.
  • Automate regression tests to ensure new models or prompts don’t re‑open old holes.
  • Use OWASP LLM Top 10 and MITRE ATLAS as test design guides.

11) Do vendor due diligence

Ask your AI and agent vendors:

  • Do you support egress controls and link/image sanitization?
  • Can we define domain allowlists and output policies?
  • How do you isolate system prompts and strip unsafe content from retrieved documents?
  • What is your disclosure and patching process for injection vectors?
  • Do you offer tenant‑level toggles to disable risky features (e.g., external image rendering)?

If a vendor can’t answer, that’s your signal to limit integrations or keep the agent in a low‑privilege sandbox.

What vendors did—and what’s still needed

Zenity reported their findings privately. OpenAI and Microsoft shipped mitigations to block the demonstrated paths. That’s good. But this problem won’t be “fixed” with a single patch. Just like SQL injection required parameterized queries and defense‑in‑depth, prompt injection requires architectural controls.

What’s needed next:

  • Safer defaults: image/link rendering off by default; explicit opt‑in with guardrails
  • Standardized policies: declarative constraints for tool use and output
  • Stronger provenance: signed contexts and content labels so agents can tell instructions from data
  • Better evaluation: red‑team suites organizations can run continuously, not just during launch

If you’re building internal agents, you can start implementing most of the above today.

Strategic implications: govern agents like apps

Here’s the mindset shift I’m seeing in mature teams:

  • Agents are apps. Give them owners, SLAs, and change control.
  • Prompts are code. Version them, review them, and test them.
  • Context is supply chain. Vet and monitor every source that flows into the agent.
  • Policies are product. Treat egress controls, DLP, and approvals as core features, not bolt‑ons.

This doesn’t kill velocity. It protects it. When the inevitable prompt injection attempt happens, you’ll see it, contain it, and keep shipping.

Quick checklist: secure your AI agents this quarter

  • Turn off markdown image rendering or force all URLs through a safe proxy
  • Enforce domain allowlists and block parameterized outbound requests
  • Scope connectors to least privilege; remove “search everything”
  • Add human review for any action that sends or modifies data
  • Strip or sanitize untrusted content from docs, tickets, and emails
  • Scan outputs for secrets and PII before they leave your environment
  • Log prompts, tool calls, and egress; alert on anomalies
  • Red team with adversarial test content; fix until tests pass
  • Pressure vendors for safer defaults and administrative controls

Frequently asked questions (FAQ)

Q: What is a zero‑click prompt injection attack?
A: It’s when an attacker hides instructions in content an AI agent will process—like a document, email, or ticket—so the agent follows those instructions automatically, without any user clicking a link or running code. Because agents often have tools and connectors, the impact can include data exfiltration, workflow manipulation, and lateral movement.

Q: How can a simple document make ChatGPT leak data?
A: When chatbots ingest files and have access to connected drives or tools, malicious text in the file can tell the agent to look elsewhere and include results in its output. If the agent renders links or images, that output can become a covert channel to exfiltrate data. Vendors are adding filters, but architectural controls—like disabling external rendering and enforcing egress allowlists—are more robust.

Q: Are these specific vulnerabilities patched?
A: Zenity reported their findings to affected vendors. OpenAI and Microsoft implemented mitigations that block the specific techniques shown. That said, prompt injection isn’t a single bug; it’s a class of issues. You should assume attackers will try new phrasings and channels and deploy layered defenses.

Q: Is turning off image rendering enough?
A: It helps, but it’s not sufficient. You also need to: – Proxy and sanitize any external links – Limit tool permissions and data scopes – Add human review for side‑effectful actions – Scan outputs for secrets and PII – Monitor for unusual egress activity

Q: How do I secure agents built with Microsoft Copilot Studio?
A: Start by: – Restricting inboxes and automations to vetted senders and formats – Requiring human approval before emailing or updating records – Scoping CRM access to minimal fields – Sanitizing inbound email content – Logging and alerting on unusual queries or outbound messages Microsoft’s documentation on secure patterns is a useful reference: Copilot Studio overview and governance and Prompt injection guidance.

Q: What is the Model Context Protocol (MCP) and why does it matter?
A: MCP is an open protocol for connecting AI tools and context (like Jira, repos, or databases) to LLMs. It’s powerful—but also expands the attack surface because untrusted content can flow into the agent’s context. Learn more here: MCP specification.

Q: Are Jira tickets and Zendesk messages really a risk?
A: Yes. Many organizations auto‑sync external messages into internal systems like Jira. If your coding agent reads tickets to suggest fixes or close issues, that content becomes a control plane. Treat it as untrusted input: sanitize, restrict, and review before allowing side‑effects.

Q: How can I detect data exfiltration from agents?
A: Combine network, app, and content telemetry: – Force all outbound requests through a proxy and log parameters – Alert on unknown domains or large volumes of unique URLs – Scan agent outputs for secrets, credentials, and PII – Correlate tool calls with data access patterns in your SIEM – Run routine adversarial tests to validate detection

Q: Where can I learn more about prompt injection defenses?
A: Start with: – OWASP Top 10 for LLM ApplicationsMicrosoft—Prompt injection guidanceNIST AI RMFMITRE ATLASGoogle SAIF

The bottom line

AI agents are incredibly useful—and increasingly autonomous. Zenity’s Black Hat research is a wake‑up call: zero‑click prompt injection isn’t a thought experiment. It’s a working set of exploit paths that target the very features we love about agents—connectivity, memory, and automation.

The fix isn’t fear. It’s engineering discipline: – Lock down egress. – Minimize privileges. – Add human gates for risky actions. – Sanitize inputs and outputs. – Monitor and test relentlessly.

Do that, and you can ship powerful AI experiences without handing attackers the keys. If you found this useful, consider subscribing for more deep‑dives on AI security and practical guides your team can implement this quarter.

Discover more at InnoVirtuoso.com

I would love some feedback on my writing so if you have any, please don’t hesitate to leave a comment around here or in any platforms that is convenient for you.

For more on tech and other topics, explore InnoVirtuoso.com anytime. Subscribe to my newsletter and join our growing community—we’ll create something magical together. I promise, it’ll never be boring! 

Stay updated with the latest news—subscribe to our newsletter today!

Thank you all—wishing you an amazing day ahead!

Read more related Articles at InnoVirtuoso

Browse InnoVirtuoso for more!