AI Security’s Great Wall Problem: Why Threat Modeling Must Go Beyond Cloud Infrastructure

What if the strongest walls in your AI stack are protecting the wrong city?

For years, cloud hardening has been the default answer to “How do we secure AI?” It’s sensible, even necessary—but it’s not sufficient. Attackers aren’t queueing up at your perimeter anymore; they’re hitching rides through your AI agents, plugins, data pipelines, and vendor integrations. In fact, according to CyberScoop’s coverage, Palo Alto Networks found that 99% of organizations encountered AI attacks last year. That’s a staggering signal that our current boundaries are wrong.

This isn’t just about models getting hacked. It’s about permissioned outcomes—where an influenced agent can legally do things you’ll later regret—because your system allowed it to. The attack doesn’t look like a shell in your cluster; it looks like a workflow that “successfully” ran.

Let’s unpack why the “Great Wall” mindset fails for AI, how attackers are really operating, and what a holistic, modern AI threat model looks like in practice.

The “Great Wall” Mindset—and Why It Breaks for AI

What the Great Wall gets right

Hardening cloud infrastructure reduces many traditional risks: network exposure, misconfigurations, missing patches, weak secrets handling.
Cloud security baselines, identity and access controls, and workload protections still matter. Keep doing them.

Where the Great Wall fails in AI

AI is a control plane, not just a workload. Your LLM orchestration, agents, and tools initiate actions across SaaS, APIs, and internal systems.
Inputs are mixed-trust. RAG pipelines ingest semi-trusted docs, web pages, emails, and chat content. Once interpreted, these can direct privileged tools.
Outcome risk ≠ intrusion risk. Many incidents originate from:
Telemetry gaps in agent/tool actions
Over-permissive plugins and OAuth grants
Vendor service behavior outside your cloud perimeter
Prompt injection and orchestration hijacks
The blast radius traverses vendors. An attacker manipulates a “safe” integration to cause a high-impact, authorized-but-malicious outcome.

Treating AI as just another app inside your cloud boundary ignores the reality: the AI “brain” is coordinating many other systems—most of which you don’t fully own.

Where AI Attacks Actually Land Today

Permissioned outcomes via mixed-trust inputs

Prompt injection: Untrusted content (docs, URLs, emails, user prompts) can carry instructions that agents follow. See OWASP LLM Top 10 (LLM01).
RAG poisoning: Seeded content influences the model to summarize incorrectly, leak secrets, or trigger tools.
Indirect command execution: The agent’s tool-use policy, not the model’s weights, becomes the exploit path.

The result? The system “works as designed”—it just does the wrong thing for the wrong reasons.

Orchestration and agentic plugins as blast radius

OAuth-scope sprawl: Plugins integrate with Slack, Jira, Google Drive, or GitHub. One overbroad grant equals a huge exfil path.
Tool abstraction masks danger: A single “create_ticket” capability may chain into data access, notifications, and storage you didn’t anticipate.
Supply chain: Model hosts, vector DBs, evaluation services, plus vendor APIs—each link introduces trust you must verify.

Telemetry deserts and blind spots

Many orgs lack end-to-end visibility: model prompts, tool invocations, plugin grants, model decisions, and downstream outcomes.
Logs exist in silos: app logs, SaaS logs, vendor logs, model logs. Without correlation in your SIEM/XDR, anomalies are invisible.
Non-human identities (service accounts, API keys, agent identities) are under-monitored—even as they perform high-privilege actions.

Real-World Patterns: How Attackers Exploit Integrations (Composite Examples)

These are anonymized composites of patterns seen in enterprise environments:

Example 1: Data-to-action hijack
A benign-looking knowledge base article contains “hidden” instructions (prompt injection).
The agent, allowed to “summarize and triage,” reads the doc, then uses its Jira plugin to create tickets granting guest access to a shared drive.
Slack notifications with sensitive context auto-post to a channel that includes external collaborators. No firewall was touched; the outcome was “authorized.”
Example 2: OAuth token pivot
A vendor plugin requests a permissive OAuth scope (“read/write all files”).
A known library issue in the vendor’s codebase leaks a refresh token.
The attacker uses that token to trigger the agent’s file-sync tool, exfiltrating design docs. The AI stack “worked” normally—there was no model breach, only an integration misuse.
Example 3: RAG poisoning into finance
A shared folder ingests semi-trusted vendor PDFs. One file includes crafted content that influences the agent’s interpretation of payment workflows.
The finance assistant, with ERP access, drafts a “routine” wire instruction. A human rubber-stamps it because the agent’s history looks clean.
Subsequent review shows a subtle shift in the agent’s tool use pattern that wasn’t alerted on due to missing baselines.

The pattern is constant: exploitation of trust boundaries and permissions, not kernel exploits in your cloud VMs.

Rethinking AI Threat Modeling for Today’s Risk

Treat AI as a control plane

Identify every place the AI can take action: tools, plugins, API calls, task runners, automations.
Map “critical outcomes” (funds transfer, repo change, HR update, data movement) back to the tool invocations that enable them.
Use “assume breach” on inputs: treat all external content as adversarial until proven otherwise.

Model across the whole lifecycle

Data pipelines: collection, labeling, storage, RAG indexing, TTL/retention, provenance.
Model and orchestration: prompts, policies, tool definitions, chain-of-thought handling, safety layers.
Evaluation: red teaming, adversarial testing, regression on safety/evasion.
Dependencies: plugins/OAuth, vector DBs, model hosts, vendor services, CI/CD artifacts, secrets management.

Helpful references: – OWASP Top 10 for LLM Applications – NIST AI Risk Management Framework – MITRE ATLAS for adversarial TTPs against AI

Assume breach—everywhere

Mixed-trust inputs? Assume malicious.
Vendor plugins? Assume over-permissioned until proven safe.
Non-human identities? Assume compromised credentials at some point.

Core Principles and Controls That Actually Work

Zero Standing Privileges (ZSP) for agents

No permanent high-privilege access. Grant least privilege just-in-time, with time-bound, task-scoped permissions.
Require human approvals or policy gates for sensitive scopes (e.g., source code write, finance actions).
Rotate and short-lifetime tokens for all agent identities.

Good background on ZSP: CyberArk: What is Zero Standing Privilege?

Just-in-time access and policy-based elevation

Use an “access broker” or policy proxy in front of tools. Enforce:
Who/what is requesting (agent identity, run ID)
Why (task context, prompt/rationale fingerprints)
What (specific resource, operation)
When (TTL, change window)
Log every elevation and bind it to the session and prompt context.

Constrain and sandbox tool use

Declarative tool policies: allowed operations, rate limits, resource scopes, and data redaction rules.
Sandboxed runners for risky actions (file system, code execution, browsing).
Required confirmations for compound actions:
“If action affects more than N resources or a protected dataset, stop and require human sign-off.”

Bring non-human identities into the SOC

Treat agent/service identities like privileged accounts:
Unique identities per agent and environment
Centralized secrets management
Behavior baselining and anomaly detection
Stream agent/tool logs to SIEM/XDR; correlate with SaaS and cloud logs.
Alert on sensitive patterns: new OAuth grants, scope expansions, token reuse from new geos, abnormal tool sequences.

References: – NIST SP 800-207: Zero Trust Architecture – OWASP OAuth 2.0 Security Cheat Sheet

Red team and evaluate continuously

Test against prompt injection, jailbreaks, data exfil, tool misuse, and policy bypasses.
Codify adversarial tests in CI/CD for prompts and tool policies.
Align with OWASP LLM Top 10 categories.
Map detections against MITRE ATLAS techniques.

Dependency and supply chain hygiene

Apply SLSA-style provenance for datasets, prompts, model artifacts, and tool definitions. See SLSA.
Sign and verify artifacts with Sigstore where possible.
Lock down plugin sources; run SBOMs for AI components and libraries.
Vet vendors for security posture and log availability. Require access to relevant telemetry.

Instrument the stack for observability

Collect: prompts, responses, tool invocations, decisions, denials, policy reasons, OAuth events, and downstream results.
Use OpenTelemetry to standardize traces and spans across microservices, model gateways, and tool runners.
Add semantic context: agent name, run ID, task type, risk score, and escalation flags.

Data minimization and provenance

Strip PII and secrets from prompts and logs where not needed.
Mark trust levels on inputs; annotate source and timestamp.
Maintain lineage for what data influenced which outcome.

Detection Engineering for AI Workflows

Build behavioral baselines

Model “normal” for each agent:
Average tools per task, typical tool order, usual resource scopes
Typical OAuth scopes and frequency of consent prompts
Normal time-of-day and task duration ranges
Flag drifts: sudden tool explosions, new tools, unusual target resources.

High-signal alerts that catch real abuse

New or expanded OAuth scopes for an agent/plugin
First-time access to sensitive repository/dataset by an agent
Tool invocation spikes or out-of-hours “automation storms”
Denials followed by success via a different route (possible policy probing)
RAG index changes followed by immediate anomalous actions
Prompt similarity to known injection patterns (e.g., instruction to exfiltrate, override policies)

Correlate across layers

Merge model gateway logs, tool runner traces, SaaS audit logs, and IAM events in your SIEM/XDR.
Build detections that consider cause-and-effect:
Suspicious doc ingested → agent reads doc → tool called with unusual parameters → data movement out of org

Incident Response for AI-Specific Failures

Containment

Pause the offending agent or workflow.
Revoke OAuth tokens and rotate keys for implicated tools and service principals.
Disable or isolate risky plugins/integrations.
Blocklist malicious inputs (documents, URLs, data sources) and reindex RAG.

Eradication and recovery

Purge poisoned content from data stores and caches.
Rebuild indexes and rerun evaluations for critical tasks.
Patch or reconfigure vendor plugins; tighten scopes, disable auto-consent.
Add missing observability and controls revealed by the incident.

Post-incident improvements

Add adjudication gates: “Two-person rule” for irreversible actions (e.g., financials, IAM changes).
Update detection content for the observed TTPs.
Expand red team playbooks to replicate the incident path and validate fixes.

Architecture Patterns That Make AI Safer

Policy enforcement proxy for tools

All tool calls route through a policy engine:
Validate agent identity, prompt context, requested operation, resource scope
Attach risk scores and pass/fail reasons
Produce rich decision logs for forensics and tuning.

Sandboxed runners and data diodes

Isolate code execution, browsing, and file operations in constrained environments.
Data “diode” patterns for high-to-low transfers: explicit transformations, redaction, and approvals before crossing boundaries.

Signed prompts, plans, and tool definitions

Sign critical prompts, playbooks, and tool manifests; verify at runtime.
Maintain a registry of approved tools with version pinning and SBOMs.

Human-in-the-loop for dangerous actions

Insert review steps for:
Large data exports
Privileged repo changes
Security or IAM modifications
Financial commitments
Provide diffs and rationales that are easy to audit.

Metrics and KPIs That Matter

Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR) for AI incidents
Percentage of agent actions executed with ZSP/JIT vs. standing privilege
OAuth scope sprawl: average scopes per plugin; number of high-risk scopes
Coverage: percentage of agent/tool events landing in SIEM with correlation
Prompt injection catch rate in pre-prod evaluations vs. prod blocks
False positive/negative rates for AI-specific detections
RAG data hygiene: percentage of sources with provenance, TTL adherence

A Pragmatic 30/60/90-Day Roadmap

Next 30 days
Inventory agents, tools, plugins, OAuth scopes, and data sources.
Centralize logs for model gateway, tool invocations, and SaaS audit trails.
Define “critical outcomes” and map the enabling tools and scopes.
Turn off unused plugins and trim obvious over-permissions.
Days 31–60
Implement ZSP/JIT for at least one high-impact agent.
Deploy a policy proxy in front of critical tools.
Add first-gen detections: new scopes, sensitive resource first-access, tool bursts.
Kick off a focused AI red team against prompt injection and tool misuse.
Days 61–90
Expand ZSP/JIT to all production agents; enforce human-in-loop for critical actions.
Instrument OpenTelemetry traces across orchestration and tool runners.
Integrate non-human identities into privileged access monitoring.
Establish signed prompt/playbook registry and SLSA-aligned provenance for data and artifacts.

How This Reframes Your Security Strategy

From perimeter to path: Stop guarding only the wall; guard the path from input to action.
From model breaches to outcome safety: Your goal is safe, auditable outcomes—even when inputs are hostile.
From trust-by-default to trust-by-proof: Every tool call and integration should be justified, scoped, logged, and reviewable.
From infrastructure-only to ecosystem defense: Vendors, plugins, data sources, and identities are all part of your threat model.

Helpful resources to go deeper: – CyberScoop: AI threat modeling needs to go beyond cloud – OWASP Top 10 for LLM Applications – NIST AI Risk Management Framework – MITRE ATLAS – OpenTelemetry – SLSA – Sigstore – NIST SP 800-207: Zero Trust

FAQ

Q: Isn’t cloud hardening still the top priority? A: It’s foundational, but not sufficient. The most damaging AI incidents increasingly arise from orchestration and integrations—where the AI system has permission to do harm without any “intrusion.” Harden cloud, but also secure the AI control plane and its dependencies.

Q: How do I explain “permissioned outcomes” to executives? A: Say: “The system didn’t get hacked—it did exactly what it was allowed to do, but for the wrong reasons.” That’s why we add just-in-time access, human approvals, and policy checks for sensitive actions.

Q: Are prompt injections really that dangerous? A: Yes, especially when agents have tools. Injection turns a harmless-looking document or web page into a set of instructions the agent may follow. The risk compounds when the agent has write or export capabilities.

Q: Do I have to build my own agent policy proxy? A: Not necessarily. You can start with your API gateway, a service mesh, or an existing policy engine (e.g., OPA/Gatekeeper) and extend it to understand agent identity, tool types, and scopes. The key is to centralize policy enforcement and logging.

Q: How do I monitor non-human identities effectively? A: Give each agent a unique, short-lived identity; centralize secrets; feed all actions to your SIEM/XDR; and baseline normal behavior. Alert on scope changes, new resource access, and unusual tool sequences.

Q: What about vendor risk for plugins and model hosts? A: Require security attestations, logging access, SBOMs, and least-privilege OAuth scopes. Prefer vendors that support event streaming and revocation APIs. Treat vendors as part of your extended SOC telemetry fabric.

Q: Can we rely on model alignment and safety filters to stop these attacks? A: Safety layers help, but they’re not sufficient. You need policy enforcement at the tool layer, ZSP/JIT, provenance, and robust detection. Defense-in-depth wins.

Q: What is the minimum viable set of controls to reduce blast radius now? A: Inventory tools and scopes, remove unused plugins, enforce JIT for sensitive actions, add human approval gates, and centralize logs for model/tool events. Then iterate with detection and red team feedback.

The Clear Takeaway

AI security fails when we build a taller wall around the wrong thing. The real battleground isn’t only your cloud perimeter—it’s the AI control plane and the ecosystem it orchestrates. Shift your threat model from “keep them out” to “keep outcomes safe.” Assume hostile inputs, enforce zero standing privileges, gate high-impact actions, and make every tool call observable. When you defend the path from prompt to permissioned action, you turn AI from a sprawling attack surface into a disciplined, auditable engine for your business.

Discover more at InnoVirtuoso.com

I would love some feedback on my writing so if you have any, please don’t hesitate to leave a comment around here or in any platforms that is convenient for you.

For more on tech and other topics, explore InnoVirtuoso.com anytime. Subscribe to my newsletter and join our growing community—we’ll create something magical together. I promise, it’ll never be boring!

Stay updated with the latest news—subscribe to our newsletter today!

Thank you all—wishing you an amazing day ahead!