The God Mode Problem
Today’s agents are granted unrestricted access to tools, APIs, and infrastructure. When you connect Claude Desktop or Cursor to an MCP server, the agent receives that access without further restriction.
This creates systemic gaps:
No Audit Trail
Actions taken by agents are indistinguishable from human actions in logs, so there is no way to trace which agent did what.
No Revocation
Once an agent has credentials, there’s no standard way to revoke them without rotating the entire API key.
No Authorization Granularity
Access is all-or-nothing at the API key level. You can’t grant “read repos” without also granting “delete repos.”
Compliance Blind Spots
SOC 2, GDPR, HIPAA, and SOX audit requirements go unmet for agentic actions because auditors can’t distinguish agent activity from human activity.
Threat Catalog
1. Indirect Prompt Injection
Real-World Example: GeminiJack (2024)
Security researchers at Embrace the Red demonstrated that attackers could embed adversarial prompts in Google Docs. When Google’s Gemini AI accessed these documents, it executed the attacker’s instructions instead of the user’s intent.
Attack Vector:
- No input validation: The malicious prompt is semantically valid text
- No signature verification: The PDF itself is not tampered with
- No anomaly detection: Email sending is a legitimate agent capability
Result: Even if the agent believes it should send the email, AIP blocks it based on policy.
2. Privilege Escalation
Attack Scenario: Agent Chaining
Example:
- Agent reads Slack messages (allowed)
- Finds “Meeting at 3pm with Client X” (allowed)
- Accesses calendar API to find client contact info (allowed)
- Composes email to client (allowed)
- Escalation: Instead of sending via company email, posts message to external API
Why this is hard to detect:
- Each individual step is authorized
- No visibility into the chain of actions
- API keys don’t distinguish between “read calendar for scheduling” vs “read calendar to scrape contacts”
Capability Manifests
Agents declare upfront what they need:
The escalation fails because calendar_access is not in the manifest.
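A minimal sketch of the manifest check described here, assuming a manifest is simply a set of tool names declared up front. `CapabilityManifest`, `is_allowed`, and the tool names other than `calendar_access` are hypothetical illustrations, not AIP's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityManifest:
    """Hypothetical manifest: the tools an agent declared it needs."""
    agent_id: str
    allowed_tools: frozenset[str]

    def is_allowed(self, tool: str) -> bool:
        # Deny by default: anything not declared up front is rejected.
        return tool in self.allowed_tools


manifest = CapabilityManifest(
    agent_id="scheduler-agent",
    allowed_tools=frozenset({"slack_read", "email_send"}),
)

for tool in ("slack_read", "calendar_access"):
    verdict = "allow" if manifest.is_allowed(tool) else "deny"
    print(f"{tool}: {verdict}")
```

Because the check is a set lookup against a fixed declaration, a tool the agent discovers mid-session is denied no matter how the agent justifies the call.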
Audit Trail Correlation
Every tool call is logged with session_id and agent_id:
Forensic analysis can detect escalation attempts even if they fail.
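One way such a log could look, as a sketch only: each tool call becomes a JSON line carrying `session_id` and `agent_id`. The remaining field names (`timestamp`, `tool`, `decision`) and the `audit_record` helper are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def audit_record(session_id: str, agent_id: str, tool: str, decision: str) -> str:
    """Build one JSON audit line. Fields other than session_id and
    agent_id are illustrative assumptions."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "agent_id": agent_id,
        "tool": tool,
        "decision": decision,
    })


log = [
    audit_record("sess-1", "agent-a", "slack_read", "allow"),
    audit_record("sess-1", "agent-a", "calendar_access", "deny"),
]

# Forensic query: every denied call in a session, i.e. failed attempts too.
denied = [
    entry["tool"]
    for entry in map(json.loads, log)
    if entry["session_id"] == "sess-1" and entry["decision"] == "deny"
]
print(denied)
```

Logging denials as well as successful calls is what makes a failed escalation attempt visible after the fact.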
Tool-Specific Rate Limits
Prevent agents from brute-forcing access:
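A per-tool limit could be sketched as a sliding window keyed by tool name. `ToolRateLimiter` and its parameters are hypothetical, and the limits shown are arbitrary values chosen for the example.

```python
from collections import defaultdict, deque


class ToolRateLimiter:
    """Sliding-window limiter keyed by tool name (hypothetical helper)."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, tool: str, now: float) -> bool:
        calls = self._calls[tool]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


limiter = ToolRateLimiter(max_calls=2, window_seconds=60.0)
results = [limiter.allow("repos.get", now=t) for t in (0.0, 1.0, 2.0, 61.0)]
print(results)  # [True, True, False, True]
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would more likely read a monotonic clock.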
3. Data Exfiltration
Attack Scenarios:
- Exfiltration via Summarization
- Exfiltration via Tool Arguments
- Exfiltration via Response Manipulation
Scenario: Agent “summarizes” proprietary code by posting it to an external API.
AIP Mitigation:
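The threat matrix in this document credits AIP with DLP scanning and argument validation for this threat. The sketch below shows the general shape of such a check and is not AIP's rule format: the patterns, the `url` argument name, and the allowlisted host are all assumptions.

```python
import re

# Illustrative patterns only; real DLP rule sets are far broader.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
# Hypothetical egress allowlist for tools that take a URL argument.
ALLOWED_HOSTS_RE = re.compile(r"^https://api\.internal\.example\.com/")


def scan_arguments(args: dict[str, str]) -> list[str]:
    """Return the reasons a tool call should be blocked (empty if clean)."""
    findings = []
    url = args.get("url")
    if url is not None and not ALLOWED_HOSTS_RE.match(url):
        findings.append("destination not in egress allowlist")
    for value in args.values():
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(value):
                findings.append(f"sensitive data: {name}")
    return findings


print(scan_arguments({
    "url": "https://paste.example.org/upload",
    "body": "key=AKIAIOSFODNN7EXAMPLE",
}))
```

Pattern matching catches only known formats, which is why the sketch pairs it with a destination check: a "summary" posted to an unlisted host is blocked even when no pattern fires.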
4. Session Hijacking
Attack Vector:
- Agent’s API key is leaked (committed to GitHub, exposed in a log file)
- Attacker uses key to make tool calls
- Attacker’s actions appear as legitimate agent activity
Short-Lived AATs
Agent Authentication Tokens (AATs) expire in 5 minutes by default. Even if a token is stolen, the window of exploitation is narrow.
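The expiry rule itself is simple to sketch. `AgentToken` and its fields are a hypothetical shape, not the real AAT format; only the 5-minute default comes from the text above.

```python
from dataclasses import dataclass

DEFAULT_TTL_SECONDS = 5 * 60  # the 5-minute default described above


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical AAT shape; only the timing fields matter here."""
    agent_id: str
    issued_at: float  # Unix seconds
    ttl_seconds: int = DEFAULT_TTL_SECONDS

    def is_valid(self, now: float) -> bool:
        # Reject tokens dated in the future as well as expired ones.
        return self.issued_at <= now < self.issued_at + self.ttl_seconds


token = AgentToken(agent_id="agent-a", issued_at=1_000.0)
print(token.is_valid(now=1_299.0))  # True: one second before expiry
print(token.is_valid(now=1_300.0))  # False: exactly at expiry
```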
Session Binding
Tokens are cryptographically bound to the process/host:
A token stolen from one process cannot be used in another.
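The text does not say how the binding is computed. One plausible construction, shown purely as an assumption, is an HMAC over the token ID plus a fingerprint of the holder. The hostname-and-PID fingerprint below is for illustration only; it is easy to spoof, and a real binding would more likely rest on key material held by the process.

```python
import hashlib
import hmac
import os
import socket

# Hypothetical server-side secret; in practice this would come from a key store.
BINDING_KEY = b"demo-only-secret"


def current_fingerprint() -> str:
    """Illustrative fingerprint of the calling process."""
    return f"{socket.gethostname()}:{os.getpid()}"


def bind(token_id: str, fingerprint: str, key: bytes = BINDING_KEY) -> str:
    """MAC that ties a token ID to one process/host fingerprint."""
    message = f"{token_id}|{fingerprint}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(token_id: str, fingerprint: str, binding: str,
           key: bytes = BINDING_KEY) -> bool:
    expected = bind(token_id, fingerprint, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, binding)


binding = bind("tok-1", current_fingerprint())
print(verify("tok-1", current_fingerprint(), binding))   # same process
print(verify("tok-1", "attacker-host:4242", binding))    # different holder
```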
Nonce-Based Replay Prevention
Each token includes a unique nonce. Implementations track used nonces:
Replaying a stolen token fails immediately.
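Nonce tracking can be sketched with an in-memory registry. `NonceRegistry` is a hypothetical helper; a multi-node deployment would need a shared store instead, and the sketch assumes each nonce is accepted once and then remembered until its token expires.

```python
import secrets


class NonceRegistry:
    """In-memory record of nonces already seen (hypothetical helper)."""

    def __init__(self) -> None:
        self._expiry_by_nonce: dict[str, float] = {}

    def consume(self, nonce: str, expires_at: float, now: float) -> bool:
        # Forget nonces whose tokens have expired; those tokens are
        # rejected by the expiry comparison below anyway.
        self._expiry_by_nonce = {
            n: exp for n, exp in self._expiry_by_nonce.items() if exp > now
        }
        if now >= expires_at or nonce in self._expiry_by_nonce:
            return False
        self._expiry_by_nonce[nonce] = expires_at
        return True


registry = NonceRegistry()
nonce = secrets.token_urlsafe(16)
first_use = registry.consume(nonce, expires_at=300.0, now=0.0)
replay = registry.consume(nonce, expires_at=300.0, now=10.0)
print(first_use, replay)  # True False
```

Purging expired nonces keeps the registry bounded, which is practical only because the tokens themselves are short-lived.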
5. Consent Fatigue & Shadow AI
Example: “Allow GitHub access” grants repo:delete, not just repo:read.
Organizational Risk:
- Developers run local Copilot instances with production credentials
- No visibility into which agents are running or what they access
- Compliance auditors cannot trace agent activity
Threat Matrix
| Threat | Standard MCP | API Keys | AIP |
|---|---|---|---|
| Indirect Prompt Injection | ⚠️ Vulnerable | ⚠️ Vulnerable | ✅ Policy blocks unauthorized intent |
| Privilege Escalation | ⚠️ Unrestricted | ⚠️ Scope-level only | ✅ Per-tool allowlist + audit trail |
| Data Exfiltration | ⚠️ Unrestricted egress | ⚠️ Unrestricted egress | ✅ DLP scanning + argument validation |
| Session Hijacking | ⚠️ Long-lived credentials | ⚠️ Rotate manually | ✅ Short-lived AATs + binding |
| Consent Fatigue | ⚠️ All-or-nothing | ⚠️ Broad scopes | ✅ Explicit capability manifests |
| Shadow AI | ⚠️ No visibility | ⚠️ No audit | ✅ Centralized policy + audit |
| Compliance Gaps | ⚠️ Manual | ⚠️ Partial | ✅ SOC 2, GDPR, HIPAA, SOX ready |
Defense-in-Depth Principles
AIP implements multiple independent security layers. An attacker must bypass all of them:
- Identity Layer: Valid AAT with correct signature
- Temporal Layer: Token not expired or revoked
- Capability Layer: Tool in allowed_tools list
- Argument Layer: Parameters match regex constraints
- Rate Layer: Not exceeding call limits
- Data Layer: No sensitive data in request/response
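The six layers above can be sketched as an ordered chain in which the first failing check denies the call. Everything here is a simplified assumption: the `ToolCall` shape, the boolean stand-ins for the real signature, expiry, rate, and DLP checks, the example allowlist and regex, and the evaluation order, which simply follows the list.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical request shape; the boolean fields stand in for the
    results of real signature, expiry, rate, and DLP checks."""
    tool: str
    args: dict[str, str]
    signature_ok: bool = True
    expired_or_revoked: bool = False
    over_rate_limit: bool = False
    contains_sensitive_data: bool = False


ALLOWED_TOOLS = {"repos.get"}
ARG_CONSTRAINTS = {"repos.get": {"org": re.compile(r"acme")}}


def _args_ok(call: ToolCall) -> bool:
    rules = ARG_CONSTRAINTS.get(call.tool, {})
    # Every constrained argument must be present and fully match its regex.
    return all(
        name in call.args and pattern.fullmatch(call.args[name])
        for name, pattern in rules.items()
    )


LAYERS = [
    ("identity", lambda c: c.signature_ok),
    ("temporal", lambda c: not c.expired_or_revoked),
    ("capability", lambda c: c.tool in ALLOWED_TOOLS),
    ("argument", _args_ok),
    ("rate", lambda c: not c.over_rate_limit),
    ("data", lambda c: not c.contains_sensitive_data),
]


def evaluate(call: ToolCall) -> str:
    """Return 'allow', or 'deny:<layer>' for the first layer that fails."""
    for name, check in LAYERS:
        if not check(call):
            return f"deny:{name}"
    return "allow"


print(evaluate(ToolCall("repos.get", {"org": "acme"})))
print(evaluate(ToolCall("repos.get", {"org": "evil"})))
print(evaluate(ToolCall("repos.delete", {"org": "acme"})))
```

Returning the name of the failing layer gives the audit trail a specific reason for each denial.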
Why Existing Solutions Fall Short
- OAuth Scopes
- API Keys
- Service Mesh (Istio)
What OAuth Provides:
- User consent for broad permissions
- Token-based authentication
- Scope-level authorization (“repo access”)
What OAuth Lacks:
- ❌ Runtime authorization (scopes are fixed at grant time)
- ❌ Per-action granularity (“repos.get with org:X”)
- ❌ Audit trail of tool invocations
- ❌ DLP scanning of responses
Real-World Impact
Without AIP, organizations face:
Regulatory Risk
Example: Healthcare provider using AI agent to process patient records
- HIPAA requires audit trail of who accessed PHI
- Agent actions logged as “API user” (not individually identified)
- Result: Non-compliance, potential fines
Insider Threat
Example: Developer’s compromised agent
- Agent credentials leaked in Git commit
- Attacker uses agent to exfiltrate customer database
- Result: Data breach, no way to trace which “agent” did it
Supply Chain Attack
Example: Malicious MCP server
- Developer installs third-party MCP server from npm
- Server contains backdoor that exfiltrates environment variables
- Result: Credentials stolen, infrastructure compromised
Accidental Deletion
Example: Agent misinterprets user intent
- User: “Clean up old branches”
- Agent: [Deletes main branch]
- Result: Production outage, no policy to prevent it
Next Steps
Architecture
Understand how AIP’s two-layer design mitigates these threats
Layer 1: Identity
Learn how cryptographic identity prevents impersonation
Layer 2: Enforcement
Explore policy engine internals and evaluation flow
Write a Policy
Create your first AIP policy to block these threats