Security Scanning

Every skill package is automatically scanned through a 6-stage security pipeline before being published to the Tank registry. This pipeline detects malicious code, credential leaks, prompt injection, and supply chain risks.

Overview

Tank’s security scanner is implemented in Python (python-api/) and runs automatically during tank publish. It produces a verdict and audit score based on findings from all stages.

The scanner is inspired by the ClawHavoc incident — 341 malicious skills (12% of a major marketplace) that exfiltrated credentials and executed arbitrary code. Tank prevents this through mandatory scanning and permission enforcement.

6-Stage Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   stage0    │───▶│   stage1    │───▶│   stage2    │───▶│   stage3    │───▶│   stage4    │───▶│   stage5    │
│   INGEST    │    │  STRUCTURE  │    │   STATIC    │    │  INJECTION  │    │   SECRETS   │    │   SUPPLY    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Each stage runs independently and can succeed, fail, or error without blocking subsequent stages.

Stage 0: Ingest

File: python-api/lib/scan/stage0_ingest.py Purpose: Download, extract, and validate tarball structure Checks:

Download from authorized domains only (Supabase storage)
Validate tarball size (max 50MB compressed, 50MB extracted)
Detect zip bombs (compression ratio >100x)
Reject symlinks, hardlinks, path traversal
Reject binary executables (.exe, .so, .dll, .pyc, .class)
Compute SHA-256 hash for each file

Critical Findings:

download_failed — Cannot retrieve tarball
zip_bomb — Compression ratio exceeds 100x
path_traversal — Archive contains dangerous paths like ../../../etc/passwd
blocked_file_type — Binary or executable file detected
size_exceeded — Extracted size exceeds 50MB

Example:

# From stage0_ingest.py
MAX_TARBALL_SIZE = 50 * 1024 * 1024  # 50MB
MAX_EXTRACTED_SIZE = 50 * 1024 * 1024  # 50MB
MAX_COMPRESSION_RATIO = 100  # decompressed/compressed

BLOCKED_EXTENSIONS = {
    ".exe", ".so", ".dll", ".dylib", ".wasm",
    ".class", ".pyc", ".pyo", ".jar", ".war",
    ".bin", ".dat",
}

Stage 1: Structure

File: python-api/lib/scan/stage1_structure.py Purpose: Validate file structure and detect anomalies Checks:

Required files present (skills.json, SKILL.md)
File count within limits (max 1000 files)
No hidden files in unexpected locations
Valid file extensions (whitelist)
Manifest schema validation

High Findings:

missing_manifest — No skills.json found
missing_skill_md — No SKILL.md found
file_count_exceeded — More than 1000 files

Medium Findings:

suspicious_hidden_file — Hidden file outside expected locations (e.g., .git/)
unexpected_extension — File extension not in allowlist

Stage 2: Static Analysis

File: python-api/lib/scan/stage2_static.py (~550 lines, largest stage) Purpose: Static code analysis via AST and pattern matching Checks:

Python AST analysis (detects eval, exec, compile, __import__)
JavaScript pattern matching (obfuscated code, dynamic requires)
Dangerous function calls (shell execution, file operations)
Network operations (HTTP requests, socket connections)
Subprocess spawning
Dynamic code generation
Obfuscation detection (base64, hex encoding)
Suspicious imports (e.g., ctypes, subprocess, socket)

Critical Findings:

arbitrary_code_exec — Use of eval(), exec(), compile()
shell_injection_risk — Unsafe shell command construction
unsafe_deserialization — Use of pickle.loads() or similar

High Findings:

obfuscated_code — Base64-encoded payloads or hex strings
dynamic_import — Use of __import__() or importlib
suspicious_network — Network calls to non-declared domains

Medium Findings:

subprocess_spawn — Subprocess usage without declaration
filesystem_write — File writes to sensitive paths

Stage 3: Injection Detection

File: python-api/lib/scan/stage3_injection.py Purpose: Detect prompt injection and manipulation attempts Checks:

Prompt injection patterns (“Ignore previous instructions”)
System prompt extraction attempts
Role confusion attacks (impersonating user/assistant)
Instruction override patterns
Multi-turn attack sequences
Unicode homoglyphs (lookalike characters)
Hidden instructions (whitespace tricks, zero-width chars)

Critical Findings:

prompt_injection — Direct injection pattern detected
system_prompt_extraction — Attempts to extract system prompts

High Findings:

role_confusion — Instructions to change agent role
instruction_override — Attempts to override core instructions

Medium Findings:

unicode_homoglyph — Suspicious lookalike characters
hidden_instruction — Zero-width or invisible characters

Stage 4: Secrets Scanning

File: python-api/lib/scan/stage4_secrets.py Purpose: Detect hardcoded secrets and credentials Checks:

API keys (OpenAI, Anthropic, AWS, GitHub, Stripe, etc.)
AWS credentials (access keys, secret keys)
Private keys (RSA, SSH, PGP)
Database connection strings
JWT tokens
Generic high-entropy strings (potential secrets)

Critical Findings:

hardcoded_api_key — API key found in code
hardcoded_aws_credentials — AWS credentials embedded
private_key_embedded — RSA/SSH private key in code

High Findings:

database_credentials — DB connection string with password
jwt_token — Hardcoded JWT

Medium Findings:

high_entropy_string — Potential secret (base64, hex)

Stage 5: Supply Chain

File: python-api/lib/scan/stage5_supply.py Purpose: Analyze supply chain risks Checks:

Dependency analysis (NPM, PyPI, etc.)
Known vulnerable packages (CVE database)
Typosquatting detection (e.g., reqests vs requests)
Dependency confusion attacks
Unpinned dependencies
Deprecated packages

High Findings:

known_vulnerability — Dependency has published CVE
typosquatting — Dependency name closely matches popular package

Medium Findings:

unpinned_dependency — Version range too broad (e.g., *)
deprecated_package — Dependency is marked deprecated

Low Findings:

outdated_dependency — Newer version available

Verdict Rules

After all stages complete, findings are aggregated and a verdict is computed: From python-api/lib/scan/verdict.py:

def compute_verdict(stage_results: List[StageResult]) -> ScanVerdict:
    """Compute the final verdict from stage results.

    Rules:
    - 1+ critical findings → FAIL
    - 4+ high findings → FAIL
    - 1-3 high findings → FLAGGED
    - Any medium or low findings → PASS_WITH_NOTES
    - No findings → PASS
    """
    # Aggregate all findings
    all_findings = []
    for stage in stage_results:
        all_findings.extend(stage.findings)

    # Count by severity
    critical_count = sum(1 for f in all_findings if f.severity == "critical")
    high_count = sum(1 for f in all_findings if f.severity == "high")
    medium_count = sum(1 for f in all_findings if f.severity == "medium")
    low_count = sum(1 for f in all_findings if f.severity == "low")

    # Apply verdict rules
    if critical_count > 0:
        return ScanVerdict.FAIL

    if high_count >= 4:
        return ScanVerdict.FAIL

    if high_count > 0:
        return ScanVerdict.FLAGGED

    if medium_count > 0 or low_count > 0:
        return ScanVerdict.PASS_WITH_NOTES

    return ScanVerdict.PASS

Verdict Meanings

Verdict	Severity	Can Publish?	Description
`PASS`	Clean	Yes	No findings — perfect security score
`PASS_WITH_NOTES`	Minor	Yes	Only medium/low findings — publishes with warnings
`FLAGGED`	Moderate	Requires review	1-3 high findings — manual review required
`FAIL`	Severe	No	1+ critical OR 4+ high findings — cannot publish

Skills with FAIL verdict cannot be published. You must fix all critical findings and reduce high findings to 3 or fewer before publishing.

Finding Structure

Each finding follows this schema (from python-api/lib/scan/models.py):

class Finding(BaseModel):
    """A single security finding from any stage."""

    stage: str = Field(..., description="Stage that produced this finding (stage0-stage5)")
    severity: Literal["critical", "high", "medium", "low"] = Field(
        ..., description="Severity level"
    )
    type: str = Field(..., description="Finding type e.g. 'prompt_injection', 'shell_injection'")
    description: str = Field(..., description="Human-readable description")
    location: Optional[str] = Field(None, description="File:line or path reference")
    confidence: Optional[float] = Field(None, ge=0.0, le=1.0, description="Confidence score 0-1")
    tool: Optional[str] = Field(None, description="Tool or rule that found this")
    evidence: Optional[str] = Field(None, description="Raw snippet or pattern matched")

Example Finding

{
  "stage": "stage2",
  "severity": "critical",
  "type": "arbitrary_code_exec",
  "description": "Use of eval() detected — can execute arbitrary code",
  "location": "src/main.py:42",
  "confidence": 1.0,
  "tool": "stage2_static",
  "evidence": "eval(user_input)"
}

Audit Score

The audit score (0-10) is computed from findings and stored in skills.lock: Score Calculation:

score = 10.0
score -= (critical_count * 3.0)  # -3 per critical
score -= (high_count * 1.0)      # -1 per high
score -= (medium_count * 0.3)    # -0.3 per medium
score -= (low_count * 0.1)       # -0.1 per low
score = max(0, score)            # Floor at 0

Score Interpretation:

10: Perfect — no findings
9-10: Excellent — minor notes only
7-8: Good — some medium findings
5-6: Fair — multiple medium or 1-2 high
<5: Poor — critical or many high findings

You can require a minimum audit score for dependencies in your skills.json:

{
  "audit": {
    "min_score": 8.0
  }
}

Dependencies with scores below 8.0 will be rejected during tank install.

Scan Response

The full scan response includes all findings, stage results, and metadata:

interface ScanResponse {
  scan_id: string;                    // UUID of stored scan result
  verdict: "pass" | "pass_with_notes" | "flagged" | "fail";
  findings: Finding[];                // All findings from all stages
  stage_results: StageResult[];       // Per-stage results
  duration_ms: number;                // Total scan time
  file_hashes: Record<string, string>; // SHA-256 per file
}

interface StageResult {
  stage: string;                      // "stage0" - "stage5"
  status: "passed" | "failed" | "errored" | "skipped";
  findings: Finding[];
  duration_ms: number;
  error?: string;                     // Error message if status is "errored"
}

Viewing Scan Results

You can view scan results for any published skill:

CLI

# View security report for a skill
tank audit @tank/google-sheets

# Output:
# @tank/[email protected]
# Verdict: PASS_WITH_NOTES
# Audit Score: 9.2/10
#
# Findings (2):
#   [medium] stage2: subprocess_spawn at src/export.py:15
#   [low] stage5: outdated_dependency — [email protected] (latest: 2.31.0)

Web UI

Visit the skill page on the registry:

https://tankpkg.dev/skills/@tank/google-sheets

The security tab shows:

Verdict badge
Audit score
All findings with severity, location, and evidence
Stage-by-stage breakdown

Best Practices

Before Publishing

Run local scan (if available):
```
tank scan --local
```
Review findings before submitting
Fix critical and high findings — These will block publishing
Document medium/low findings — Explain why they’re safe in your README or SKILL.md

Handling Flagged Verdicts

If your skill gets FLAGGED, you’ll need manual review:

Submit for review:
```
tank publish --request-review
```
Provide context — Explain why high findings are false positives
Wait for approval — Registry admins will review and approve/reject

False Positives

Some legitimate code may trigger findings: Example: Dynamic imports for optional dependencies

# This triggers "dynamic_import" (medium severity)
try:
    import numpy as np
except ImportError:
    np = None

Solution: Document this in your README:

## Security Notes

This skill uses dynamic imports for optional dependencies (numpy, pandas).
The scanner flags this as `dynamic_import` (medium severity), but it is
safe because:
- Imports are hardcoded (no user input)
- Used only for optional features
- Falls back gracefully if not installed

Next Steps

Permissions — Declare permissions correctly to avoid findings
Manifest — Configure audit.min_score for dependencies
Lockfile — Audit scores are stored in skills.lock

Get Started

Core Concepts

CLI Commands

Security

Registry

Development

Security Scanning

Security Scanning

Overview

6-Stage Pipeline

Stage 0: Ingest

Stage 1: Structure

Stage 2: Static Analysis

Stage 3: Injection Detection

Stage 4: Secrets Scanning

Stage 5: Supply Chain

Verdict Rules

Verdict Meanings

Finding Structure

Example Finding

Audit Score

Scan Response

Viewing Scan Results

CLI

Web UI

Best Practices

Before Publishing

Handling Flagged Verdicts

False Positives

Next Steps

Build docs developers (and LLMs) love

Get Started

Core Concepts

CLI Commands

Security

Registry

Development

​Security Scanning

​Overview

​6-Stage Pipeline

​Stage 0: Ingest

​Stage 1: Structure

​Stage 2: Static Analysis

​Stage 3: Injection Detection

​Stage 4: Secrets Scanning

​Stage 5: Supply Chain

​Verdict Rules

​Verdict Meanings

​Finding Structure

​Example Finding

​Audit Score

​Scan Response

​Viewing Scan Results

​CLI

​Web UI

​Best Practices

​Before Publishing

​Handling Flagged Verdicts

​False Positives

​Next Steps

Build docs developers (and LLMs) love

Security Scanning

Overview

6-Stage Pipeline

Stage 0: Ingest

Stage 1: Structure

Stage 2: Static Analysis

Stage 3: Injection Detection

Stage 4: Secrets Scanning

Stage 5: Supply Chain

Verdict Rules

Verdict Meanings

Finding Structure

Example Finding

Audit Score

Scan Response

Viewing Scan Results

CLI

Web UI

Best Practices

Before Publishing

Handling Flagged Verdicts

False Positives

Next Steps