## Overview

The LLM Analysis package provides true agentic security analysis using large language models. Unlike template-based tools, it reasons about vulnerabilities contextually, generates working exploits, and creates intelligent patches.
## Purpose

AI-powered autonomous security analysis with:
- LLM-powered analysis: No heuristics, genuine reasoning
- Context-aware exploits: Generated from actual code, not templates
- Intelligent patching: Understands security context
- Multi-model support: Claude, GPT-4, Ollama (DeepSeek/Qwen)
- Automatic fallback: Cost optimization and reliability
## Architecture

```
packages/llm_analysis/
├── agent.py            # Autonomous security agent
├── crash_agent.py      # Crash analysis agent
├── orchestrator.py     # Workflow orchestration
└── llm/
    ├── client.py       # LLM client with fallback
    ├── config.py       # Model configuration
    └── providers.py    # Provider implementations
```
## Quick Start

### Analyze Findings

```bash
# Analyze SARIF findings with LLM
python3 -m packages.llm_analysis.agent \
  --repo /path/to/code \
  --sarif out/combined.sarif \
  --max-findings 10
```

### Generate Exploits

```bash
# Analyze + generate exploits for exploitable findings
python3 -m packages.llm_analysis.agent \
  --repo /path/to/code \
  --sarif out/combined.sarif \
  --generate-exploits
```

### Create Patches

```bash
# Analyze + create patches
python3 -m packages.llm_analysis.agent \
  --repo /path/to/code \
  --sarif out/combined.sarif \
  --generate-patches
```
## Python API

### Autonomous Security Agent

```python
from pathlib import Path

from packages.llm_analysis import AutonomousSecurityAgentV2

# Initialize agent
agent = AutonomousSecurityAgentV2(
    repo_path=Path("/path/to/code"),
    out_dir=Path("out/analysis"),
)

# Analyze SARIF findings
results = agent.analyze_sarif(
    sarif_path=Path("out/combined.sarif"),
    max_findings=10,
    generate_exploits=True,
    generate_patches=True,
)

# Review results
for result in results:
    print(f"Finding: {result['finding_id']}")
    print(f"Exploitable: {result['exploitable']}")
    print(f"Score: {result['exploitability_score']}")
    if result.get('exploit_code'):
        print(f"Exploit: {result['exploit_code'][:100]}...")
```
### Analyze Single Vulnerability

```python
from pathlib import Path

from packages.llm_analysis.agent import VulnerabilityContext

# Create vulnerability context
context = VulnerabilityContext(
    finding={
        "finding_id": "sqli-001",
        "rule_id": "sql-injection",
        "file": "src/api/users.py",
        "startLine": 45,
        "endLine": 47,
        "message": "SQL injection vulnerability",
        "snippet": 'query = f"SELECT * FROM users WHERE id={user_id}"',
    },
    repo_path=Path("/path/to/code"),
)

# Read the vulnerable code from the repository
context.read_vulnerable_code()

# Analyze with LLM (`agent` initialized as in the previous example)
analysis = agent.analyze_vulnerability(context)
print(f"Exploitable: {context.exploitable}")
print(f"Analysis: {context.analysis}")
```
### LLM Client

```python
from packages.llm_analysis.llm import LLMClient, LLMConfig

# Initialize with multi-model support
config = LLMConfig(
    primary_model="claude-3-7-sonnet-20250219",
    fallback_model="gpt-4o",
    enable_local_fallback=True,
    local_model="deepseek-r1:14b",
)
client = LLMClient(config)

# Query with automatic fallback
response = client.query(
    system_prompt="You are a security analyst.",
    user_prompt="Analyze this SQL injection vulnerability...",
    temperature=0.3,
)

print(response['content'])
print(f"Model used: {response['model']}")
print(f"Cost: ${response['cost']:.4f}")
```
## Core Classes

### AutonomousSecurityAgentV2

Main agent for vulnerability analysis.

```python
class AutonomousSecurityAgentV2:
    def __init__(
        self,
        repo_path: Path,
        out_dir: Path,
        llm_config: Optional[LLMConfig] = None,
    ): ...

    def analyze_sarif(
        self,
        sarif_path: Path,
        max_findings: int = 10,
        generate_exploits: bool = False,
        generate_patches: bool = False,
    ) -> List[Dict[str, Any]]: ...

    def analyze_vulnerability(
        self,
        context: VulnerabilityContext,
    ) -> Dict[str, Any]: ...

    def generate_exploit(
        self,
        context: VulnerabilityContext,
    ) -> Optional[str]: ...

    def generate_patch(
        self,
        context: VulnerabilityContext,
    ) -> Optional[str]: ...
```
### VulnerabilityContext

Complete context for vulnerability analysis. Key attributes:

- `repo_path`: Repository path for reading source code
- `exploitable`: Whether the vulnerability is exploitable
### LLMClient

Multi-model LLM client with fallback.

```python
class LLMClient:
    def __init__(self, config: LLMConfig = None): ...

    def query(
        self,
        system_prompt: str,
        user_prompt: str,
        temperature: float = 0.3,
        max_tokens: int = 4000,
    ) -> Dict[str, Any]: ...

    def query_with_fallback(
        self,
        system_prompt: str,
        user_prompt: str,
        **kwargs,
    ) -> Dict[str, Any]: ...
```

The response dict includes:

- `model`: Model that generated the response
- Token usage (prompt, completion, total)
### LLMConfig

Configuration for multi-model setup.

- `primary_model` (str, default `"claude-3-7-sonnet-20250219"`): Primary model to use
- `fallback_model` (str): Fallback if primary fails
- `enable_local_fallback` (bool): Enable local model fallback (Ollama)
- `local_model` (str, default `"deepseek-r1:14b"`): Local model name for Ollama
- Max retry attempts per model (int)
## Supported Models

### Cloud Models

| Provider | Model | Context | Cost per 1M tokens (input/output) |
|---|---|---|---|
| Anthropic | claude-3-7-sonnet-20250219 | 200K | $3.00 / $15.00 |
| Anthropic | claude-3-5-sonnet-20241022 | 200K | $3.00 / $15.00 |
| OpenAI | gpt-4o | 128K | $2.50 / $10.00 |
| OpenAI | gpt-4o-mini | 128K | $0.15 / $0.60 |

### Local Models (Ollama)

| Model | Size | Performance |
|---|---|---|
| deepseek-r1:14b | 14B | Excellent reasoning |
| qwen2.5:14b | 14B | Good general purpose |
| qwen2.5-coder:14b | 14B | Code specialized |
## Configuration

### Environment Variables

```bash
# API keys
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...

# Ollama (local)
export OLLAMA_BASE_URL=http://localhost:11434

# Model selection
export LLM_PRIMARY_MODEL=claude-3-7-sonnet-20250219
export LLM_FALLBACK_MODEL=gpt-4o
export LLM_LOCAL_MODEL=deepseek-r1:14b
```
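When these variables are unset, the client falls back to the documented defaults. A minimal sketch of that precedence, shown with a plain `os.environ` lookup for clarity (the package's actual config loading may differ):

```python
import os

# Documented defaults for the model-selection variables above;
# values set in the environment take precedence.
DEFAULTS = {
    "LLM_PRIMARY_MODEL": "claude-3-7-sonnet-20250219",
    "LLM_FALLBACK_MODEL": "gpt-4o",
    "LLM_LOCAL_MODEL": "deepseek-r1:14b",
}

settings = {key: os.environ.get(key, default) for key, default in DEFAULTS.items()}
print(settings["LLM_PRIMARY_MODEL"])
```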
### Model Selection Strategy

1. Try the primary model (Claude 3.7 Sonnet)
2. If it fails, try the fallback (GPT-4o)
3. If both fail, try the local model (DeepSeek R1)
4. Retry each model up to 3 times
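The steps above can be sketched as a simple retry-then-fall-back loop. This is a hypothetical standalone helper with stub providers, not the package's internal implementation:

```python
def query_with_fallback(providers, prompt, max_retries=3):
    """Try each (name, call) provider in order, retrying before falling back."""
    errors = []
    for name, call in providers:
        for attempt in range(1, max_retries + 1):
            try:
                return {"model": name, "content": call(prompt)}
            except Exception as exc:
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("All models failed:\n" + "\n".join(errors))

# Stub providers: the primary is down, the fallback answers.
def flaky_primary(prompt):
    raise ConnectionError("rate limited")

def working_fallback(prompt):
    return f"analysis of: {prompt}"

result = query_with_fallback(
    [("claude-3-7-sonnet-20250219", flaky_primary), ("gpt-4o", working_fallback)],
    "SQL injection in users.py",
)
print(result["model"])  # gpt-4o
```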
## Analysis Output

### Vulnerability Analysis

```json
{
  "finding_id": "sqli-001",
  "rule_id": "sql-injection",
  "file": "src/api/users.py",
  "startLine": 45,
  "exploitable": true,
  "exploitability_score": 0.95,
  "analysis": {
    "vulnerability_type": "SQL Injection",
    "severity": "critical",
    "attack_vector": "Network",
    "attack_complexity": "Low",
    "privileges_required": "None",
    "reasoning": "User input directly interpolated into SQL query...",
    "exploitation_difficulty": "Easy",
    "impact": "Complete database compromise"
  }
}
```
### Exploit Generation

```python
# Generated exploit code
exploit = """
import requests

# SQL injection exploit
url = "http://target.com/api/users"
payload = "1' OR '1'='1' UNION SELECT username,password FROM users--"

response = requests.get(url, params={'id': payload})
print(response.json())
"""
```
### Patch Generation

```python
# Generated patch
patch = """
# Before (vulnerable)
query = f"SELECT * FROM users WHERE id={user_id}"

# After (patched)
from sqlalchemy import text

query = text("SELECT * FROM users WHERE id=:user_id")
result = db.execute(query, {'user_id': user_id})
"""
```
## Dataflow Analysis

The agent supports advanced dataflow analysis from CodeQL:

```python
from pathlib import Path

from packages.llm_analysis.agent import VulnerabilityContext

# Vulnerability with dataflow
context = VulnerabilityContext(
    finding={
        "has_dataflow": True,
        "dataflow_path": {
            "source": {
                "file": "src/api/routes.py",
                "line": 23,
                "message": "User input from request.args"
            },
            "sink": {
                "file": "src/db/queries.py",
                "line": 45,
                "message": "SQL query execution"
            },
            "steps": [
                {"file": "src/api/routes.py", "line": 25},
                {"file": "src/api/validation.py", "line": 12},
                {"file": "src/db/queries.py", "line": 43}
            ]
        }
    },
    repo_path=Path("/path/to/code"),
)

# The agent analyzes the complete dataflow path from source to sink
analysis = agent.analyze_vulnerability(context)
```
## Integration

### With Static Analysis

```python
from pathlib import Path

from packages.static_analysis import main as scan_repo
from packages.llm_analysis import AutonomousSecurityAgentV2

# 1. Scan the repository (generates SARIF)
scan_repo()

# 2. Analyze findings with the LLM
agent = AutonomousSecurityAgentV2(
    repo_path=Path("/path/to/code"),
    out_dir=Path("out/analysis"),
)
results = agent.analyze_sarif(
    sarif_path=Path("out/combined.sarif"),
    generate_exploits=True,
    generate_patches=True,
)
```
### With CodeQL

```python
from pathlib import Path

from packages.codeql import CodeQLAgent
from packages.llm_analysis import AutonomousSecurityAgentV2

# 1. Run CodeQL
codeql = CodeQLAgent(repo_path=Path("/path/to/code"))
result = codeql.run()

# 2. Analyze CodeQL findings with the LLM
agent = AutonomousSecurityAgentV2(
    repo_path=Path("/path/to/code"),
    out_dir=Path("out/analysis"),
)
for sarif_file in result.sarif_files:
    agent.analyze_sarif(sarif_file, generate_exploits=True)
```
## Analysis Speed

- Per finding: 10-30 seconds (depends on model)
- Batch (10 findings): 3-5 minutes
- With exploits: +20-40 seconds per exploitable finding

## Cost Estimates

- Claude 3.7 Sonnet: ~$0.05-0.15 per finding
- GPT-4o: ~$0.03-0.10 per finding
- Local (Ollama): $0.00 (free, but slower)
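For budgeting, the per-finding ranges above multiply straightforwardly across a batch (simple arithmetic, assuming costs are additive per finding):

```python
# Estimated cost range for a 10-finding batch at the per-finding rates above.
findings = 10
claude = (0.05 * findings, 0.15 * findings)
gpt4o = (0.03 * findings, 0.10 * findings)

print(f"Claude 3.7 Sonnet: ${claude[0]:.2f}-${claude[1]:.2f}")  # $0.50-$1.50
print(f"GPT-4o: ${gpt4o[0]:.2f}-${gpt4o[1]:.2f}")  # $0.30-$1.00
```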
## Best Practices

- Start with `max_findings=10` for an initial assessment
- Enable exploit generation for critical findings only
- Use local models for cost-free experimentation
- Review patches before applying them (AI can make mistakes)
- Combine with dataflow analysis for the best results