Overview
Aguara uses a 4-layer analysis pipeline: files are scanned in parallel, and the four layers run in sequence on each file. Each layer catches different attack patterns, and their findings are combined in post-processing to produce a comprehensive security report.
File Discovery  →  Parallel Analysis  →  Post-Processing  →  Report
      ↓                   ↓                    ↓                ↓
 .md, .json          4 Analyzers          Scoring &       Terminal/JSON/
 .txt, .py           in sequence         Correlation        SARIF/MD
Scanning Workflow
1. File Discovery
Aguara starts by discovering all scannable files in the target directory:
internal/scanner/scanner.go:67-92
func (s *Scanner) Scan(ctx context.Context, root string) (*ScanResult, error) {
    // Check if root is a single file (not a directory).
    info, err := os.Stat(root)
    if err != nil {
        return nil, err
    }
    if !info.IsDir() {
        // Single-file scan: use the filename as RelPath so target-filtered
        // rules (e.g. "*.md", "*.json") can match correctly.
        targets := []*Target{{
            Path:        root,
            RelPath:     filepath.Base(root),
            MaxFileSize: s.maxFileSize,
        }}
        return s.ScanTargets(ctx, targets)
    }
    // Directory scan: discover all files recursively.
    discovery := &TargetDiscovery{IgnorePatterns: s.ignorePatterns, MaxFileSize: s.maxFileSize}
    targets, err := discovery.Discover(root)
    if err != nil {
        return nil, err
    }
    return s.ScanTargets(ctx, targets)
}
The discovery phase:
Walks directories recursively
Filters out binary files, .git/, node_modules/, etc.
Respects .aguaraignore patterns
Enforces max file size limits (default: 50 MB)
Creates a Target for each scannable file
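The discovery steps above can be sketched with `filepath.WalkDir`. This is a minimal illustration, not Aguara's actual `TargetDiscovery.Discover`; the helper names, the skip list, and the extension set are assumptions for the example:

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// skipDirs mirrors the directories the discovery phase filters out.
var skipDirs = map[string]bool{".git": true, "node_modules": true}

// discover walks root and returns relative paths of scannable files,
// enforcing maxSize as the file-size limit.
func discover(root string, maxSize int64) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if skipDirs[d.Name()] {
				return filepath.SkipDir // prune the whole subtree
			}
			return nil
		}
		info, err := d.Info()
		if err != nil || info.Size() > maxSize {
			return nil // unreadable or oversized: skip silently
		}
		rel, _ := filepath.Rel(root, path)
		if hasScannableExt(rel) {
			paths = append(paths, rel)
		}
		return nil
	})
	return paths, err
}

// hasScannableExt reports whether the file extension is in the scan set.
func hasScannableExt(path string) bool {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".md", ".json", ".txt", ".py":
		return true
	}
	return false
}

func main() {
	fmt.Println(hasScannableExt("README.md")) // true
}
```

Pruning with `filepath.SkipDir` avoids descending into directories like `node_modules/` at all, which matters more for scan time than filtering individual files.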
2. Parallel Analysis
Once files are discovered, Aguara distributes them across worker goroutines for parallel processing:
internal/scanner/scanner.go:108-163
fileCh := make(chan *Target, len(targets))
for _, t := range targets {
    fileCh <- t
}
close(fileCh)
total := len(targets)
for range s.workers {
    wg.Go(func() {
        for target := range fileCh {
            if ctx.Err() != nil {
                return
            }
            if err := target.LoadContent(); err != nil {
                continue
            }
            ignoreIndex := buildIgnoreIndex(parseIgnoreDirectives(target.Content))
            for _, analyzer := range s.analyzers {
                if ctx.Err() != nil {
                    return
                }
                results, err := analyzer.Analyze(ctx, target)
                if err != nil {
                    continue
                }
                // ... apply inline ignore directives ...
            }
        }
    })
}
Worker count: Defaults to runtime.NumCPU(), configurable via the --workers flag.
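The `ignoreIndex` built inside the worker loop can be sketched as a per-line suppression map. This is a simplified stand-in for `parseIgnoreDirectives`/`buildIgnoreIndex`, and the `aguara-ignore` directive token is an assumed name for illustration; the real directive syntax may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// buildIgnoreIndex returns a per-line boolean index: true means findings
// on that line are suppressed. The "aguara-ignore" token is an assumed
// directive name used only for this sketch.
func buildIgnoreIndex(content string) []bool {
	lines := strings.Split(content, "\n")
	index := make([]bool, len(lines))
	for i, line := range lines {
		if strings.Contains(line, "aguara-ignore") {
			index[i] = true
		}
	}
	return index
}

func main() {
	idx := buildIgnoreIndex("safe line\nrisky line # aguara-ignore")
	fmt.Println(idx[0], idx[1]) // false true
}
```

A `[]bool` indexed by line number makes the per-finding ignore check a constant-time slice lookup rather than a map access.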
Analyzers run in sequence on each file:
Pattern Matcher
NLP Analyzer
Taint Tracker
Rug-Pull Detector (if --monitor is enabled)
Each analyzer implements this interface:
internal/scanner/analyzer.go
type Analyzer interface {
    Name() string
    Analyze(ctx context.Context, target *Target) ([]Finding, error)
}
3. Post-Processing
After all files are scanned, Aguara post-processes the findings to improve signal-to-noise ratio:
internal/scanner/scanner.go:180-209
func (s *Scanner) postProcess(findings []Finding) []Finding {
    findings = meta.Deduplicate(findings)
    findings = meta.ScoreFindings(findings)
    groups := meta.Correlate(findings)
    findings = flattenGroups(groups)
    findings = meta.AdjustConfidence(findings)
    if s.minSeverity > SeverityInfo {
        var filtered []Finding
        for _, f := range findings {
            if f.Severity >= s.minSeverity {
                filtered = append(filtered, f)
            }
        }
        findings = filtered
    }
    sort.Slice(findings, func(i, j int) bool {
        if findings[i].Severity != findings[j].Severity {
            return findings[i].Severity > findings[j].Severity
        }
        if findings[i].FilePath != findings[j].FilePath {
            return findings[i].FilePath < findings[j].FilePath
        }
        return findings[i].Line < findings[j].Line
    })
    return findings
}
Post-processing steps:
Deduplication: Removes identical findings (same file, line, rule)
Scoring: Calculates 0-100 risk scores based on severity + category weights
Correlation: Groups findings within 5 lines, applies bonuses
Confidence Adjustment: Downgrades code block findings, boosts correlated ones
Filtering: Applies the --severity threshold
Sorting: By severity DESC → file path ASC → line ASC
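The scoring step can be illustrated with a severity-weighted formula. The base values and category weights below are illustrative assumptions for the sketch, not Aguara's actual scoring table in `meta.ScoreFindings`:

```go
package main

import "fmt"

type Severity int

const (
	SeverityInfo Severity = iota
	SeverityLow
	SeverityMedium
	SeverityHigh
	SeverityCritical
)

// riskScore maps a severity plus a category weight onto a 0-100 scale.
// Base values and weights here are made up for illustration.
func riskScore(sev Severity, categoryWeight float64) int {
	base := map[Severity]float64{
		SeverityInfo:     10,
		SeverityLow:      30,
		SeverityMedium:   50,
		SeverityHigh:     75,
		SeverityCritical: 95,
	}[sev]
	score := base * categoryWeight
	if score > 100 {
		score = 100 // clamp to the 0-100 scale
	}
	return int(score)
}

func main() {
	fmt.Println(riskScore(SeverityHigh, 1.0))     // 75
	fmt.Println(riskScore(SeverityCritical, 1.2)) // 100 (clamped)
}
```

Clamping at 100 keeps a high-weight category from pushing scores off the documented scale.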
Rule Compilation
Rules are loaded once at startup and compiled into executable patterns:
func loadAndCompile(cfg *scanConfig) ([]*rules.CompiledRule, error) {
    rawRules, err := rules.LoadFromFS(builtin.FS())
    if err != nil {
        return nil, fmt.Errorf("loading built-in rules: %w", err)
    }
    if cfg.customRulesDir != "" {
        custom, err := rules.LoadFromDir(cfg.customRulesDir)
        if err != nil {
            return nil, fmt.Errorf("loading custom rules from %s: %w", cfg.customRulesDir, err)
        }
        rawRules = append(rawRules, custom...)
    }
    compiled, compileErrs := rules.CompileAll(rawRules)
    for _, e := range compileErrs {
        fmt.Fprintf(os.Stderr, "aguara: warning: %v\n", e)
    }
    if len(cfg.ruleOverrides) > 0 {
        overrides := make(map[string]rules.RuleOverride, len(cfg.ruleOverrides))
        for id, ovr := range cfg.ruleOverrides {
            overrides[id] = rules.RuleOverride{Severity: ovr.Severity, Disabled: ovr.Disabled}
        }
        var overrideErrs []error
        compiled, overrideErrs = rules.ApplyOverrides(compiled, overrides)
        for _, e := range overrideErrs {
            fmt.Fprintf(os.Stderr, "aguara: warning: %v\n", e)
        }
    }
    if len(cfg.disabledRules) > 0 {
        disabled := make(map[string]bool, len(cfg.disabledRules))
        for _, id := range cfg.disabledRules {
            disabled[strings.TrimSpace(id)] = true
        }
        compiled = rules.FilterByIDs(compiled, disabled)
    }
    return compiled, nil
}
Compilation steps:
Load built-in rules from embedded YAML files (go:embed)
Load custom rules from --rules directory (if provided)
Compile regex patterns using Go’s regexp (RE2 syntax)
Apply .aguara.yml overrides (severity changes, disabled rules)
Filter by --disable-rule flags
Regex compilation errors are logged to stderr but do not fail the scan; the offending rule is simply skipped.
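That collect-and-skip behavior can be sketched as follows. This is a simplified stand-in for `rules.CompileAll`, assuming a plain pattern map rather than the real rule structs:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles each pattern, collecting errors instead of failing:
// an invalid rule is skipped and reported, and the rest still compile.
func compileAll(patterns map[string]string) (map[string]*regexp.Regexp, []error) {
	compiled := make(map[string]*regexp.Regexp)
	var errs []error
	for id, pat := range patterns {
		re, err := regexp.Compile(pat)
		if err != nil {
			errs = append(errs, fmt.Errorf("rule %s: %w", id, err))
			continue // skip the broken rule, keep the scan usable
		}
		compiled[id] = re
	}
	return compiled, errs
}

func main() {
	compiled, errs := compileAll(map[string]string{
		"good": `(?i)ignore previous instructions`,
		"bad":  `(unclosed`, // unbalanced paren: fails to compile
	})
	fmt.Println(len(compiled), len(errs)) // 1 1
}
```

Because Go's regexp package implements RE2 syntax, constructs like backreferences also fail at compile time and would be skipped the same way.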
Scanner Architecture
Aguara’s codebase is organized into clear layers:
aguara.go Public API: Scan, ScanContent, Discover, ListRules
options.go Functional options (WithMinSeverity, WithCustomRules, etc.)
discover/ MCP client auto-detection (17 clients)
cmd/aguara/ CLI entry point (Cobra commands)
internal/
engine/ 4 analysis layers
pattern/ Layer 1: Pattern Matcher + decoder
nlp/ Layer 2: NLP Analyzer (Goldmark AST walker)
toxicflow/ Layer 3: Taint Tracker
rugpull/ Layer 4: Rug-Pull Detector
rules/ Rule engine (YAML loader, compiler, self-tester)
builtin/ 177 embedded rules (go:embed)
scanner/ Orchestrator (file discovery, parallel execution, inline ignore)
meta/ Post-processing (dedup, scoring, correlation, confidence)
output/ Formatters (terminal, JSON, SARIF, Markdown)
config/ .aguara.yml loader
state/ Persistence for incremental scans + rug-pull detection
types/ Shared types (Finding, Severity, ScanResult)
Analyzer Registration
Analyzers are registered in buildScanner:
func buildScanner(cfg *scanConfig) (*scanner.Scanner, []*rules.CompiledRule, error) {
    compiled, err := loadAndCompile(cfg)
    if err != nil {
        return nil, nil, err
    }
    s := scanner.New(cfg.workers)
    s.SetMinSeverity(cfg.minSeverity)
    if len(cfg.ignorePatterns) > 0 {
        s.SetIgnorePatterns(cfg.ignorePatterns)
    }
    if cfg.maxFileSize > 0 {
        s.SetMaxFileSize(cfg.maxFileSize)
    }
    s.RegisterAnalyzer(pattern.NewMatcher(compiled))
    s.RegisterAnalyzer(nlp.NewInjectionAnalyzer())
    s.RegisterAnalyzer(toxicflow.New())
    return s, compiled, nil
}
The rug-pull analyzer is conditionally registered only when --monitor is enabled.
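That conditional registration can be sketched in isolation. The types below are minimal stand-ins for illustration; the real `Scanner` and rug-pull analyzer live in internal/scanner and internal/engine/rugpull:

```go
package main

import "fmt"

// Minimal stand-ins to illustrate conditional analyzer registration.
type Analyzer interface{ Name() string }

type rugPull struct{}

func (rugPull) Name() string { return "rugpull" }

type Scanner struct{ analyzers []Analyzer }

func (s *Scanner) RegisterAnalyzer(a Analyzer) {
	s.analyzers = append(s.analyzers, a)
}

// buildScanner registers the rug-pull detector only when monitoring is
// enabled, mirroring the --monitor behavior described above.
func buildScanner(monitor bool) *Scanner {
	s := &Scanner{}
	if monitor {
		s.RegisterAnalyzer(rugPull{})
	}
	return s
}

func main() {
	fmt.Println(len(buildScanner(false).analyzers)) // 0
	fmt.Println(len(buildScanner(true).analyzers))  // 1
}
```

Keeping registration conditional means non-monitoring scans never pay the rug-pull detector's per-file cost.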
Performance
Parallel file processing: Scans scale across CPU cores
Single-pass analysis: Each analyzer reads the file once
Zero allocations for ignore checks: Code block maps use []bool slices
Regex caching: Patterns are compiled once at startup
Streaming output: JSON/SARIF emitted as findings are produced
Benchmark (2,000 files, 10 MB total, M1 MacBook Pro):
Pattern Matcher: ~120ms
NLP Analyzer: ~45ms
Taint Tracker: ~15ms
Rug-Pull Detector: ~5ms
Total scan time: ~200ms
Use --workers N to control parallelism. Higher values don’t always improve performance due to I/O bottlenecks.
Related Pages
Detection Layers: Deep dive into the 4 analysis engines
Confidence Scoring: How risk scores and confidence levels are calculated