Overview
Aguara uses a 4-layer analysis pipeline: files are scanned in parallel, and the four layers run in sequence on each file. Each layer catches different attack patterns, and their findings are combined in post-processing to produce a comprehensive security report.
File Discovery  →  Parallel Analysis  →  Post-Processing  →  Report
      ↓                   ↓                    ↓                ↓
 .md, .json          4 Analyzers          Scoring &       Terminal/JSON/
 .txt, .py           in sequence         Correlation        SARIF/MD
Scanning Workflow
1. File Discovery
Aguara starts by discovering all scannable files in the target directory:
internal/scanner/scanner.go:67-92
func (s *Scanner) Scan(ctx context.Context, root string) (*ScanResult, error) {
    // Check if root is a single file (not a directory).
    info, err := os.Stat(root)
    if err != nil {
        return nil, err
    }
    if !info.IsDir() {
        // Single-file scan: use the filename as RelPath so target-filtered
        // rules (e.g. "*.md", "*.json") can match correctly.
        targets := []*Target{{
            Path:        root,
            RelPath:     filepath.Base(root),
            MaxFileSize: s.maxFileSize,
        }}
        return s.ScanTargets(ctx, targets)
    }
    // Directory scan: discover all files recursively.
    discovery := &TargetDiscovery{IgnorePatterns: s.ignorePatterns, MaxFileSize: s.maxFileSize}
    targets, err := discovery.Discover(root)
    if err != nil {
        return nil, err
    }
    return s.ScanTargets(ctx, targets)
}
The discovery phase:
Walks directories recursively
Filters out binary files, .git/, node_modules/, etc.
Respects .aguaraignore patterns
Enforces max file size limits (default: 50 MB)
Creates a Target for each scannable file
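The discovery steps above can be sketched with `filepath.WalkDir`. This is a minimal illustration, not Aguara's actual `TargetDiscovery.Discover`; the helper names, the skip list, and the extension set are assumptions for the example:

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// skipDirs mirrors the directories the discovery phase filters out.
var skipDirs = map[string]bool{".git": true, "node_modules": true}

// discover walks root and returns relative paths of scannable files,
// enforcing maxSize as the file-size limit.
func discover(root string, maxSize int64) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if skipDirs[d.Name()] {
				return filepath.SkipDir // prune the whole subtree
			}
			return nil
		}
		info, err := d.Info()
		if err != nil || info.Size() > maxSize {
			return nil // unreadable or oversized: skip silently
		}
		rel, _ := filepath.Rel(root, path)
		if hasScannableExt(rel) {
			paths = append(paths, rel)
		}
		return nil
	})
	return paths, err
}

// hasScannableExt reports whether the file extension is in the scan set.
func hasScannableExt(path string) bool {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".md", ".json", ".txt", ".py":
		return true
	}
	return false
}

func main() {
	fmt.Println(hasScannableExt("README.md")) // true
}
```

Pruning with `filepath.SkipDir` avoids descending into directories like `node_modules/` at all, which matters more for scan time than filtering individual files.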
2. Parallel Analysis
Once files are discovered, Aguara distributes them across worker goroutines for parallel processing:
internal/scanner/scanner.go:108-163
fileCh := make(chan *Target, len(targets))
for _, t := range targets {
    fileCh <- t
}
close(fileCh)
total := len(targets)
for range s.workers {
    wg.Go(func() {
        for target := range fileCh {
            if ctx.Err() != nil {
                return
            }
            if err := target.LoadContent(); err != nil {
                continue
            }
            ignoreIndex := buildIgnoreIndex(parseIgnoreDirectives(target.Content))
            for _, analyzer := range s.analyzers {
                if ctx.Err() != nil {
                    return
                }
                results, err := analyzer.Analyze(ctx, target)
                if err != nil {
                    continue
                }
                // ... apply inline ignore directives ...
            }
        }
    })
}
Worker count: Defaults to runtime.NumCPU(), configurable via the --workers flag.
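The `ignoreIndex` built inside the worker loop can be sketched as a per-line suppression map. This is a simplified stand-in for `parseIgnoreDirectives`/`buildIgnoreIndex`, and the `aguara-ignore` directive token is an assumed name for illustration; the real directive syntax may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// buildIgnoreIndex returns a per-line boolean index: true means findings
// on that line are suppressed. The "aguara-ignore" token is an assumed
// directive name used only for this sketch.
func buildIgnoreIndex(content string) []bool {
	lines := strings.Split(content, "\n")
	index := make([]bool, len(lines))
	for i, line := range lines {
		if strings.Contains(line, "aguara-ignore") {
			index[i] = true
		}
	}
	return index
}

func main() {
	idx := buildIgnoreIndex("safe line\nrisky line # aguara-ignore")
	fmt.Println(idx[0], idx[1]) // false true
}
```

A `[]bool` indexed by line number makes the per-finding ignore check a constant-time slice lookup rather than a map access.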
Analyzers run in sequence on each file:
Pattern Matcher
NLP Analyzer
Taint Tracker
Rug-Pull Detector (if --monitor is enabled)
Each analyzer implements this interface:
internal/scanner/analyzer.go
type Analyzer interface {
    Name() string
    Analyze(ctx context.Context, target *Target) ([]Finding, error)
}
3. Post-Processing
After all files are scanned, Aguara post-processes the findings to improve signal-to-noise ratio:
internal/scanner/scanner.go:180-209
func (s *Scanner) postProcess(findings []Finding) []Finding {
    findings = meta.Deduplicate(findings)
    findings = meta.ScoreFindings(findings)
    groups := meta.Correlate(findings)
    findings = flattenGroups(groups)
    findings = meta.AdjustConfidence(findings)
    if s.minSeverity > SeverityInfo {
        var filtered []Finding
        for _, f := range findings {
            if f.Severity >= s.minSeverity {
                filtered = append(filtered, f)
            }
        }
        findings = filtered
    }
    sort.Slice(findings, func(i, j int) bool {
        if findings[i].Severity != findings[j].Severity {
            return findings[i].Severity > findings[j].Severity
        }
        if findings[i].FilePath != findings[j].FilePath {
            return findings[i].FilePath < findings[j].FilePath
        }
        return findings[i].Line < findings[j].Line
    })
    return findings
}
Post-processing steps:
Deduplication: Removes identical findings (same file, line, rule)
Scoring: Calculates 0-100 risk scores based on severity + category weights
Correlation: Groups findings within 5 lines, applies bonuses
Confidence Adjustment: Downgrades code block findings, boosts correlated ones
Filtering: Applies the --severity threshold
Sorting: By severity DESC → file path ASC → line ASC
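The scoring step can be illustrated with a severity-weighted formula. The base values and category weights below are illustrative assumptions for the sketch, not Aguara's actual scoring table in `meta.ScoreFindings`:

```go
package main

import "fmt"

type Severity int

const (
	SeverityInfo Severity = iota
	SeverityLow
	SeverityMedium
	SeverityHigh
	SeverityCritical
)

// riskScore maps a severity plus a category weight onto a 0-100 scale.
// Base values and weights here are made up for illustration.
func riskScore(sev Severity, categoryWeight float64) int {
	base := map[Severity]float64{
		SeverityInfo:     10,
		SeverityLow:      30,
		SeverityMedium:   50,
		SeverityHigh:     75,
		SeverityCritical: 95,
	}[sev]
	score := base * categoryWeight
	if score > 100 {
		score = 100 // clamp to the 0-100 scale
	}
	return int(score)
}

func main() {
	fmt.Println(riskScore(SeverityHigh, 1.0))     // 75
	fmt.Println(riskScore(SeverityCritical, 1.2)) // 100 (clamped)
}
```

Clamping at 100 keeps a high-weight category from pushing scores off the documented scale.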
Rule Compilation
Rules are loaded once at startup and compiled into executable patterns:
func loadAndCompile(cfg *scanConfig) ([]*rules.CompiledRule, error) {
    rawRules, err := rules.LoadFromFS(builtin.FS())
    if err != nil {
        return nil, fmt.Errorf("loading built-in rules: %w", err)
    }
    if cfg.customRulesDir != "" {
        custom, err := rules.LoadFromDir(cfg.customRulesDir)
        if err != nil {
            return nil, fmt.Errorf("loading custom rules from %s: %w", cfg.customRulesDir, err)
        }
        rawRules = append(rawRules, custom...)
    }
    compiled, compileErrs := rules.CompileAll(rawRules)
    for _, e := range compileErrs {
        fmt.Fprintf(os.Stderr, "aguara: warning: %v\n", e)
    }
    if len(cfg.ruleOverrides) > 0 {
        overrides := make(map[string]rules.RuleOverride, len(cfg.ruleOverrides))
        for id, ovr := range cfg.ruleOverrides {
            overrides[id] = rules.RuleOverride{Severity: ovr.Severity, Disabled: ovr.Disabled}
        }
        var overrideErrs []error
        compiled, overrideErrs = rules.ApplyOverrides(compiled, overrides)
        for _, e := range overrideErrs {
            fmt.Fprintf(os.Stderr, "aguara: warning: %v\n", e)
        }
    }
    if len(cfg.disabledRules) > 0 {
        disabled := make(map[string]bool, len(cfg.disabledRules))
        for _, id := range cfg.disabledRules {
            disabled[strings.TrimSpace(id)] = true
        }
        compiled = rules.FilterByIDs(compiled, disabled)
    }
    return compiled, nil
}
Compilation steps:
Load built-in rules from embedded YAML files (go:embed)
Load custom rules from --rules directory (if provided)
Compile regex patterns using Go’s regexp (RE2 syntax)
Apply .aguara.yml overrides (severity changes, disabled rules)
Filter by --disable-rule flags
Regex compilation errors are logged to stderr but do not fail the scan; the offending rule is simply skipped.
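That collect-and-skip behavior can be sketched as follows. This is a simplified stand-in for `rules.CompileAll`, assuming a plain pattern map rather than the real rule structs:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles each pattern, collecting errors instead of failing:
// an invalid rule is skipped and reported, and the rest still compile.
func compileAll(patterns map[string]string) (map[string]*regexp.Regexp, []error) {
	compiled := make(map[string]*regexp.Regexp)
	var errs []error
	for id, pat := range patterns {
		re, err := regexp.Compile(pat)
		if err != nil {
			errs = append(errs, fmt.Errorf("rule %s: %w", id, err))
			continue // skip the broken rule, keep the scan usable
		}
		compiled[id] = re
	}
	return compiled, errs
}

func main() {
	compiled, errs := compileAll(map[string]string{
		"good": `(?i)ignore previous instructions`,
		"bad":  `(unclosed`, // unbalanced paren: fails to compile
	})
	fmt.Println(len(compiled), len(errs)) // 1 1
}
```

Because Go's regexp package implements RE2 syntax, constructs like backreferences also fail at compile time and would be skipped the same way.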
Scanner Architecture
Aguara’s codebase is organized into clear layers:
aguara.go Public API: Scan, ScanContent, Discover, ListRules
options.go Functional options (WithMinSeverity, WithCustomRules, etc.)
discover/ MCP client auto-detection (17 clients)
cmd/aguara/ CLI entry point (Cobra commands)
internal/
engine/ 4 analysis layers
pattern/ Layer 1: Pattern Matcher + decoder
nlp/ Layer 2: NLP Analyzer (Goldmark AST walker)
toxicflow/ Layer 3: Taint Tracker
rugpull/ Layer 4: Rug-Pull Detector
rules/ Rule engine (YAML loader, compiler, self-tester)
builtin/ 177 embedded rules (go:embed)
scanner/ Orchestrator (file discovery, parallel execution, inline ignore)
meta/ Post-processing (dedup, scoring, correlation, confidence)
output/ Formatters (terminal, JSON, SARIF, Markdown)
config/ .aguara.yml loader
state/ Persistence for incremental scans + rug-pull detection
types/ Shared types (Finding, Severity, ScanResult)
Analyzer Registration
Analyzers are registered in buildScanner:
func buildScanner(cfg *scanConfig) (*scanner.Scanner, []*rules.CompiledRule, error) {
    compiled, err := loadAndCompile(cfg)
    if err != nil {
        return nil, nil, err
    }
    s := scanner.New(cfg.workers)
    s.SetMinSeverity(cfg.minSeverity)
    if len(cfg.ignorePatterns) > 0 {
        s.SetIgnorePatterns(cfg.ignorePatterns)
    }
    if cfg.maxFileSize > 0 {
        s.SetMaxFileSize(cfg.maxFileSize)
    }
    s.RegisterAnalyzer(pattern.NewMatcher(compiled))
    s.RegisterAnalyzer(nlp.NewInjectionAnalyzer())
    s.RegisterAnalyzer(toxicflow.New())
    return s, compiled, nil
}
The rug-pull analyzer is conditionally registered only when --monitor is enabled.
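That conditional registration can be sketched in isolation. The types below are minimal stand-ins for illustration; the real `Scanner` and rug-pull analyzer live in internal/scanner and internal/engine/rugpull:

```go
package main

import "fmt"

// Minimal stand-ins to illustrate conditional analyzer registration.
type Analyzer interface{ Name() string }

type rugPull struct{}

func (rugPull) Name() string { return "rugpull" }

type Scanner struct{ analyzers []Analyzer }

func (s *Scanner) RegisterAnalyzer(a Analyzer) {
	s.analyzers = append(s.analyzers, a)
}

// buildScanner registers the rug-pull detector only when monitoring is
// enabled, mirroring the --monitor behavior described above.
func buildScanner(monitor bool) *Scanner {
	s := &Scanner{}
	if monitor {
		s.RegisterAnalyzer(rugPull{})
	}
	return s
}

func main() {
	fmt.Println(len(buildScanner(false).analyzers)) // 0
	fmt.Println(len(buildScanner(true).analyzers))  // 1
}
```

Keeping registration conditional means non-monitoring scans never pay the rug-pull detector's per-file cost.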
Performance
Parallel file processing: Scans scale across CPU cores
Single-pass analysis: Each analyzer reads the file once
Zero allocations for ignore checks: Code block maps use []bool slices
Regex caching: Patterns are compiled once at startup
Streaming output: JSON/SARIF emitted as findings are produced
Benchmark (2,000 files, 10 MB total, M1 MacBook Pro):
Pattern Matcher: ~120ms
NLP Analyzer: ~45ms
Taint Tracker: ~15ms
Rug-Pull Detector: ~5ms
Total scan time: ~200ms
Use --workers N to control parallelism. Higher values don’t always improve performance due to I/O bottlenecks.
Related Pages
Detection Layers: Deep dive into the 4 analysis engines
Confidence Scoring: How risk scores and confidence levels are calculated