
Overview

Aguara uses a 4-layer analysis pipeline that runs sequentially on every scanned file. Each layer catches different attack patterns, and their findings are combined through post-processing to produce a comprehensive security report.
File Discovery → Parallel Analysis → Post-Processing → Report
       ↓                ↓                   ↓             ↓
   .md, .json      4 Analyzers         Scoring &    Terminal/JSON/
   .txt, .py       in sequence         Correlation    SARIF/MD

Scanning Workflow

1. File Discovery

Aguara starts by discovering all scannable files in the target directory:
func (s *Scanner) Scan(ctx context.Context, root string) (*ScanResult, error) {
    // Check if root is a single file (not a directory).
    info, err := os.Stat(root)
    if err != nil {
        return nil, err
    }
    if !info.IsDir() {
        // Single-file scan: use the filename as RelPath so target-filtered
        // rules (e.g. "*.md", "*.json") can match correctly.
        targets := []*Target{{
            Path:        root,
            RelPath:     filepath.Base(root),
            MaxFileSize: s.maxFileSize,
        }}
        return s.ScanTargets(ctx, targets)
    }

    // Directory scan: discover all files recursively.
    discovery := &TargetDiscovery{IgnorePatterns: s.ignorePatterns, MaxFileSize: s.maxFileSize}
    targets, err := discovery.Discover(root)
    if err != nil {
        return nil, err
    }

    return s.ScanTargets(ctx, targets)
}
The discovery phase:
  • Walks directories recursively
  • Filters out binary files, .git/, node_modules/, etc.
  • Respects .aguaraignore patterns
  • Enforces max file size limits (default: 50 MB)
  • Creates a Target for each scannable file

2. Parallel Analysis

Once files are discovered, Aguara distributes them across worker goroutines for parallel processing:
internal/scanner/scanner.go:108-163
fileCh := make(chan *Target, len(targets))
for _, t := range targets {
    fileCh <- t
}
close(fileCh)

total := len(targets)
for range s.workers {
    wg.Go(func() {
        for target := range fileCh {
            if ctx.Err() != nil {
                return
            }
            if err := target.LoadContent(); err != nil {
                continue
            }
            ignoreIndex := buildIgnoreIndex(parseIgnoreDirectives(target.Content))
            for _, analyzer := range s.analyzers {
                if ctx.Err() != nil {
                    return
                }
                results, err := analyzer.Analyze(ctx, target)
                if err != nil {
                    continue
                }
                // ... apply inline ignore directives ...
            }
        }
    })
}
Worker count defaults to runtime.NumCPU() and is configurable via the --workers flag. Analyzers then run in sequence on each file:
  1. Pattern Matcher
  2. NLP Analyzer
  3. Taint Tracker
  4. Rug-Pull Detector (if --monitor is enabled)
Each analyzer implements this interface:
internal/scanner/analyzer.go
type Analyzer interface {
    Name() string
    Analyze(ctx context.Context, target *Target) ([]Finding, error)
}

3. Post-Processing

After all files are scanned, Aguara post-processes the findings to improve the signal-to-noise ratio:
internal/scanner/scanner.go:180-209
func (s *Scanner) postProcess(findings []Finding) []Finding {
    findings = meta.Deduplicate(findings)
    findings = meta.ScoreFindings(findings)
    groups := meta.Correlate(findings)
    findings = flattenGroups(groups)
    findings = meta.AdjustConfidence(findings)

    if s.minSeverity > SeverityInfo {
        var filtered []Finding
        for _, f := range findings {
            if f.Severity >= s.minSeverity {
                filtered = append(filtered, f)
            }
        }
        findings = filtered
    }

    sort.Slice(findings, func(i, j int) bool {
        if findings[i].Severity != findings[j].Severity {
            return findings[i].Severity > findings[j].Severity
        }
        if findings[i].FilePath != findings[j].FilePath {
            return findings[i].FilePath < findings[j].FilePath
        }
        return findings[i].Line < findings[j].Line
    })

    return findings
}
Post-processing steps:
  1. Deduplication: Removes identical findings (same file, line, rule)
  2. Scoring: Calculates 0-100 risk scores based on severity + category weights
  3. Correlation: Groups findings within 5 lines, applies bonuses
  4. Confidence Adjustment: Downgrades code block findings, boosts correlated ones
  5. Filtering: Applies --severity threshold
  6. Sorting: By severity DESC → file path ASC → line ASC

Rule Compilation

Rules are loaded once at startup and compiled into executable patterns:
aguara.go:200-242
func loadAndCompile(cfg *scanConfig) ([]*rules.CompiledRule, error) {
    rawRules, err := rules.LoadFromFS(builtin.FS())
    if err != nil {
        return nil, fmt.Errorf("loading built-in rules: %w", err)
    }

    if cfg.customRulesDir != "" {
        custom, err := rules.LoadFromDir(cfg.customRulesDir)
        if err != nil {
            return nil, fmt.Errorf("loading custom rules from %s: %w", cfg.customRulesDir, err)
        }
        rawRules = append(rawRules, custom...)
    }

    compiled, compileErrs := rules.CompileAll(rawRules)
    for _, e := range compileErrs {
        fmt.Fprintf(os.Stderr, "aguara: warning: %v\n", e)
    }

    if len(cfg.ruleOverrides) > 0 {
        overrides := make(map[string]rules.RuleOverride, len(cfg.ruleOverrides))
        for id, ovr := range cfg.ruleOverrides {
            overrides[id] = rules.RuleOverride{Severity: ovr.Severity, Disabled: ovr.Disabled}
        }
        var overrideErrs []error
        compiled, overrideErrs = rules.ApplyOverrides(compiled, overrides)
        for _, e := range overrideErrs {
            fmt.Fprintf(os.Stderr, "aguara: warning: %v\n", e)
        }
    }

    if len(cfg.disabledRules) > 0 {
        disabled := make(map[string]bool, len(cfg.disabledRules))
        for _, id := range cfg.disabledRules {
            disabled[strings.TrimSpace(id)] = true
        }
        compiled = rules.FilterByIDs(compiled, disabled)
    }

    return compiled, nil
}
Compilation steps:
  1. Load built-in rules from embedded YAML files (go:embed)
  2. Load custom rules from --rules directory (if provided)
  3. Compile regex patterns using Go’s regexp (RE2 syntax)
  4. Apply .aguara.yml overrides (severity changes, disabled rules)
  5. Filter by --disable-rule flags
Regex compilation errors are logged to stderr but do not fail the scan; the offending rule is simply skipped.

Scanner Architecture

Aguara’s codebase is organized into clear layers:
aguara.go              Public API: Scan, ScanContent, Discover, ListRules
options.go             Functional options (WithMinSeverity, WithCustomRules, etc.)
discover/              MCP client auto-detection (17 clients)
cmd/aguara/            CLI entry point (Cobra commands)
internal/
  engine/              4 analysis layers
    pattern/           Layer 1: Pattern Matcher + decoder
    nlp/               Layer 2: NLP Analyzer (Goldmark AST walker)
    toxicflow/         Layer 3: Taint Tracker
    rugpull/           Layer 4: Rug-Pull Detector
  rules/               Rule engine (YAML loader, compiler, self-tester)
    builtin/           177 embedded rules (go:embed)
  scanner/             Orchestrator (file discovery, parallel execution, inline ignore)
  meta/                Post-processing (dedup, scoring, correlation, confidence)
  output/              Formatters (terminal, JSON, SARIF, Markdown)
  config/              .aguara.yml loader
  state/               Persistence for incremental scans + rug-pull detection
  types/               Shared types (Finding, Severity, ScanResult)

Analyzer Registration

Analyzers are registered in buildScanner:
aguara.go:244-265
func buildScanner(cfg *scanConfig) (*scanner.Scanner, []*rules.CompiledRule, error) {
    compiled, err := loadAndCompile(cfg)
    if err != nil {
        return nil, nil, err
    }

    s := scanner.New(cfg.workers)
    s.SetMinSeverity(cfg.minSeverity)
    if len(cfg.ignorePatterns) > 0 {
        s.SetIgnorePatterns(cfg.ignorePatterns)
    }
    if cfg.maxFileSize > 0 {
        s.SetMaxFileSize(cfg.maxFileSize)
    }

    s.RegisterAnalyzer(pattern.NewMatcher(compiled))
    s.RegisterAnalyzer(nlp.NewInjectionAnalyzer())
    s.RegisterAnalyzer(toxicflow.New())

    return s, compiled, nil
}
The rug-pull analyzer is registered only when --monitor is enabled.

Performance Characteristics

  • Parallel file processing: Scans scale across CPU cores
  • Single-pass analysis: Each analyzer reads the file once
  • Zero allocations for ignore checks: Code block maps use []bool slices
  • Regex caching: Patterns are compiled once at startup
  • Streaming output: JSON/SARIF emitted as findings are produced
Benchmark (2,000 files, 10 MB total, M1 MacBook Pro):
  • Pattern Matcher: ~120ms
  • NLP Analyzer: ~45ms
  • Taint Tracker: ~15ms
  • Rug-Pull Detector: ~5ms
  • Total scan time: ~200ms
Use --workers N to control parallelism. Higher values don’t always improve performance due to I/O bottlenecks.

Detection Layers

Deep dive into the 4 analysis engines

Confidence Scoring

How risk scores and confidence levels are calculated
