Clanker’s agent system transforms natural language questions into coordinated AWS telemetry investigations. The architecture employs semantic analysis, decision trees, and dependency-aware parallel execution to gather cloud infrastructure data efficiently.

System overview

The agent package (internal/agent/) orchestrates intelligent context gathering through several specialized subsystems:
agent/
├── agent.go          # High-level orchestrator and public entry points
├── coordinator/      # Dependency-aware parallel execution driver
├── decisiontree/     # Intent rules that map queries to agent types
├── memory/           # Rolling knowledge of previous investigations
├── model/            # Shared structs and type aliases
└── semantic/         # Lightweight NLP classifier for intents

Investigation flow

When you run clanker ask "what lambda functions are failing?", the agent follows this execution path:
1. Semantic analysis

The semantic.Analyzer performs keyword-based intent classification without external NLP calls. It extracts:
  • Primary intent (troubleshoot, monitor, analyze)
  • Urgency level (critical, high, medium, low)
  • Target services (lambda, ecs, s3, etc.)
  • Time frame (recent, last_hour, last_day)
  • Data types (logs, metrics, status)
See internal/agent/semantic/analyzer.go:27
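
To make the keyword-based approach concrete, here is a minimal sketch of intent classification by substring matching. The `Analysis` struct, the `Classify` function, and every keyword table below are illustrative assumptions, not the contents of analyzer.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Analysis is a hypothetical, reduced stand-in for the analyzer's output.
type Analysis struct {
	Intent   string
	Urgency  string
	Services []string
}

// The keyword tables are assumptions chosen for illustration only.
var intentKeywords = []struct {
	intent string
	words  []string
}{
	{"troubleshoot", []string{"fail", "error", "broken", "crash"}},
	{"monitor", []string{"status", "health", "watch"}},
	{"analyze", []string{"why", "trend", "compare"}},
}

var knownServices = []string{"lambda", "ecs", "s3"}

// Classify lowercases the query and matches it against the keyword tables.
// The first intent rule that matches wins; "analyze" is the fallback.
func Classify(query string) Analysis {
	q := strings.ToLower(query)
	a := Analysis{Intent: "analyze", Urgency: "medium"}
	for _, rule := range intentKeywords {
		if containsAny(q, rule.words) {
			a.Intent = rule.intent
			break
		}
	}
	if containsAny(q, []string{"outage", "down", "urgent"}) {
		a.Urgency = "critical"
	}
	for _, svc := range knownServices {
		if strings.Contains(q, svc) {
			a.Services = append(a.Services, svc)
		}
	}
	return a
}

// containsAny reports whether s contains at least one of the words.
func containsAny(s string, words []string) bool {
	for _, w := range words {
		if strings.Contains(s, w) {
			return true
		}
	}
	return false
}

func main() {
	a := Classify("what lambda functions are failing?")
	fmt.Printf("%+v\n", a)
}
```

Plain substring matching keeps classification local and fast, at the cost of false positives on short keywords, which is one reason a real classifier may weight or tokenize its matches.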
2. Decision tree traversal

The decision tree maps semantic intent to concrete agent types and execution parameters:
type Node struct {
    ID         string
    Name       string
    Condition  string        // e.g., "contains_keywords(['error', 'fail'])"
    Priority   int
    AgentTypes []string      // e.g., ["log", "metrics"]
    Parameters model.AWSData
}
Nodes are evaluated depth-first. Matching conditions spawn their configured agent types. See internal/agent/decisiontree/tree.go:31
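
The depth-first evaluation can be sketched as follows. This is a simplified illustration: the `Keywords` and `Children` fields are assumptions standing in for the `Condition` string and for however the real tree links its nodes, and the rule that children are visited only when their parent matches is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Node keeps a subset of the documented fields. Keywords and Children are
// hypothetical: the real Node stores a Condition string instead.
type Node struct {
	ID         string
	Keywords   []string
	AgentTypes []string
	Children   []*Node
}

// Traverse walks the tree depth-first and returns every matching node.
// A subtree is skipped entirely when its root does not match.
func Traverse(n *Node, query string) []*Node {
	if n == nil || !matches(n, strings.ToLower(query)) {
		return nil
	}
	matched := []*Node{n}
	for _, child := range n.Children {
		matched = append(matched, Traverse(child, query)...)
	}
	return matched
}

// matches reports whether the lowercased query contains any node keyword.
// A node without keywords acts as an unconditional branch.
func matches(n *Node, q string) bool {
	if len(n.Keywords) == 0 {
		return true
	}
	for _, kw := range n.Keywords {
		if strings.Contains(q, kw) {
			return true
		}
	}
	return false
}

// exampleTree builds a tiny tree with made-up node IDs.
func exampleTree() *Node {
	return &Node{ID: "root", Children: []*Node{
		{ID: "errors", Keywords: []string{"error", "fail"}, AgentTypes: []string{"log", "metrics"}},
		{ID: "cost", Keywords: []string{"cost", "spend"}, AgentTypes: []string{"cost"}},
	}}
}

func main() {
	for _, n := range Traverse(exampleTree(), "what lambda functions are failing?") {
		fmt.Println(n.ID, n.AgentTypes)
	}
}
```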
3. Dependency scheduling

The DependencyScheduler groups agents by execution order:
  • Order 1: Independent collectors (log, metrics, k8s)
  • Order 2: Infrastructure agents requiring basic data (infrastructure, deployment)
  • Order 3: Analysis agents requiring enriched data (security, queue)
  • Order 4+: Higher-order insights (cost, availability)
Each agent declares required and provided data:
AgentTypeLog = AgentType{
    Name: "log",
    Dependencies: Dependency{
        ProvidedData:   []string{"logs", "error_patterns", "log_metrics"},
        ExecutionOrder: 1,
    },
}

AgentTypeSecurity = AgentType{
    Name: "security",
    Dependencies: Dependency{
        RequiredData:   []string{"logs", "service_config"},
        ProvidedData:   []string{"security_status", "access_patterns"},
        ExecutionOrder: 3,
    },
}
See internal/agent/coordinator/agent_types.go:5
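
As a rough sketch of how agents might be bucketed by execution order, the example below groups agent types on their `ExecutionOrder` and sorts the groups ascending. `OrderGroup` and `GroupByOrder` are hypothetical names; the actual DependencyScheduler may plan differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Dependency and AgentType follow the shapes shown in this section.
type Dependency struct {
	RequiredData   []string
	ProvidedData   []string
	ExecutionOrder int
}

type AgentType struct {
	Name         string
	Dependencies Dependency
}

// OrderGroup is a hypothetical name for one batch of agents that share an order.
type OrderGroup struct {
	Order  int
	Agents []AgentType
}

// GroupByOrder buckets agents by ExecutionOrder and returns the buckets in
// ascending order. Agents keep their input order inside each bucket.
func GroupByOrder(agents []AgentType) []OrderGroup {
	byOrder := make(map[int][]AgentType)
	for _, a := range agents {
		order := a.Dependencies.ExecutionOrder
		byOrder[order] = append(byOrder[order], a)
	}
	groups := make([]OrderGroup, 0, len(byOrder))
	for order, list := range byOrder {
		groups = append(groups, OrderGroup{Order: order, Agents: list})
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i].Order < groups[j].Order })
	return groups
}

func main() {
	agents := []AgentType{
		{Name: "security", Dependencies: Dependency{RequiredData: []string{"logs", "service_config"}, ExecutionOrder: 3}},
		{Name: "log", Dependencies: Dependency{ProvidedData: []string{"logs"}, ExecutionOrder: 1}},
		{Name: "metrics", Dependencies: Dependency{ProvidedData: []string{"metrics"}, ExecutionOrder: 1}},
	}
	for _, g := range GroupByOrder(agents) {
		fmt.Printf("order %d:", g.Order)
		for _, a := range g.Agents {
			fmt.Printf(" %s", a.Name)
		}
		fmt.Println()
	}
}
```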
4. Parallel execution

Within each order group, agents run concurrently:
for _, group := range planned {
    var wg sync.WaitGroup
    for _, cfg := range group.Agents {
        if scheduler.Ready(cfg.AgentType, dataBus) {
            agent := newParallelAgent(cfg)
            registry.Register(agent)
            wg.Add(1)
            go runPlannedAgent(ctx, &wg, agent)
        }
    }
    wg.Wait()
}
Each agent:
  1. Copies the main context
  2. Executes AWS operations (CLI calls or SDK methods)
  3. Stores results in its local Results map
  4. Publishes promised data to the SharedDataBus
See internal/agent/coordinator/coordinator.go:63
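
The per-agent steps can be illustrated with a reduced worker function. This is a sketch under stated assumptions: the `agent` and `bus` types, the `collect` hook, and the extra bus parameter on `run` are all hypothetical, and the context-copy step is omitted for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// bus is a minimal stand-in for the SharedDataBus.
type bus struct {
	mu   sync.Mutex
	data map[string]any
}

func (b *bus) Store(key string, value any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data[key] = value
}

// agent is a hypothetical, reduced ParallelAgent. collect stands in for the
// AWS CLI or SDK calls that a real agent would make.
type agent struct {
	Name     string
	Provided []string
	Status   string
	Results  map[string]any
	collect  func(ctx context.Context) (map[string]any, error)
}

// run executes one agent, keeps its results locally, and publishes the
// promised keys to the bus. A failed agent publishes nothing.
func run(ctx context.Context, wg *sync.WaitGroup, a *agent, b *bus) {
	defer wg.Done()
	results, err := a.collect(ctx)
	if err != nil {
		a.Status = "failed"
		return
	}
	a.Results = results
	for _, key := range a.Provided {
		if v, ok := results[key]; ok {
			b.Store(key, v)
		}
	}
	a.Status = "completed"
}

// runGroup runs one order group concurrently and returns each agent's final
// status plus the number of keys published to the bus.
func runGroup() (map[string]string, int) {
	b := &bus{data: make(map[string]any)}
	agents := []*agent{
		{Name: "log", Provided: []string{"logs"}, collect: func(context.Context) (map[string]any, error) {
			return map[string]any{"logs": []string{"ERROR timeout"}}, nil
		}},
		{Name: "metrics", Provided: []string{"metrics"}, collect: func(context.Context) (map[string]any, error) {
			return nil, errors.New("throttled")
		}},
	}
	var wg sync.WaitGroup
	for _, a := range agents {
		wg.Add(1)
		go run(context.Background(), &wg, a, b)
	}
	wg.Wait() // statuses are safe to read once every goroutine has finished
	statuses := make(map[string]string)
	for _, a := range agents {
		statuses[a.Name] = a.Status
	}
	return statuses, len(b.data)
}

func main() {
	statuses, published := runGroup()
	fmt.Println(statuses, published)
}
```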
5. Result aggregation

The coordinator merges all successful agent outputs:
aggregated := make(model.AWSData)
for _, agent := range registry.Agents() {
    if agent.Status != "completed" {
        continue
    }
    agentKey := agent.Type.Name
    aggregated[agentKey] = agent.Results
    for key, value := range agent.Results {
        aggregated[fmt.Sprintf("%s_%s", agentKey, key)] = value
    }
}
Metadata includes execution counts, timestamps, and the decision path. See internal/agent/coordinator/coordinator.go:138
6. Context building

The final context string merges:
  • Semantic analysis summary
  • All parallel agent results (grouped by agent type)
  • Service-specific log analysis
  • Error patterns and metrics
  • Agent reasoning chain (chain of thought)
This structured context is fed to the LLM for final answer generation. See internal/agent/agent.go:306
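
One way to assemble such a context string is with a `strings.Builder`, as in the sketch below. `BuildContext`, its parameters, and the section titles are assumptions for illustration; the layout produced by agent.go is not shown in this document.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// BuildContext is a hypothetical helper that joins a semantic summary,
// per-agent results, and a reasoning chain into one prompt string.
func BuildContext(summary string, results map[string]map[string]any, reasoning []string) string {
	var sb strings.Builder
	sb.WriteString("SEMANTIC ANALYSIS\n" + summary + "\n\n")

	// Sort agent names and keys so the output is stable between runs.
	names := make([]string, 0, len(results))
	for name := range results {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Fprintf(&sb, "AGENT RESULTS: %s\n", name)
		keys := make([]string, 0, len(results[name]))
		for k := range results[name] {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Fprintf(&sb, "  %s: %v\n", k, results[name][k])
		}
		sb.WriteString("\n")
	}

	sb.WriteString("REASONING CHAIN\n")
	for i, step := range reasoning {
		fmt.Fprintf(&sb, "%d. %s\n", i+1, step)
	}
	return sb.String()
}

func main() {
	fmt.Print(BuildContext(
		"intent=troubleshoot services=lambda",
		map[string]map[string]any{"log": {"error_count": 3}},
		[]string{"matched error keywords", "spawned log agent"},
	))
}
```

Sorting the keys is a design choice of this sketch: Go map iteration order is randomized, so an unsorted build would produce a different prompt on every run.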

Core components

Agent orchestrator

The Agent type in agent.go wires everything together:
type Agent struct {
    client       *awsclient.Client
    debug        bool
    maxSteps     int
    aiDecisionFn func(context.Context, string) (string, error)
}

func (a *Agent) InvestigateQuery(ctx context.Context, query string) (*AgentContext, error)
Key responsibilities:
  • Run semantic analysis
  • Traverse decision tree via coordinator
  • Spawn parallel agents or fall back to sequential planner
  • Build final context for LLM
See internal/agent/agent.go:42

Coordinator

The Coordinator drives dependency-tree-based parallel execution:
type Coordinator struct {
    DecisionTree *dt.Tree
    MainContext  *model.AgentContext
    client       *awsclient.Client
    registry     *AgentRegistry
    dataBus      *SharedDataBus
    scheduler    *DependencyScheduler
}
Public methods:
  • Analyze(query string) — traverse decision tree
  • SpawnAgents(ctx, applicable) — launch agents by dependency order
  • WaitForCompletion(ctx, timeout) — block until all agents finish
  • AggregateResults() — merge successful outputs
  • Stats() — execution metrics
See internal/agent/coordinator/coordinator.go:34

Shared data bus

The SharedDataBus stores dependency data produced by agents:
type SharedDataBus struct {
    mu   sync.RWMutex
    data map[string]any
}

func (b *SharedDataBus) Store(key string, value any)
func (b *SharedDataBus) Load(key string) (any, bool)
func (b *SharedDataBus) HasAll(keys []string) bool
Agents publish data using keys from ProvidedData. Downstream agents check RequiredData before executing. See internal/agent/coordinator/state.go:10
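
The three method signatures above could be implemented along these lines. This is a minimal sketch: the method bodies and the `NewSharedDataBus` constructor are assumptions, since only the signatures appear in this document.

```go
package main

import (
	"fmt"
	"sync"
)

// SharedDataBus matches the documented struct.
type SharedDataBus struct {
	mu   sync.RWMutex
	data map[string]any
}

// NewSharedDataBus is a hypothetical constructor that initializes the map.
func NewSharedDataBus() *SharedDataBus {
	return &SharedDataBus{data: make(map[string]any)}
}

// Store publishes a value under key, replacing any previous value.
func (b *SharedDataBus) Store(key string, value any) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data[key] = value
}

// Load returns the value for key and whether it was present.
func (b *SharedDataBus) Load(key string) (any, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	v, ok := b.data[key]
	return v, ok
}

// HasAll reports whether every key has been published.
// An empty key list is trivially satisfied.
func (b *SharedDataBus) HasAll(keys []string) bool {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, k := range keys {
		if _, ok := b.data[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	bus := NewSharedDataBus()
	required := []string{"logs", "service_config"} // RequiredData of the security agent

	bus.Store("logs", []string{"ERROR timeout"})
	fmt.Println(bus.HasAll(required)) // false: service_config is not published yet

	bus.Store("service_config", map[string]string{"runtime": "go"})
	fmt.Println(bus.HasAll(required)) // true
}
```

A read-write mutex fits this access pattern because readiness checks are likely to outnumber writes: many agents may call `HasAll` while few are publishing.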

Agent registry

The AgentRegistry tracks running agents and maintains counters:
type AgentStats struct {
    Total     int
    Completed int
    Failed    int
}

type AgentRegistry struct {
    mu     sync.RWMutex
    agents []*ParallelAgent
    stats  AgentStats
}
Thread-safe methods:
  • Register(agent) — add agent and increment total
  • MarkCompleted() / MarkFailed() — update counters
  • Agents() — snapshot of all agents
  • Stats() — execution summary
See internal/agent/coordinator/state.go:58
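
A possible shape for these thread-safe methods is sketched below. The method bodies and the reduced `ParallelAgent` type are assumptions; only the struct layouts and method names come from this section.

```go
package main

import (
	"fmt"
	"sync"
)

// ParallelAgent is reduced to a name for this sketch.
type ParallelAgent struct{ Name string }

type AgentStats struct {
	Total     int
	Completed int
	Failed    int
}

type AgentRegistry struct {
	mu     sync.RWMutex
	agents []*ParallelAgent
	stats  AgentStats
}

// Register adds an agent and increments the total.
func (r *AgentRegistry) Register(a *ParallelAgent) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.agents = append(r.agents, a)
	r.stats.Total++
}

// MarkCompleted and MarkFailed update the outcome counters.
func (r *AgentRegistry) MarkCompleted() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.stats.Completed++
}

func (r *AgentRegistry) MarkFailed() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.stats.Failed++
}

// Agents returns a copy of the slice so callers can iterate without the lock.
func (r *AgentRegistry) Agents() []*ParallelAgent {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return append([]*ParallelAgent(nil), r.agents...)
}

// Stats returns the counters by value.
func (r *AgentRegistry) Stats() AgentStats {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.stats
}

// registerConcurrently registers n agents from n goroutines and marks each
// one completed, to exercise the locking.
func registerConcurrently(n int) AgentStats {
	reg := &AgentRegistry{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			reg.Register(&ParallelAgent{Name: fmt.Sprintf("agent-%d", id)})
			reg.MarkCompleted()
		}(i)
	}
	wg.Wait()
	return reg.Stats()
}

func main() {
	fmt.Printf("%+v\n", registerConcurrently(50))
}
```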

Agent types

Clanker includes these built-in specialist agents:
Log agent
Execution order: 1 (independent)
Provides: logs, error_patterns, log_metrics
Operations:
  • Discover relevant log groups
  • Sample recent log entries
  • Filter error patterns
  • Extract log stream metadata
See internal/agent/coordinator/agent_types.go:28
Metrics agent
Execution order: 1 (independent)
Provides: metrics, performance_data, thresholds
Operations:
  • Query CloudWatch metrics
  • Check alarm states
  • Aggregate performance data
See internal/agent/coordinator/agent_types.go:36
Infrastructure agent
Execution order: 2
Provides: service_config, deployment_status, resource_health
Operations:
  • List EC2, ECS, Lambda resources
  • Describe service configurations
  • Check deployment status
See internal/agent/coordinator/agent_types.go:44
Security agent
Execution order: 3
Requires: logs, service_config
Provides: security_status, access_patterns, vulnerabilities
Operations:
  • Analyze IAM policies
  • Check security group rules
  • Audit access logs
See internal/agent/coordinator/agent_types.go:52
Cost agent
Execution order: 4
Requires: metrics, resource_health
Provides: cost_analysis, usage_patterns, optimization_suggestions
Operations:
  • Query Cost Explorer
  • Analyze resource utilization
  • Generate optimization recommendations
See internal/agent/coordinator/agent_types.go:61
K8s agent
Execution order: 1 (independent)
Provides: k8s_resources, k8s_health
Operations:
  • List pods, deployments, services
  • Check resource status
  • Gather cluster metrics
See internal/agent/coordinator/agent_types.go:20

Sequential fallback

When the decision tree returns no applicable nodes, the agent falls back to a traditional sequential approach:
if len(applicableNodes) == 0 {
    a.runSequentialPlanner(ctx, agentCtx)
}
The sequential planner:
  1. Calls an LLM decision function to determine the next action
  2. Executes the chosen action (gather logs, metrics, etc.)
  3. Repeats until complete or maxSteps is reached
This provides a safety net for queries that don’t match decision tree patterns. See internal/agent/planner.go:13
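
The loop described above might look like the following sketch. The `decideFn` type mirrors the documented `aiDecisionFn` signature; the `"done"` sentinel, the `actions` map, and the scripted decision function in `demoPlan` are assumptions made so the example runs without an LLM.

```go
package main

import (
	"context"
	"fmt"
)

// decideFn mirrors the signature of the Agent's aiDecisionFn field.
type decideFn func(ctx context.Context, prompt string) (string, error)

// runSequential asks decide for the next action, executes it, and repeats
// until decide answers "done" or maxSteps is reached.
func runSequential(ctx context.Context, decide decideFn, actions map[string]func() string, maxSteps int) ([]string, error) {
	var gathered []string
	for step := 0; step < maxSteps; step++ {
		if err := ctx.Err(); err != nil {
			return gathered, err // stop early when the caller cancels
		}
		prompt := fmt.Sprintf("step %d, gathered so far: %v", step+1, gathered)
		next, err := decide(ctx, prompt)
		if err != nil {
			return gathered, err
		}
		if next == "done" {
			break
		}
		act, ok := actions[next]
		if !ok {
			return gathered, fmt.Errorf("unknown action %q", next)
		}
		gathered = append(gathered, act())
	}
	return gathered, nil
}

// demoPlan drives the loop with a scripted decision function in place of an LLM.
func demoPlan(maxSteps int) []string {
	script := []string{"logs", "metrics", "done"}
	i := 0
	decide := func(context.Context, string) (string, error) {
		if i >= len(script) {
			return "done", nil
		}
		next := script[i]
		i++
		return next, nil
	}
	actions := map[string]func() string{
		"logs":    func() string { return "logs collected" },
		"metrics": func() string { return "metrics collected" },
	}
	gathered, _ := runSequential(context.Background(), decide, actions, maxSteps)
	return gathered
}

func main() {
	fmt.Println(demoPlan(5)) // the script finishes before the step limit
	fmt.Println(demoPlan(1)) // the step limit cuts the plan short
}
```

The step cap matters because the decision function is a model call: without `maxSteps`, a model that never answers "done" would keep the investigation running indefinitely.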

Extending the system

1. Add a new agent type

Define the agent in internal/agent/coordinator/agent_types.go:
AgentTypeMonitoring = AgentType{
    Name: "monitoring",
    Dependencies: Dependency{
        RequiredData:   []string{"metrics"},
        ProvidedData:   []string{"alerts", "dashboards"},
        ExecutionOrder: 3,
    },
}
2. Register operations

Add operations in internal/agent/coordinator/operations.go:
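func (c *Coordinator) getOperationsForAgentType(agt AgentType) []awsclient.LLMOperation {
    switch agt.Name {
    case "monitoring":
        // Operations the new monitoring agent runs.
        return []awsclient.LLMOperation{
            {Operation: "list_cloudwatch_alarms", Parameters: map[string]any{}},
        }
    // ... cases for the existing agent types ...
    default:
        // Every path must return a value for the function to compile;
        // returning nil for unknown agent types is an assumption.
        return nil
    }
}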
3. Update decision tree

Add decision tree nodes that spawn your agent:
&Node{
    ID:         "monitoring-check",
    Name:       "Monitoring Analysis",
    Condition:  "contains_keywords(['alert', 'alarm', 'dashboard'])",
    Priority:   8,
    AgentTypes: []string{"monitoring"},
}
Keep shared structs in internal/agent/model/ to avoid circular imports. Run gofmt after edits and ensure go build ./... stays green.

Performance considerations

Parallelism

Agents in the same execution order run concurrently, reducing total investigation time. Use --agent-trace to see lifecycle logs.

Timeouts

Each agent type has a WaitTimeout (typically 5-8 seconds). The coordinator waits up to 15 seconds for all agents to complete.
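
A bounded wait of this kind is commonly built from `context.WithTimeout` and a done channel. The sketch below shows that pattern; `waitForCompletion` and `demoWait` are hypothetical helpers, not the Coordinator's WaitForCompletion method, and the durations are shortened for the demo.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waitForCompletion blocks until wg finishes, the timeout elapses, or ctx is
// cancelled, whichever happens first. On timeout the helper goroutine keeps
// waiting in the background until the agents eventually finish.
func waitForCompletion(ctx context.Context, wg *sync.WaitGroup, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoWait starts one simulated agent that works for workMs milliseconds and
// waits for it with a timeout of timeoutMs milliseconds.
func demoWait(workMs, timeoutMs int) error {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(time.Duration(workMs) * time.Millisecond)
	}()
	return waitForCompletion(context.Background(), &wg, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	fmt.Println(demoWait(5, 500))  // <nil>
	fmt.Println(demoWait(300, 20)) // context deadline exceeded
}
```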

Dependency checks

Agents only execute when their dependencies are satisfied on the data bus. This prevents wasted work and ensures data consistency.

Graceful degradation

Failed agents don’t block the pipeline. The coordinator aggregates whatever data is available and proceeds with partial results.

Debugging agent execution

Enable detailed agent tracing:
clanker ask "show lambda errors" --agent-trace
This outputs:
  • Decision tree matches and priorities
  • Execution order groups
  • Agent start/completion events
  • Dependency satisfaction checks
  • Final aggregation stats
Alternatively, enable tracing in the config file (.clanker.yaml):
agent:
  trace: true
See internal/agent/coordinator/coordinator.go:17 for the trace flag check.
