AI agents

Clanker uses a multi-agent architecture to investigate your cloud infrastructure. Rather than making API calls one after another, it spawns specialized agents in parallel so that independent pieces of context are gathered at the same time.

Architecture overview

The agent system consists of three core components:

Coordinator

Orchestrates parallel agent execution using decision trees

Semantic analyzer

Classifies user intent without external API calls

Specialized agents

Domain-specific workers (logs, metrics, K8s, security, etc.)

How it works

When you ask a question, Clanker follows this workflow:

1. Semantic analysis

The semantic analyzer performs lightweight NLP to classify your query without calling external services. It scores intent by summing keyword weights and infers urgency, timeframe, and target services.

func (sa *Analyzer) AnalyzeQuery(query string) model.QueryIntent {
    queryLower := strings.ToLower(query)
    words := strings.Fields(queryLower)

    intent := model.QueryIntent{
        Confidence:     0.0,
        TargetServices: []string{},
        Urgency:        "medium",
        TimeFrame:      "recent",
        DataTypes:      []string{},
    }

    // Score intent by summing word weights
    intentScores := make(map[string]float64)
    for intentType, signals := range sa.IntentSignals {
        score := 0.0
        for _, word := range words {
            if weight, exists := signals[word]; exists {
                score += weight
            }
        }
        intentScores[intentType] = score
    }

    // Select highest-scoring intent
    maxScore := 0.0
    for intentType, score := range intentScores {
        if score > maxScore {
            maxScore = score
            intent.Primary = intentType
        }
    }

    // Normalize the top score by the number of words, capped at 1.0
    if len(words) > 0 {
        intent.Confidence = math.Min(maxScore/float64(len(words)), 1.0)
    }

    // Identify target services
    for service, keywords := range sa.ServiceMapping {
        for _, keyword := range keywords {
            if strings.Contains(queryLower, keyword) {
                intent.TargetServices = append(intent.TargetServices, service)
                break
            }
        }
    }

    // Urgency, timeframe, and data type inference are omitted here.
    return intent
}

Fast and offline: Semantic analysis runs entirely locally using keyword matching and weighted scoring. No LLM calls are needed for this step.
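To make the scoring arithmetic concrete, here is a minimal standalone sketch of the same weighted keyword approach. The intent names and weights in `intentSignals` are hypothetical and chosen only for illustration; they are not Clanker's actual signal tables.

```go
package main

import (
    "fmt"
    "math"
    "strings"
)

// intentSignals maps an intent name to per-keyword weights.
// Hypothetical values for illustration only.
var intentSignals = map[string]map[string]float64{
    "troubleshoot": {"failing": 1.0, "error": 1.0, "why": 0.5},
    "monitor":      {"status": 1.0, "health": 0.8},
}

// scoreIntent returns the highest-scoring intent and a confidence in [0, 1].
func scoreIntent(query string) (string, float64) {
    words := strings.Fields(strings.ToLower(query))
    best, bestScore := "", 0.0
    for intent, signals := range intentSignals {
        score := 0.0
        for _, w := range words {
            score += signals[w] // words without a weight contribute 0
        }
        if score > bestScore {
            best, bestScore = intent, score
        }
    }
    if len(words) == 0 {
        return best, 0
    }
    // Normalize by query length so long queries do not inflate confidence.
    return best, math.Min(bestScore/float64(len(words)), 1.0)
}

func main() {
    intent, conf := scoreIntent("why is lambda failing")
    // troubleshoot scores 0.5 + 1.0 = 1.5 over 4 words.
    fmt.Printf("%s %.3f\n", intent, conf)
}
```

Note that splitting with `strings.Fields` keeps punctuation attached to words, so in this sketch a token such as `failing?` would not match the keyword `failing` unless punctuation is trimmed first.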

2. Decision tree traversal

The coordinator uses the semantic analysis to traverse a decision tree that maps user intent to agent execution strategies.

func (t *Tree) Traverse(query string, ctx *model.AgentContext) []*Node {
    var applicable []*Node
    t.traverseNode(t.Root, query, ctx, &applicable)
    return applicable
}

func (t *Tree) traverseNode(node *Node, query string, ctx *model.AgentContext, applicable *[]*Node) {
    if t.evaluateCondition(node.Condition, query) {
        *applicable = append(*applicable, node)
        t.CurrentPath = append(t.CurrentPath, node.ID)
        t.Decisions = append(t.Decisions, *node)
        for _, child := range node.Children {
            t.traverseNode(child, query, ctx, applicable)
        }
    }
}

Each node in the tree specifies:

Condition: When this strategy applies (keyword matching)
Agent types: Which specialized agents to spawn
Priority: Execution order for dependency management
Parameters: Configuration for the agents
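The four fields above could be modeled roughly as follows. This is a sketch under assumptions: the `Node` layout, the use of a keyword slice for `Condition`, and the helper names `matches`, `collect`, and `sampleTree` are hypothetical and may differ from the real decision tree package.

```go
package main

import (
    "fmt"
    "strings"
)

// Node is a simplified, hypothetical decision tree node.
type Node struct {
    ID         string
    Condition  []string // keywords; any match makes the node applicable
    AgentTypes []string
    Priority   int
    Parameters map[string]string
    Children   []*Node
}

// matches reports whether the query contains any condition keyword.
// An empty condition always matches, which suits a root node.
func (n *Node) matches(query string) bool {
    if len(n.Condition) == 0 {
        return true
    }
    q := strings.ToLower(query)
    for _, kw := range n.Condition {
        if strings.Contains(q, kw) {
            return true
        }
    }
    return false
}

// collect walks the tree depth-first and returns the matching nodes.
// Children are visited only when their parent matched.
func collect(n *Node, query string) []*Node {
    if !n.matches(query) {
        return nil
    }
    out := []*Node{n}
    for _, c := range n.Children {
        out = append(out, collect(c, query)...)
    }
    return out
}

// sampleTree builds a tiny illustrative tree.
func sampleTree() *Node {
    return &Node{ID: "root", Children: []*Node{
        {ID: "lambda_errors", Condition: []string{"lambda"}, AgentTypes: []string{"log", "metrics"}, Priority: 10},
        {ID: "k8s_health", Condition: []string{"pod", "kubernetes"}, AgentTypes: []string{"k8s"}, Priority: 5},
    }}
}

func main() {
    for _, n := range collect(sampleTree(), "Why is my Lambda failing?") {
        fmt.Println(n.ID, n.Priority)
    }
}
```

Because a child is only evaluated when its parent matches, broad conditions belong near the root and narrower ones deeper in the tree.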

3. Parallel agent execution

The coordinator spawns agents in dependency-aware groups, running independent agents in parallel while respecting data dependencies.

func (c *Coordinator) SpawnAgents(ctx context.Context, applicable []*dt.Node) {
    agentConfigs := make(map[string]AgentConfig)

    for _, node := range applicable {
        for _, name := range node.AgentTypes {
            agt, ok := c.lookupAgentType(name)
            if !ok {
                continue
            }
            if existing, exists := agentConfigs[name]; !exists || node.Priority > existing.Priority {
                agentConfigs[name] = AgentConfig{
                    Priority:   node.Priority,
                    Parameters: node.Parameters,
                    AgentType:  agt,
                }
            }
        }
    }

    if len(agentConfigs) == 0 {
        return
    }

    // Plan execution order based on dependencies
    planned := c.scheduler.Plan(agentConfigs)
    verbose := verboseAgents()

    // Execute agents in dependency order
    for _, group := range planned {
        if verbose {
            fmt.Printf("📊 Executing order group %d with %d agents\n", group.Order, len(group.Agents))
        }
        var wg sync.WaitGroup
        for _, cfg := range group.Agents {
            if !c.scheduler.Ready(cfg.AgentType, c.dataBus) {
                if verbose {
                    fmt.Printf("⏸️  Agent %s waiting for dependencies\n", cfg.AgentType.Name)
                }
                continue
            }
            agent := c.newParallelAgent(cfg)
            c.registry.Register(agent)
            wg.Add(1)
            go c.runPlannedAgent(ctx, &wg, agent)
        }
        wg.Wait()
        if verbose {
            fmt.Printf("✅ Order group %d completed\n", group.Order)
        }
    }
}
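The grouping pattern used by the coordinator can be tried in isolation. The sketch below is not the real scheduler: `agentSpec`, `groupByOrder`, `runGroups`, and `completionOrder` are hypothetical names, and it only shows the idea of running each order group concurrently and waiting before starting the next.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// agentSpec is a hypothetical, minimal stand-in for an agent config.
type agentSpec struct {
    Name  string
    Order int
}

// groupByOrder buckets agents by execution order, lowest order first.
func groupByOrder(specs []agentSpec) [][]agentSpec {
    buckets := map[int][]agentSpec{}
    for _, s := range specs {
        buckets[s.Order] = append(buckets[s.Order], s)
    }
    orders := make([]int, 0, len(buckets))
    for o := range buckets {
        orders = append(orders, o)
    }
    sort.Ints(orders)
    groups := make([][]agentSpec, 0, len(orders))
    for _, o := range orders {
        groups = append(groups, buckets[o])
    }
    return groups
}

// runGroups runs every agent in a group concurrently and waits for the
// whole group to finish before moving on to the next one.
func runGroups(groups [][]agentSpec, run func(agentSpec)) {
    for _, g := range groups {
        var wg sync.WaitGroup
        for _, s := range g {
            wg.Add(1)
            go func(s agentSpec) {
                defer wg.Done()
                run(s)
            }(s)
        }
        wg.Wait()
    }
}

// completionOrder records the order in which agents finish.
func completionOrder(specs []agentSpec) []string {
    var mu sync.Mutex
    var done []string
    runGroups(groupByOrder(specs), func(s agentSpec) {
        mu.Lock()
        defer mu.Unlock()
        done = append(done, s.Name)
    })
    return done
}

func main() {
    specs := []agentSpec{{"log", 1}, {"infrastructure", 2}, {"metrics", 1}}
    done := completionOrder(specs)
    // Order within group 1 may vary; the order-2 agent always finishes last.
    fmt.Println(done[len(done)-1])
}
```

The `wg.Wait()` between groups is what makes the ordering guarantee hold: nothing in a later group starts until every agent in the earlier group has returned.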

Agent types

Clanker includes specialized agents for different investigation domains:

Log agent

Discovers log groups, retrieves recent entries, and identifies error patterns.
Provides: logs, error_patterns, log_metrics
Execution order: 1 (no dependencies)

Metrics agent

Gathers CloudWatch metrics, performance data, and threshold violations.
Provides: metrics, performance_data, thresholds
Execution order: 1 (no dependencies)

Infrastructure agent

Checks service configuration, deployment status, and resource health.
Provides: service_config, deployment_status, resource_health
Execution order: 2 (can use metrics/logs if available)

Security agent

Analyzes access patterns, security configurations, and vulnerabilities.
Requires: logs, service_config
Provides: security_status, access_patterns, vulnerabilities
Execution order: 3 (depends on logs and infrastructure)

K8s agent

Inspects Kubernetes cluster resources, pod health, and deployment status.
Provides: k8s_resources, k8s_health
Execution order: 1 (no dependencies)

Performance agent

Identifies bottlenecks and provides scaling recommendations.
Requires: metrics, logs, resource_health
Provides: performance_analysis, bottlenecks, scaling_recommendations
Execution order: 5 (depends on metrics, logs, infrastructure)

See the full list in internal/agent/coordinator/agent_types.go.
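The Requires and Provides lists above suggest a simple readiness rule: an agent can start once every key it requires has been published. The following sketch illustrates that rule only. The `agentType` struct, the `dataBus` map, and the `publish` and `ready` helpers are hypothetical and are not the contents of agent_types.go.

```go
package main

import "fmt"

// agentType mirrors the Requires/Provides lists; the struct is hypothetical.
type agentType struct {
    Name     string
    Requires []string
    Provides []string
}

// dataBus is a hypothetical set of data keys published so far.
type dataBus map[string]bool

// publish marks everything an agent provides as available.
func (b dataBus) publish(a agentType) {
    for _, k := range a.Provides {
        b[k] = true
    }
}

// ready reports whether every key the agent requires is on the bus.
func ready(a agentType, bus dataBus) bool {
    for _, k := range a.Requires {
        if !bus[k] {
            return false
        }
    }
    return true
}

func main() {
    logAgent := agentType{Name: "log", Provides: []string{"logs", "error_patterns", "log_metrics"}}
    infra := agentType{Name: "infrastructure", Provides: []string{"service_config", "deployment_status", "resource_health"}}
    security := agentType{
        Name:     "security",
        Requires: []string{"logs", "service_config"},
        Provides: []string{"security_status", "access_patterns", "vulnerabilities"},
    }

    bus := dataBus{}
    fmt.Println(ready(security, bus)) // nothing published yet

    bus.publish(logAgent)
    bus.publish(infra)
    fmt.Println(ready(security, bus)) // logs and service_config are now available
}
```

An agent with an empty Requires list is ready immediately, which matches the agents listed with execution order 1.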

Chain of thought tracking

Every agent decision is recorded in a chain of thought for debugging and transparency:

a.addThought(agentCtx, fmt.Sprintf("Starting investigation of query: '%s'", query), "analyze", "Query received, beginning analysis")
a.addThought(agentCtx, fmt.Sprintf("Semantic analysis: Intent=%s, Confidence=%.2f, Urgency=%s",
    queryIntent.Primary, queryIntent.Confidence, queryIntent.Urgency), "analyze", "Performed semantic analysis")

// ...

a.addThought(agentCtx, fmt.Sprintf("Decision tree identified %d applicable strategies", len(applicableNodes)), "analyze", "Determined parallel execution strategy")

// ...

stats := coord.Stats()
a.addThought(agentCtx, fmt.Sprintf("Completed parallel execution with %d agents", stats.Total), "success", "Data gathering completed")

if verbose {
    fmt.Printf("🎉 Parallel execution completed: %d successful, %d failed\n",
        stats.Completed, stats.Failed)
}

Enable agent tracing with --agent-trace or debug: true in your config to see detailed coordinator lifecycle logs.
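A recorder behind calls like `addThought` can be as small as an append to a slice. This sketch assumes the third argument is an action label and the fourth a short outcome; the `thought` and `agentContext` types and their field names are hypothetical, inferred only from the call sites shown above.

```go
package main

import (
    "fmt"
    "time"
)

// thought is one recorded step; field names are assumptions.
type thought struct {
    Time    time.Time
    Content string
    Action  string // for example "analyze" or "success"
    Outcome string
}

// agentContext holds the chain of thought for one investigation.
type agentContext struct {
    ChainOfThought []thought
}

// addThought appends a timestamped entry to the chain.
func addThought(ctx *agentContext, content, action, outcome string) {
    ctx.ChainOfThought = append(ctx.ChainOfThought, thought{
        Time:    time.Now(),
        Content: content,
        Action:  action,
        Outcome: outcome,
    })
}

func main() {
    ctx := &agentContext{}
    addThought(ctx, "Starting investigation of query", "analyze", "Query received, beginning analysis")
    addThought(ctx, "Completed parallel execution with 3 agents", "success", "Data gathering completed")

    for i, t := range ctx.ChainOfThought {
        fmt.Printf("%d. [%s] %s (%s)\n", i+1, t.Action, t.Content, t.Outcome)
    }
}
```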

Performance characteristics

Approach	Time	Requests
Sequential (naive)	6-8 seconds	5-10 serial calls
Parallel agents	2-3 seconds	5-10 concurrent calls

By running independent agents in parallel, Clanker reduces investigation time by 60-70% compared to sequential approaches.
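The saving comes from each order group costing only as much as its slowest agent. The sketch below compares the two models on made-up durations; the numbers in `sampleGroups` are hypothetical and are not measurements of Clanker.

```go
package main

import (
    "fmt"
    "time"
)

// sequentialTime is the total when every call runs one after another.
func sequentialTime(groups [][]time.Duration) time.Duration {
    var total time.Duration
    for _, g := range groups {
        for _, d := range g {
            total += d
        }
    }
    return total
}

// parallelTime is the total when each group runs concurrently:
// a group takes as long as its slowest agent, and groups run back to back.
func parallelTime(groups [][]time.Duration) time.Duration {
    var total time.Duration
    for _, g := range groups {
        var slowest time.Duration
        for _, d := range g {
            if d > slowest {
                slowest = d
            }
        }
        total += slowest
    }
    return total
}

// sampleGroups returns hypothetical per-agent durations by order group.
func sampleGroups() [][]time.Duration {
    return [][]time.Duration{
        {1500 * time.Millisecond, 1200 * time.Millisecond, 1000 * time.Millisecond},
        {900 * time.Millisecond, 800 * time.Millisecond},
    }
}

func main() {
    seq, par := sequentialTime(sampleGroups()), parallelTime(sampleGroups())
    reduction := 100 * (1 - float64(par)/float64(seq))
    fmt.Printf("sequential=%v parallel=%v reduction=%.0f%%\n", seq, par, reduction)
}
```

The actual reduction depends on how evenly work is spread across groups: many similar agents in one group give a large saving, while a long dependency chain gives little.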

Example execution

Here’s what happens when you ask “Why is my Lambda failing?”:

clanker ask "Why is my Lambda failing?" --agent-trace

Semantic analysis

Intent: troubleshoot (confidence 0.85)
Urgency: high
Target services: lambda, logs
Data types: logs, metrics, status

Decision tree match

Applicable nodes:

lambda_errors (priority 10)
service_health (priority 5)

Agent spawning

Order group 1:

Log agent (discover Lambda log groups, get recent errors)
Metrics agent (get Lambda metrics: invocations, errors, duration)

Order group 2:

Infrastructure agent (get Lambda configuration, environment variables)

Result aggregation

Data from all agents is combined into a single unified context, which is then passed to the LLM to generate the response.
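As a rough illustration of the aggregation step, the sketch below merges per-agent results into one text block and skips agents that failed. The `agentResult` type, the `buildContext` function, and the output format are hypothetical; the real aggregation logic may look quite different.

```go
package main

import (
    "errors"
    "fmt"
    "sort"
    "strings"
)

// errTimeout is a placeholder failure used in the example.
var errTimeout = errors.New("timeout")

// agentResult is a hypothetical container for one agent's output.
type agentResult struct {
    Agent string
    Data  map[string]string // data key -> short summary
    Err   error
}

// buildContext merges successful results into one text block.
// Lines are sorted so the output is stable across runs.
func buildContext(results []agentResult) string {
    var lines []string
    for _, r := range results {
        if r.Err != nil {
            continue // failed agents contribute nothing
        }
        for k, v := range r.Data {
            lines = append(lines, fmt.Sprintf("[%s] %s: %s", r.Agent, k, v))
        }
    }
    sort.Strings(lines)
    return strings.Join(lines, "\n")
}

func main() {
    results := []agentResult{
        {Agent: "log", Data: map[string]string{"logs": "3 errors in the last hour"}},
        {Agent: "metrics", Err: errTimeout},
        {Agent: "infrastructure", Data: map[string]string{"service_config": "memory 128 MB"}},
    }
    fmt.Println(buildContext(results))
}
```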

Next steps

Natural language

Learn how Clanker interprets your questions

Agent architecture

Deep dive into agent implementation
