
Introduction

Talos Linux exposes a comprehensive gRPC API for all system operations. The API provides complete control over node lifecycle, configuration, monitoring, and cluster management. All talosctl commands interact with Talos nodes through this gRPC API.

API Services

The Talos API is organized into three main services:

MachineService

The primary service for node operations including:
  • Configuration management
  • System lifecycle (reboot, shutdown, upgrade)
  • Container and process management
  • System monitoring and stats
  • etcd cluster management
  • File operations
View MachineService documentation

ClusterService

Cluster-wide operations:
  • Health checks across multiple nodes
  • Cluster validation
View ClusterService documentation

InspectService

Internal inspection and debugging:
  • Controller runtime dependencies
  • Resource graphs
View InspectService documentation

Authentication

Talos uses mutual TLS (mTLS) for API authentication. Each API request must include a valid client certificate signed by the Talos CA.

Client Certificates

Client certificates are generated during cluster bootstrap and stored in the talosconfig file. The certificate includes:
  • Subject: Identifies the client
  • Roles: Defines permissions (os:admin, os:reader, etc.)
  • TTL: Certificate validity period (default: 365 days)
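
As a rough illustration of how a custom client might pull this material out of a talosconfig, the sketch below assumes one context entry has already been parsed from YAML into a dictionary, and that its `ca`, `crt`, and `key` fields hold base64-encoded PEM. The helper name and the sample values are hypothetical placeholders, not real certificates; check the field names against your own talosconfig.

```python
import base64


def decode_context_credentials(context: dict) -> dict:
    """Decode the base64 PEM blobs of one talosconfig context (assumed layout)."""
    decoded = {}
    for field in ("ca", "crt", "key"):
        if field not in context:
            raise KeyError(f"talosconfig context is missing '{field}'")
        decoded[field] = base64.b64decode(context[field])
    return decoded


# Hypothetical context with placeholder material, not real certificates.
sample_context = {
    "endpoints": ["10.0.0.1"],
    "ca": base64.b64encode(b"-----BEGIN CERTIFICATE-----\nca\n").decode(),
    "crt": base64.b64encode(b"-----BEGIN CERTIFICATE-----\nclient\n").decode(),
    "key": base64.b64encode(b"-----BEGIN PRIVATE KEY-----\nkey\n").decode(),
}
pem = decode_context_credentials(sample_context)
```

The decoded values are plain PEM bytes, which is the form TLS libraries usually accept for a CA bundle, certificate chain, and private key.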

Generating Client Certificates

You can generate additional client certificates using the API:
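import (
    "time"

    "github.com/siderolabs/talos/pkg/machinery/api/machine"
    "google.golang.org/protobuf/types/known/durationpb"
)

// c is an already-connected Talos API client.
resp, err := c.GenerateClientConfiguration(ctx, &machine.GenerateClientConfigurationRequest{
    Roles:  []string{"os:admin"},
    CrtTtl: durationpb.New(24 * time.Hour), // certificate valid for 24 hours
})
if err != nil {
    return err
}
// resp carries the newly issued client credentials.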

Connection

Endpoints

The API is exposed on port 50000 by default. When connecting to a cluster, you can target:
  • Specific node: 10.0.0.1:50000
  • Control plane endpoint: Use the cluster endpoint from talosconfig
  • Load balanced: Through a load balancer (recommended for production)
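
A client that accepts any of these targets usually has to fill in the default port itself. The following is a minimal sketch of that normalization; the function name and the IPv6 handling are assumptions made for illustration, not part of any Talos library.

```python
DEFAULT_API_PORT = 50000  # default Talos API port


def normalize_endpoint(endpoint: str, default_port: int = DEFAULT_API_PORT) -> str:
    """Return 'host:port', appending the default port when none is given."""
    endpoint = endpoint.strip()
    if endpoint.startswith("["):
        # Bracketed IPv6 literal, e.g. [fd00::1] or [fd00::1]:50000
        host, _, rest = endpoint.partition("]")
        port = rest.lstrip(":") or str(default_port)
        return f"{host}]:{port}"
    if endpoint.count(":") > 1:
        # Bare IPv6 literal without a port
        return f"[{endpoint}]:{default_port}"
    host, sep, port = endpoint.partition(":")
    return f"{host}:{port if sep and port else default_port}"
```

For example, `normalize_endpoint("10.0.0.1")` yields `10.0.0.1:50000`, while an endpoint that already names a port is returned unchanged.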

Transport Security

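All API communication is encrypted with TLS 1.3. The ECDHE suites below are TLS 1.2 identifiers and apply only if a TLS 1.2 handshake is negotiated; TLS 1.3 itself selects from a fixed set of AEAD suites such as TLS_AES_256_GCM_SHA384 and TLS_CHACHA20_POLY1305_SHA256: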
  • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  • TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
  • TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
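
To check a client-side setup against these requirements, one option is to build a TLS context with the Python standard library and refuse anything older than TLS 1.3. This is only a sketch: the helper name and the file-path parameters are hypothetical, and the `grpc` package manages its own TLS stack, so this is useful for probing or for non-gRPC tooling rather than for the gRPC channel itself.

```python
import ssl


def build_client_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Client-side TLS context pinned to TLS 1.3, optionally with a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies the server by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust the Talos CA
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # mTLS identity
    return ctx


ctx = build_client_context()
```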

Client Libraries

Official Go Client

The official Talos client library is written in Go:
import (
    "github.com/siderolabs/talos/pkg/machinery/client"
)

// tlsConfig is a *tls.Config carrying the client certificate and the Talos CA.
c, err := client.New(ctx,
    client.WithEndpoints("10.0.0.1"),
    client.WithTLSConfig(tlsConfig),
)
if err != nil {
    return err
}

defer c.Close()

// Call API methods
resp, err := c.Version(ctx)
if err != nil {
    return err
}

Using talosctl as a Client

The talosctl CLI tool is built on the Go client library and can be used as a reference implementation:
# Get version
talosctl -n 10.0.0.1 version

# Apply configuration
talosctl -n 10.0.0.1 apply-config --file config.yaml

# Stream logs (-f follows the log output)
talosctl -n 10.0.0.1 logs -f kubelet

Building Custom Clients

You can build clients in any language that supports gRPC:
  1. Get the proto definitions: Clone the Talos repository
  2. Generate code: Use protoc with your language plugin
  3. Implement authentication: Load client certificate and CA
  4. Create gRPC channel: Connect with TLS credentials
# Python example
import grpc
from machine_pb2_grpc import MachineServiceStub
from google.protobuf import empty_pb2

# Load certificates
with open('client.crt', 'rb') as f:
    client_cert = f.read()
with open('client.key', 'rb') as f:
    client_key = f.read()
with open('ca.crt', 'rb') as f:
    ca_cert = f.read()

# Create credentials
creds = grpc.ssl_channel_credentials(
    root_certificates=ca_cert,
    private_key=client_key,
    certificate_chain=client_cert
)

# Connect
with grpc.secure_channel('10.0.0.1:50000', creds) as channel:
    stub = MachineServiceStub(channel)
    response = stub.Version(empty_pb2.Empty())
    print(response)

Request/Response Patterns

Unary RPCs

Most API methods use unary request-response:
rpc Version(google.protobuf.Empty) returns (VersionResponse);

Server Streaming

Some methods stream data back to the client:
rpc Logs(LogsRequest) returns (stream common.Data);
rpc Events(EventsRequest) returns (stream Event);

Client Streaming

Upload operations use client streaming:
rpc EtcdRecover(stream common.Data) returns (EtcdRecoverResponse);
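
In gRPC Python, a client-streaming call takes an iterator of request messages, so an upload typically starts by splitting the payload into chunks. The sketch below shows only that splitting step; the 64 KiB chunk size is an arbitrary assumption, and wrapping each chunk in the generated `common.Data` message is left out because it depends on your generated stubs.

```python
from typing import Iterator


def iter_chunks(payload: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]


chunks = list(iter_chunks(b"x" * 150_000))
```

Because `iter_chunks` is a generator, the payload is sliced lazily as the stream is consumed.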

Common Types

All API responses include metadata and common types. See Common Types for details.

Error Handling

gRPC Status Codes

The API uses standard gRPC status codes:
  • OK (0): Success
  • CANCELLED (1): Operation cancelled
  • INVALID_ARGUMENT (3): Invalid request parameters
  • DEADLINE_EXCEEDED (4): Request timeout
  • NOT_FOUND (5): Resource not found
  • PERMISSION_DENIED (7): Insufficient permissions
  • UNAVAILABLE (14): Service unavailable
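
A client can turn these numeric codes into names and a retry decision with a small lookup. The sketch below covers only the codes listed above; treating UNAVAILABLE and DEADLINE_EXCEEDED as retryable is an assumption that suits idempotent calls, not a rule stated by the API.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    OK = 0
    CANCELLED = 1
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    PERMISSION_DENIED = 7
    UNAVAILABLE = 14


# Assumption: only transient conditions are worth retrying.
RETRYABLE = {StatusCode.UNAVAILABLE, StatusCode.DEADLINE_EXCEEDED}


def describe(code: int) -> str:
    """Return the code's name plus a retry hint, or a marker for unlisted codes."""
    try:
        status = StatusCode(code)
    except ValueError:
        return f"UNLISTED({code})"
    if status is StatusCode.OK:
        return status.name
    hint = "retry with backoff" if status in RETRYABLE else "do not retry"
    return f"{status.name}: {hint}"
```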

Error Details

Errors include additional context in the metadata:
{
  "metadata": {
    "hostname": "worker-1",
    "error": "service kubelet is not running",
    "status": {
      "code": 5,
      "message": "not found"
    }
  }
}
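
Given a response serialized in the JSON shape shown above, a client might surface the embedded error as an exception. This is a minimal sketch; `NodeError` and `raise_for_metadata` are hypothetical names, and the JSON layout is taken from the example rather than from a schema.

```python
import json


class NodeError(Exception):
    """Raised when a response's metadata carries an error (hypothetical helper)."""


def raise_for_metadata(raw: str) -> None:
    """Raise NodeError if the JSON document's metadata reports an error."""
    metadata = json.loads(raw).get("metadata", {})
    error = metadata.get("error")
    if not error:
        return
    status = metadata.get("status", {})
    raise NodeError(
        f"{metadata.get('hostname', '<unknown>')}: {error} "
        f"(code {status.get('code')}, {status.get('message')})"
    )


sample = (
    '{"metadata": {"hostname": "worker-1", '
    '"error": "service kubelet is not running", '
    '"status": {"code": 5, "message": "not found"}}}'
)
try:
    raise_for_metadata(sample)
    message = None
except NodeError as exc:
    message = str(exc)
```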

Multi-Node Requests

Many API calls can target multiple nodes simultaneously:
# Target multiple nodes
talosctl -n 10.0.0.1,10.0.0.2,10.0.0.3 version

# Omit -n to use the nodes configured in the current talosconfig context
talosctl version

Responses include metadata identifying which node responded:
message Metadata {
  string hostname = 1;
  string error = 2;
  google.rpc.Status status = 3;
}
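
With a multi-node reply, it is often convenient to group the per-node messages by hostname before acting on them. The sketch below assumes the messages have already been converted to dictionaries that mirror the `Metadata` fields; the function name and the sample responses are hypothetical.

```python
def partition_by_node(messages):
    """Split per-node messages into ({hostname: message}, {hostname: error})."""
    succeeded, failed = {}, {}
    for message in messages:
        metadata = message.get("metadata") or {}
        hostname = metadata.get("hostname", "<unknown>")
        if metadata.get("error"):
            failed[hostname] = metadata["error"]
        else:
            succeeded[hostname] = message
    return succeeded, failed


# Hypothetical responses decoded to dictionaries.
responses = [
    {"metadata": {"hostname": "cp-1"}, "version": {"tag": "v1.7.0"}},
    {"metadata": {"hostname": "worker-1", "error": "connection refused"}},
]
ok, bad = partition_by_node(responses)
```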

API Versioning

The Talos API follows semantic versioning:
  • Major version: Breaking changes (reflected in proto package)
  • Minor version: Backward-compatible additions
  • Patch version: Backward-compatible fixes
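
One way a client could apply these rules is to compare its own version with the server's before calling newer methods. The compatibility rule below (same major version, server minor at least the client's) is an assumption derived from the semantic-versioning description, not a documented Talos support policy.

```python
import re

_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> tuple:
    """Parse 'v1.7.0' (or '1.7.0', with optional suffix) into (1, 7, 0)."""
    match = _VERSION.match(tag)
    if match is None:
        raise ValueError(f"not a semantic version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(client: str, server: str) -> bool:
    """Same major version, and the server is at least as new as the client."""
    c_major, c_minor, _ = parse_version(client)
    s_major, s_minor, _ = parse_version(server)
    return c_major == s_major and s_minor >= c_minor
```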

Deprecation Policy

Deprecated methods include annotations indicating when they will be removed:
rpc ImageList(ImageListRequest) returns (stream ImageListResponse) {
  option (common.remove_deprecated_method) = "v1.18";
  option deprecated = true;
}
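
To audit which methods a client should stop using, the annotations can be scraped from the proto text. The sketch below uses regular expressions tuned to the layout shown above; it is an illustration rather than a proto parser, and it would miss rpc bodies that contain nested braces.

```python
import re

_RPC = re.compile(
    r"rpc\s+(\w+)\s*\([^)]*\)\s*returns\s*\([^)]*\)\s*\{(.*?)\}", re.S
)
_REMOVAL = re.compile(r'remove_deprecated_method\)\s*=\s*"([^"]+)"')


def deprecated_methods(proto_text: str) -> dict:
    """Map each deprecated rpc name to its announced removal version (or None)."""
    found = {}
    for name, body in _RPC.findall(proto_text):
        if re.search(r"option\s+deprecated\s*=\s*true", body):
            removal = _REMOVAL.search(body)
            found[name] = removal.group(1) if removal else None
    return found


snippet = '''
rpc ImageList(ImageListRequest) returns (stream ImageListResponse) {
  option (common.remove_deprecated_method) = "v1.18";
  option deprecated = true;
}
rpc Version(google.protobuf.Empty) returns (VersionResponse);
'''
result = deprecated_methods(snippet)
```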

Rate Limiting

The API does not enforce rate limiting, but clients should:
  • Implement exponential backoff on errors
  • Avoid polling; use streaming RPCs where available
  • Batch operations when possible
  • Respect UNAVAILABLE status codes
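
The first recommendation can be sketched as capped exponential backoff with full jitter. The default base delay, cap, and attempt count below are arbitrary assumptions, and `is_retryable` is a caller-supplied predicate (for instance, one that checks for an UNAVAILABLE status).

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Yield capped exponential delays: base, base*factor, ... never above cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(operation, is_retryable, sleep=time.sleep, **delay_options):
    """Run operation(); on a retryable exception, wait a jittered delay and retry."""
    delays = list(backoff_delays(**delay_options))
    for attempt, delay in enumerate(delays, start=1):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors or once the attempts are exhausted.
            if not is_retryable(exc) or attempt == len(delays):
                raise
            sleep(random.uniform(0, delay))  # full jitter spreads out retries
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.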

Best Practices

Use Streaming for Real-Time Data

For logs, events, and monitoring data, use streaming RPCs instead of polling:
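// Good: Use streaming (requires the "errors" and "io" imports)
stream, err := c.Events(ctx, &machine.EventsRequest{TailEvents: 10})
if err != nil {
    return err
}
for {
    event, err := stream.Recv()
    if errors.Is(err, io.EOF) {
        break // server closed the stream cleanly
    }
    if err != nil {
        return err // surface real failures instead of dropping them
    }
    // Process event
}

// Bad: Don't poll
for {
    events, _ := c.GetEvents(ctx) // hypothetical; no such method exists
    time.Sleep(1 * time.Second)
}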

Handle Partial Failures

When targeting multiple nodes, some may fail:
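resp, err := c.Version(ctx)
if err != nil {
    return err // the whole request failed
}
for _, msg := range resp.Messages {
    // Generated getters are nil-safe, so a missing Metadata counts as success.
    if md := msg.GetMetadata(); md.GetError() != "" {
        log.Printf("Node %s failed: %s", md.GetHostname(), md.GetError())
        continue
    }
    // Process successful response
}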

Set Appropriate Timeouts

Different operations have different time requirements:
// Quick operations: 30 seconds
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
c.Version(ctx)

// Long operations: 10 minutes (new names, since ctx and cancel are already declared)
longCtx, cancelLong := context.WithTimeout(context.Background(), 10*time.Minute)
defer cancelLong()
c.Upgrade(longCtx, &machine.UpgradeRequest{Image: "ghcr.io/siderolabs/installer:v1.7.0"})
