5Stack provides comprehensive system monitoring tools that give Administrators real-time visibility into platform health, resource utilization, and service performance. These tools are essential for maintaining optimal platform operation and diagnosing issues.
System monitoring features require Administrator role access.

System Metrics

The system metrics dashboard (/system-metrics) provides real-time monitoring of all platform services and game server nodes.

Overview Statistics

The metrics page header displays the total number of services and game server nodes:
<template>
  <PageHeading>
    <template #title>{{ $t("pages.system_metrics.title") }}</template>
    <template #description>
      {{ $t("pages.system_metrics.description") }}
    </template>
    <template #actions>
      <div class="flex flex-wrap items-center gap-3">
        <Badge variant="outline" class="text-xs px-3 py-1">
          {{ $t("pages.system_metrics.services_count") }}:
          {{ totalServices }}
        </Badge>
        <Badge variant="outline" class="text-xs px-3 py-1">
          {{ $t("pages.system_metrics.nodes_count") }}:
          {{ totalGameNodes }}
        </Badge>
      </div>
    </template>
  </PageHeading>
</template>

Game Server Nodes

Monitor all game server nodes with detailed metrics and filtering:

Node Filtering

  • Search by node name, ID, or region
  • Filter by enabled/disabled status
  • Filter by online/offline status
  • Sort by CPU, memory, or name

Node Metrics

  • Real-time CPU usage percentage
  • Memory utilization tracking
  • Online/offline status monitoring
  • Regional distribution view

Node Metrics Query

The platform polls for game server node data every 30 seconds:
apollo: {
  game_server_nodes: {
    query: generateQuery({
      game_server_nodes: [
        {},
        {
          id: true,
          label: true,
          region: true,
          enabled: true,
          offline_at: true,
        },
      ],
    }),
    pollInterval: 30 * 1000,
  },
}
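
The fields returned by this query are enough to derive online status and a per-region breakdown on the client. The following is a minimal sketch, not the platform's actual implementation: the `GameServerNode` interface, the `summarizeByRegion` helper, and the sample data are hypothetical, and it assumes `offline_at` is empty while a node is online.

```typescript
// Assumed shape of one row from the game_server_nodes query.
// offline_at is assumed to be null/undefined while the node is online.
interface GameServerNode {
  id: string;
  label?: string | null;
  region?: string | null;
  enabled: boolean;
  offline_at?: string | null;
}

interface RegionSummary {
  total: number;
  online: number;
}

// Group nodes by region and count how many in each region are online.
function summarizeByRegion(
  nodes: GameServerNode[],
): Map<string, RegionSummary> {
  const summary = new Map<string, RegionSummary>();
  for (const node of nodes) {
    const region = node.region || "unknown";
    const entry = summary.get(region) ?? { total: 0, online: 0 };
    entry.total += 1;
    if (!node.offline_at) entry.online += 1;
    summary.set(region, entry);
  }
  return summary;
}

// Hypothetical sample data for illustration only.
const sampleNodes: GameServerNode[] = [
  { id: "node-1", label: "Alpha", region: "us-east", enabled: true, offline_at: null },
  { id: "node-2", label: "Beta", region: "us-east", enabled: true, offline_at: "2024-01-01T00:00:00Z" },
  { id: "node-3", label: "Gamma", region: "eu-west", enabled: false, offline_at: null },
];

const regionSummary = summarizeByRegion(sampleNodes);
```

With the sample data, `us-east` has two nodes with one online, and `eu-west` has one node that is online.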

Node Filtering Logic

Nodes are filtered on the client by search term, enabled state, and online state:
const filteredNodes = computed(() => {
  if (!game_server_nodes) return [];
  
  const term = nodeSearchTerm.trim().toLowerCase();
  const filtered = game_server_nodes.filter((node: any) => {
    if (
      term &&
      !`${node.label || ""} ${node.id} ${node.region || ""}`
        .toLowerCase()
        .includes(term)
    ) {
      return false;
    }
    if (onlyEnabledNodes && !node.enabled) {
      return false;
    }
    if (onlyOnlineNodes && node.offline_at) {
      return false;
    }
    return true;
  });
  
  return filtered;
});

Service Monitoring

Track resource usage for all platform services:

1. Service Discovery

All running services are automatically discovered and monitored:
  • api
  • web
  • game-server-node
  • hasura
  • typesense
  • timescaledb
  • redis
  • minio

2. Metric Collection

CPU and memory metrics are polled every 30 seconds:
getServiceStats: {
  query: generateQuery({
    getServiceStats: [
      {},
      {
        node: true,
        name: true,
        cpu: [
          {},
          {
            time: true,
            total: true,
            used: true,
            window: true,
          },
        ],
        memory: [
          {},
          {
            time: true,
            total: true,
            used: true,
          },
        ],
      },
    ],
  }),
  pollInterval: 30 * 1000,
}

3. Status Detection

The system classifies each service's latest CPU usage as normal, warning (≥ 75%), or critical (≥ 90%):
function serviceCpuStatus(service: any): "normal" | "warning" | "critical" {
  const cpu = latestCpuUsage(service);
  if (cpu >= 90) return "critical";
  if (cpu >= 75) return "warning";
  return "normal";
}

CPU Usage Calculation

CPU usage is converted from nanocores to a percentage of the available CPUs:
function latestCpuUsage(service: any): number {
  if (!service.cpu || !service.cpu.length) {
    return 0;
  }
  const last = service.cpu[service.cpu.length - 1];
  if (!last || !last.total || !last.used) {
    return 0;
  }
  // used is nanocores, total is number of CPUs
  const coresUsed = last.used / 1_000_000_000;
  const usedPercent = (coresUsed * 100) / last.total;
  return Math.round(Math.min(100, Math.max(0, usedPercent)));
}
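
To make the unit conversion concrete, here is a standalone sketch of the same arithmetic applied to a single sample. The `CpuSample` interface and the `cpuPercent` helper are hypothetical names introduced for illustration; the formula mirrors the function above.

```typescript
// One CPU sample, as described in the comment above:
// used is in nanocores, total is the number of CPUs.
interface CpuSample {
  total: number;
  used: number;
}

// Convert nanocores to cores, express as a percentage of the available
// CPUs, clamp to the 0-100 range, and round to a whole number.
function cpuPercent(sample: CpuSample): number {
  if (!sample.total || !sample.used) return 0;
  const coresUsed = sample.used / 1_000_000_000;
  const percent = (coresUsed * 100) / sample.total;
  return Math.round(Math.min(100, Math.max(0, percent)));
}

// 2 cores used out of 4 available: (2 * 100) / 4 = 50
const halfOfFour = cpuPercent({ total: 4, used: 2_000_000_000 });

// 3 cores reported on a 2-CPU node: 150% is clamped to 100
const overCommitted = cpuPercent({ total: 2, used: 3_000_000_000 });

// No usage reported: 0
const idle = cpuPercent({ total: 2, used: 0 });
```

Because the result is clamped, a reading of 100 can mean "at or above capacity", so it is worth checking the raw samples when a service sits at 100%.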

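Memory Usage Calculation

Memory usage is taken from the latest sample as used divided by total, expressed as a percentage and clamped to 0-100: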
function latestMemoryUsage(service: any): number {
  if (!service.memory || !service.memory.length) {
    return 0;
  }
  const last = service.memory[service.memory.length - 1];
  if (!last || !last.total) {
    return 0;
  }
  const usedPercent = (last.used / last.total) * 100;
  return Math.round(Math.min(100, Math.max(0, usedPercent)));
}

Service Filtering and Sorting

Administrators can filter and sort services to focus on specific concerns:
const filteredServices = computed(() => {
  if (!getServiceStats) return [];
  
  const term = serviceSearchTerm.trim().toLowerCase();
  const filtered = getServiceStats.filter((service: any) => {
    if (!hasServiceMetrics(service)) return false;
    
    if (
      term &&
      !`${service.name} ${service.node}`.toLowerCase().includes(term)
    ) {
      return false;
    }
    
    if (
      selectedServiceNode !== "__all" &&
      service.node !== selectedServiceNode
    ) {
      return false;
    }
    
    return true;
  });
  
  // Sort by CPU, memory, or name
  const services = [...filtered];
  const directionFactor = serviceSortDirection === "asc" ? 1 : -1;
  
  services.sort((a: any, b: any) => {
    let valA: number | string = 0;
    let valB: number | string = 0;
    
    if (serviceSortBy === "cpu") {
      valA = latestCpuUsage(a);
      valB = latestCpuUsage(b);
    } else if (serviceSortBy === "memory") {
      valA = latestMemoryUsage(a);
      valB = latestMemoryUsage(b);
    } else if (serviceSortBy === "name") {
      valA = (a.name || "") as string;
      valB = (b.name || "") as string;
    }
    
    if (typeof valA === "string" && typeof valB === "string") {
      return directionFactor * valA.localeCompare(valB);
    }
    
    const numA = typeof valA === "number" ? valA : 0;
    const numB = typeof valB === "number" ? valB : 0;
    if (numA === numB) return 0;
    return directionFactor * (numA < numB ? -1 : 1);
  });
  
  return services;
});
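
The filter above calls `hasServiceMetrics`, which is not shown on this page. One plausible definition, offered only as a sketch, treats a service as having metrics when at least one CPU or memory sample is present. The `ServiceStats` interface here is an assumption based on the fields requested by the `getServiceStats` query, and the real helper may apply different rules.

```typescript
// Assumed shape of one getServiceStats entry (sample contents omitted).
interface ServiceStats {
  name: string;
  node: string;
  cpu?: unknown[] | null;
  memory?: unknown[] | null;
}

// Hypothetical definition: a service has metrics when it has at least
// one CPU sample or at least one memory sample.
function hasServiceMetrics(service: ServiceStats): boolean {
  const cpuSamples = service.cpu?.length ?? 0;
  const memorySamples = service.memory?.length ?? 0;
  return cpuSamples > 0 || memorySamples > 0;
}

const withSamples = hasServiceMetrics({
  name: "api",
  node: "node-1",
  cpu: [{ total: 2, used: 500_000_000 }],
  memory: [],
});

const withoutSamples = hasServiceMetrics({
  name: "web",
  node: "node-1",
  cpu: [],
  memory: null,
});
```

Filtering out services with no samples keeps empty rows out of the sorted list, where they would otherwise all report 0% usage.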

System Logs

The system logs page (/system-logs) provides real-time access to service logs.

Available Services

Logs are available for all platform services:
const services = [
  'api',
  'web',
  'game-server-node',
  'hasura',
  'typesense',
  'timescaledb',
  'redis',
  'minio',
];

Log Features

Follow Logs

Enable “Follow Logs” to automatically scroll to new log entries as they appear, similar to tail -f.
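
Follow behavior of this kind is usually driven by scroll position. The sketch below shows one common approach, assuming the log view scrolls inside a fixed-height container; the `ScrollState` interface, both helper functions, and the 16-pixel tolerance are hypothetical and are not taken from the `ServiceLogs` component.

```typescript
// Scroll measurements of the log container, in pixels.
interface ScrollState {
  scrollTop: number;    // distance scrolled from the top
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total height of the content
}

// True when the viewport is within tolerancePx of the bottom.
function isNearBottom(state: ScrollState, tolerancePx = 16): boolean {
  const distanceFromBottom =
    state.scrollHeight - state.scrollTop - state.clientHeight;
  return distanceFromBottom <= tolerancePx;
}

// Where to scroll after new lines arrive: jump to the bottom while
// following, otherwise leave the position unchanged.
function nextScrollTop(state: ScrollState, followLogs: boolean): number {
  if (!followLogs) return state.scrollTop;
  return Math.max(0, state.scrollHeight - state.clientHeight);
}

const readingHistory: ScrollState = {
  scrollTop: 100,
  clientHeight: 400,
  scrollHeight: 2000,
};

const stillFollowing = isNearBottom(readingHistory);
const targetWhenFollowing = nextScrollTop(readingHistory, true);
```

A check like `isNearBottom` is one way a component could decide to switch following off when the user scrolls up to read older entries.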

Timestamps

Toggle timestamp display to show or hide log entry timestamps for cleaner viewing.

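Log Interface

The logs page renders one tab per service, plus switches for following logs and showing timestamps: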
<template>
  <Tabs v-model="activeService" default-value="api" orientation="vertical">
    <div class="flex items-center justify-between flex-col lg:flex-row">
      <TabsList class="lg:inline-flex grid grid-cols-1 w-full lg:w-fit">
        <TabsTrigger
          class="capitalize"
          v-for="service in services"
          :key="service"
          :value="service"
        >
          {{ service }}
        </TabsTrigger>
      </TabsList>

      <div class="flex items-center gap-4">
        <div class="flex items-center gap-2">
          <Switch
            :model-value="followLogs"
            @click="followLogs = !followLogs"
          />
          {{ $t("pages.system_logs.follow_logs") }}
        </div>

        <div class="flex items-center gap-2">
          <Switch
            :model-value="timestamps"
            @click="timestamps = !timestamps"
          />
          {{ $t("pages.system_logs.timestamps") }}
        </div>
      </div>
    </div>

    <TabsContent :key="activeService" :value="activeService">
      <ServiceLogs
        :service="activeService"
        :timestamps="timestamps"
        :follow-logs="followLogs"
        @follow-logs-changed="(value: boolean) => (followLogs = value)"
      />
    </TabsContent>
  </Tabs>
</template>

Service Query Parameters

You can link directly to specific service logs using query parameters:
function syncServiceFromRoute() {
  const service = $route?.query?.service as string | undefined;
  if (service && services.includes(service)) {
    activeService = service;
  }
}
Example: /system-logs?service=api opens the logs for the api service.
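
If you generate these links from your own tooling, the same validation can be applied when building the URL. This is a small sketch under the assumption that unknown service names should fall back to the plain logs page; `LOG_SERVICES`, `isLogService`, and `buildLogsUrl` are hypothetical names, and the service list is copied from this page.

```typescript
// Service names listed on this page.
const LOG_SERVICES = [
  "api",
  "web",
  "game-server-node",
  "hasura",
  "typesense",
  "timescaledb",
  "redis",
  "minio",
] as const;

type LogService = (typeof LOG_SERVICES)[number];

// Type guard: narrows an arbitrary string to a known service name.
function isLogService(value: string): value is LogService {
  return (LOG_SERVICES as readonly string[]).indexOf(value) !== -1;
}

// Build a deep link to a service's logs. Unknown names fall back to the
// logs page without a query parameter (an assumption of this sketch).
function buildLogsUrl(service: string): string {
  if (!isLogService(service)) return "/system-logs";
  return `/system-logs?service=${encodeURIComponent(service)}`;
}

const apiLogsUrl = buildLogsUrl("api");
const unknownLogsUrl = buildLogsUrl("not-a-service");
```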

Linking to Logs

From the metrics page, you can quickly jump to service logs:
<Button
  variant="ghost"
  size="icon"
  @click="
    $router.push({
      path: '/system-logs',
      query: { service: service.name },
    })
  "
>
  <Logs class="h-4 w-4" />
</Button>

Monitoring Best Practices

  • Monitor your services during normal operation to understand typical resource usage patterns. This helps identify anomalies quickly.
  • Review system metrics daily to catch gradual performance degradation before it impacts users.
  • Ensure game server nodes are distributed appropriately across regions to provide optimal latency for all players.
  • When investigating issues, correlate metrics with logs. High CPU usage in metrics should align with activity in logs.
  • Use trending metrics to predict when additional resources or nodes will be needed, rather than reacting to issues.
  • Remember service dependencies when troubleshooting. Issues in timescaledb may manifest as problems in api or hasura.
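
The baseline advice above can be made concrete with a simple statistical check. The sketch below flags a reading that sits well above recent history, using the mean plus two standard deviations as the cutoff. This technique and the `isAnomalous` helper are illustrative assumptions and are not part of 5Stack; the sample values are made up.

```typescript
// Arithmetic mean of a non-empty list of readings.
function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Population standard deviation of a non-empty list of readings.
function standardDeviation(values: number[]): number {
  const average = mean(values);
  const squaredDiffs = values.map((v) => (v - average) * (v - average));
  return Math.sqrt(mean(squaredDiffs));
}

// Flag a reading that exceeds the baseline by more than `sigmas`
// standard deviations. With fewer than two samples there is no usable
// baseline, so nothing is flagged.
function isAnomalous(history: number[], latest: number, sigmas = 2): boolean {
  if (history.length < 2) return false;
  const threshold = mean(history) + sigmas * standardDeviation(history);
  return latest > threshold;
}

// Hypothetical CPU percentages observed during normal operation.
const baselineCpu = [20, 22, 21, 19, 23];

const spikeDetected = isAnomalous(baselineCpu, 40);  // well above baseline
const normalReading = isAnomalous(baselineCpu, 22);  // within normal range
```

A check like this complements the fixed thresholds: a service that normally idles at 20% CPU is worth a look at 40%, even though it is far from the warning level.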

Performance Thresholds

Key thresholds:
  • CPU usage ≥ 75%: Warning level, monitor closely
  • CPU usage ≥ 90%: Critical, performance degradation likely
  • Memory usage ≥ 95%: Risk of service crashes
  • Node offline: Matches on that node will fail
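
The thresholds listed above can be expressed as a single check over a snapshot of readings. This is a sketch only: the `Snapshot` and `Alert` types and the `evaluateThresholds` function are hypothetical, and treating the memory and offline-node conditions as critical severity is an assumption of this example.

```typescript
type Severity = "warning" | "critical";

interface Alert {
  severity: Severity;
  message: string;
}

// Hypothetical snapshot of one service and the node it runs on.
interface Snapshot {
  cpuPercent: number;
  memoryPercent: number;
  nodeOffline: boolean;
}

// Apply the documented thresholds and collect any resulting alerts.
function evaluateThresholds(snapshot: Snapshot): Alert[] {
  const alerts: Alert[] = [];

  if (snapshot.cpuPercent >= 90) {
    alerts.push({ severity: "critical", message: "CPU usage at or above 90%" });
  } else if (snapshot.cpuPercent >= 75) {
    alerts.push({ severity: "warning", message: "CPU usage at or above 75%" });
  }

  // Severity for the next two checks is an assumption of this sketch.
  if (snapshot.memoryPercent >= 95) {
    alerts.push({ severity: "critical", message: "Memory usage at or above 95%" });
  }
  if (snapshot.nodeOffline) {
    alerts.push({ severity: "critical", message: "Node is offline" });
  }

  return alerts;
}

const busyService = evaluateThresholds({
  cpuPercent: 80,
  memoryPercent: 96,
  nodeOffline: false,
});

const healthyService = evaluateThresholds({
  cpuPercent: 30,
  memoryPercent: 40,
  nodeOffline: false,
});
```

Here `busyService` contains a CPU warning and a memory alert, while `healthyService` is empty.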

Troubleshooting Common Issues

High CPU Usage

  1. Check logs for the affected service
  2. Identify any long-running operations
  3. Review recent deployments or configuration changes
  4. Consider scaling horizontally if sustained

High Memory Usage

  1. Check for memory leaks in application logs
  2. Review cache sizes (Redis)
  3. Check database connection pools
  4. Restart service if memory leak is suspected

Node Offline

  1. Check network connectivity
  2. Verify node service is running
  3. Review node logs for crash reasons
  4. Check hardware resources on the node

Service Not Responding

  1. Check if service is visible in metrics
  2. Review service logs for errors
  3. Verify dependent services are operational
  4. Check network connectivity between services
