System Monitoring

5Stack's system monitoring tools give Administrators real-time visibility into platform health, resource utilization, and service performance. Use them to keep the platform running well and to diagnose issues.

System monitoring features require Administrator role access.

System Metrics

The system metrics dashboard (/system-metrics) provides real-time monitoring of all platform services and game server nodes.

Overview Statistics

The metrics page header displays two platform-wide counts, the number of services and the number of game server nodes:

<template>
  <PageHeading>
    <template #title>{{ $t("pages.system_metrics.title") }}</template>
    <template #description>
      {{ $t("pages.system_metrics.description") }}
    </template>
    <template #actions>
      <div class="flex flex-wrap items-center gap-3">
        <Badge variant="outline" class="text-xs px-3 py-1">
          {{ $t("pages.system_metrics.services_count") }}:
          {{ totalServices }}
        </Badge>
        <Badge variant="outline" class="text-xs px-3 py-1">
          {{ $t("pages.system_metrics.nodes_count") }}:
          {{ totalGameNodes }}
        </Badge>
      </div>
    </template>
  </PageHeading>
</template>
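
The template reads `totalServices` and `totalGameNodes`, but this page does not show how they are derived. A minimal sketch, assuming both are plain counts over the polled query results (the `countOverview` helper, its types, and the sample data are hypothetical):

```typescript
// Hypothetical row shapes: only the fields this sketch needs.
interface ServiceStatRow {
  name: string;
  node: string;
}

interface GameNodeRow {
  id: string;
}

// Assumption: the badges show plain counts of the polled results.
// Either argument may be undefined while the first poll is still loading.
function countOverview(
  serviceRows: ServiceStatRow[] | undefined,
  nodeRows: GameNodeRow[] | undefined,
): { totalServices: number; totalGameNodes: number } {
  return {
    totalServices: serviceRows?.length ?? 0,
    totalGameNodes: nodeRows?.length ?? 0,
  };
}

const overview = countOverview(
  [
    { name: "api", node: "node-1" },
    { name: "web", node: "node-1" },
  ],
  [{ id: "node-1" }],
);
console.log(overview.totalServices, overview.totalGameNodes);
```

Falling back to 0 keeps the badges stable before the first response arrives.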

Game Server Nodes

Monitor all game server nodes with detailed metrics and filtering:

Node Filtering

Search by node name, ID, or region
Filter by enabled/disabled status
Filter by online/offline status
Sort by CPU, memory, or name

Node Metrics

Real-time CPU usage percentage
Memory utilization tracking
Online/offline status monitoring
Regional distribution view

Node Metrics Query

The platform polls for game server node data every 30 seconds:

apollo: {
  game_server_nodes: {
    query: generateQuery({
      game_server_nodes: [
        {},
        {
          id: true,
          label: true,
          region: true,
          enabled: true,
          offline_at: true,
        },
      ],
    }),
    pollInterval: 30 * 1000,
  },
}
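
Using only the fields requested above, a regional distribution view could be approximated by grouping nodes by region and counting how many are online. This is a sketch, not the dashboard's actual code: the `PolledNode` types, the `summarizeByRegion` helper, and the rule that a non-empty `offline_at` means "offline" are assumptions.

```typescript
// Mirrors the fields requested by the query above; the types are assumed.
interface PolledNode {
  id: string;
  label: string | null;
  region: string | null;
  enabled: boolean;
  offline_at: string | null; // assumed: empty while the node is online
}

interface RegionSummary {
  total: number;
  online: number;
}

// Groups nodes by region and counts how many are currently online.
function summarizeByRegion(
  nodeRows: PolledNode[],
): Record<string, RegionSummary> {
  const summary: Record<string, RegionSummary> = {};
  for (const row of nodeRows) {
    const region = row.region || "unknown";
    const entry = summary[region] ?? { total: 0, online: 0 };
    entry.total += 1;
    if (!row.offline_at) entry.online += 1;
    summary[region] = entry;
  }
  return summary;
}

const regionSummary = summarizeByRegion([
  { id: "n1", label: "A", region: "eu", enabled: true, offline_at: null },
  { id: "n2", label: "B", region: "eu", enabled: true, offline_at: "2024-01-01T00:00:00Z" },
  { id: "n3", label: "C", region: "us", enabled: true, offline_at: null },
]);
console.log(regionSummary);
```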

Node Filtering Logic

The search term and the enabled/online toggles are applied client-side to the polled results:

const filteredNodes = computed(() => {
  if (!game_server_nodes) return [];
  
  const term = nodeSearchTerm.trim().toLowerCase();
  const filtered = game_server_nodes.filter((node: any) => {
    if (
      term &&
      !`${node.label || ""} ${node.id} ${node.region || ""}`
        .toLowerCase()
        .includes(term)
    ) {
      return false;
    }
    if (onlyEnabledNodes && !node.enabled) {
      return false;
    }
    if (onlyOnlineNodes && node.offline_at) {
      return false;
    }
    return true;
  });
  
  return filtered;
});
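
The same rules can be written as a standalone function, which makes them easier to test outside the component. This is a sketch: the `FilterableNode` and `NodeFilterOptions` types and the sample data are hypothetical, while the matching rules follow the snippet above.

```typescript
interface FilterableNode {
  id: string;
  label?: string | null;
  region?: string | null;
  enabled: boolean;
  offline_at?: string | null;
}

interface NodeFilterOptions {
  searchTerm: string;
  onlyEnabled: boolean;
  onlyOnline: boolean;
}

// Applies the search term and both toggles; returns a new array.
function filterNodeRows(
  rows: FilterableNode[],
  options: NodeFilterOptions,
): FilterableNode[] {
  const term = options.searchTerm.trim().toLowerCase();
  return rows.filter((row) => {
    // Label, ID, and region are searched as one lowercase string.
    const haystack =
      `${row.label || ""} ${row.id} ${row.region || ""}`.toLowerCase();
    if (term && !haystack.includes(term)) return false;
    if (options.onlyEnabled && !row.enabled) return false;
    if (options.onlyOnline && row.offline_at) return false;
    return true;
  });
}

const sampleNodes: FilterableNode[] = [
  { id: "n1", label: "Frankfurt 1", region: "eu", enabled: true, offline_at: null },
  { id: "n2", label: "Frankfurt 2", region: "eu", enabled: false, offline_at: null },
  { id: "n3", label: "Dallas 1", region: "us", enabled: true, offline_at: "2024-01-01T00:00:00Z" },
];

const enabledFrankfurt = filterNodeRows(sampleNodes, {
  searchTerm: " Frank ",
  onlyEnabled: true,
  onlyOnline: false,
});
console.log(enabledFrankfurt.map((row) => row.id));
```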

Service Monitoring

Track resource usage for all platform services:

Service Discovery

All running services are automatically discovered and monitored:

api
web
game-server-node
hasura
typesense
timescaledb
redis
minio

Metric Collection

The dashboard polls CPU and memory metrics for every service every 30 seconds:

getServiceStats: {
  query: generateQuery({
    getServiceStats: [
      {},
      {
        node: true,
        name: true,
        cpu: [
          {},
          {
            time: true,
            total: true,
            used: true,
            window: true,
          },
        ],
        memory: [
          {},
          {
            time: true,
            total: true,
            used: true,
          },
        ],
      },
    ],
  }),
  pollInterval: 30 * 1000,
}

Status Detection

The system automatically detects and highlights services with elevated resource usage:

function serviceCpuStatus(service: any): "normal" | "warning" | "critical" {
  const cpu = latestCpuUsage(service);
  if (cpu >= 90) return "critical";
  if (cpu >= 75) return "warning";
  return "normal";
}

CPU Usage Calculation

CPU usage is reported in nanocores and converted to a percentage of the CPUs available on the node:

function latestCpuUsage(service: any): number {
  if (!service.cpu || !service.cpu.length) {
    return 0;
  }
  const last = service.cpu[service.cpu.length - 1];
  if (!last || !last.total || !last.used) {
    return 0;
  }
  // used is nanocores, total is number of CPUs
  const coresUsed = last.used / 1_000_000_000;
  const usedPercent = (coresUsed * 100) / last.total;
  return Math.round(Math.min(100, Math.max(0, usedPercent)));
}
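
As a worked example of the conversion: 1,000,000,000 nanocores is one full core, so one core in use on a 4-CPU node is 25%. The sketch below applies the same arithmetic to a single sample; the `CpuSample` type and the `cpuPercent` helper are hypothetical names, and the field types are assumed.

```typescript
interface CpuSample {
  time: string; // assumed to be a timestamp string
  total: number; // number of CPUs on the node
  used: number; // nanocores in use
}

const NANOCORES_PER_CORE = 1_000_000_000;

// Converts one sample to a whole-number percentage, clamped to 0..100.
function cpuPercent(sample: CpuSample): number {
  if (!sample.total || !sample.used) return 0;
  const coresUsed = sample.used / NANOCORES_PER_CORE;
  const percent = (coresUsed * 100) / sample.total;
  return Math.round(Math.min(100, Math.max(0, percent)));
}

// One core busy on a 4-CPU node.
const quarterLoad = cpuPercent({ time: "t0", total: 4, used: 1_000_000_000 });
// More nanocores reported than the node has CPUs: clamped to 100.
const overLoad = cpuPercent({ time: "t1", total: 4, used: 10_000_000_000 });
console.log(quarterLoad, overLoad);
```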

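Memory Usage Calculation

Memory usage is the ratio of used to total memory in the latest sample, clamped to 0-100% and rounded: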

function latestMemoryUsage(service: any): number {
  if (!service.memory || !service.memory.length) {
    return 0;
  }
  const last = service.memory[service.memory.length - 1];
  if (!last || !last.total) {
    return 0;
  }
  const usedPercent = (last.used / last.total) * 100;
  return Math.round(Math.min(100, Math.max(0, usedPercent)));
}
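
The function above returns `NaN` if a sample has a `total` but no `used` value, because `undefined / total` is `NaN`. A defensive variant that treats a missing `used` as 0 is sketched below; the `MemorySample` type and `memoryPercent` name are hypothetical, and the example assumes both fields use the same unit (bytes here).

```typescript
interface MemorySample {
  time: string; // assumed to be a timestamp string
  total: number;
  used?: number | null;
}

// Percentage of memory in use, clamped to 0..100 and rounded.
function memoryPercent(sample: MemorySample): number {
  if (!sample.total) return 0;
  const used = sample.used ?? 0; // a missing value counts as 0
  const percent = (used / sample.total) * 100;
  return Math.round(Math.min(100, Math.max(0, percent)));
}

const MIB = 1024 * 1024;
// 512 MiB used out of 2 GiB.
const quarterFull = memoryPercent({ time: "t0", total: 2048 * MIB, used: 512 * MIB });
const missingUsed = memoryPercent({ time: "t1", total: 2048 * MIB });
console.log(quarterFull, missingUsed);
```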

Service Filtering and Sorting

Administrators can filter and sort services to focus on specific concerns:

const filteredServices = computed(() => {
  if (!getServiceStats) return [];
  
  const term = serviceSearchTerm.trim().toLowerCase();
  const filtered = getServiceStats.filter((service: any) => {
    if (!hasServiceMetrics(service)) return false;
    
    if (
      term &&
      !`${service.name} ${service.node}`.toLowerCase().includes(term)
    ) {
      return false;
    }
    
    if (
      selectedServiceNode !== "__all" &&
      service.node !== selectedServiceNode
    ) {
      return false;
    }
    
    return true;
  });
  
  // Sort by CPU, memory, or name
  const services = [...filtered];
  const directionFactor = serviceSortDirection === "asc" ? 1 : -1;
  
  services.sort((a: any, b: any) => {
    let valA: number | string = 0;
    let valB: number | string = 0;
    
    if (serviceSortBy === "cpu") {
      valA = latestCpuUsage(a);
      valB = latestCpuUsage(b);
    } else if (serviceSortBy === "memory") {
      valA = latestMemoryUsage(a);
      valB = latestMemoryUsage(b);
    } else if (serviceSortBy === "name") {
      valA = (a.name || "") as string;
      valB = (b.name || "") as string;
    }
    
    if (typeof valA === "string" && typeof valB === "string") {
      return directionFactor * valA.localeCompare(valB);
    }
    
    const numA = typeof valA === "number" ? valA : 0;
    const numB = typeof valB === "number" ? valB : 0;
    if (numA === numB) return 0;
    return directionFactor * (numA < numB ? -1 : 1);
  });
  
  return services;
});
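
The sorting step can be condensed when the usage percentages are already computed. The sketch below is a simplified, standalone take on the same idea, not the component's code; the `SortableService` type and `sortServiceRows` helper are hypothetical.

```typescript
type ServiceSortKey = "cpu" | "memory" | "name";
type ServiceSortDirection = "asc" | "desc";

interface SortableService {
  name: string;
  cpuPercent: number;
  memoryPercent: number;
}

// Returns a sorted copy; the input array is left untouched.
function sortServiceRows(
  rows: SortableService[],
  sortBy: ServiceSortKey,
  direction: ServiceSortDirection,
): SortableService[] {
  const factor = direction === "asc" ? 1 : -1;
  return [...rows].sort((a, b) => {
    if (sortBy === "name") {
      return factor * a.name.localeCompare(b.name);
    }
    const valueA = sortBy === "cpu" ? a.cpuPercent : a.memoryPercent;
    const valueB = sortBy === "cpu" ? b.cpuPercent : b.memoryPercent;
    return factor * (valueA - valueB);
  });
}

const sampleServices: SortableService[] = [
  { name: "api", cpuPercent: 10, memoryPercent: 40 },
  { name: "redis", cpuPercent: 80, memoryPercent: 20 },
  { name: "web", cpuPercent: 35, memoryPercent: 60 },
];

const busiestFirst = sortServiceRows(sampleServices, "cpu", "desc");
console.log(busiestFirst.map((row) => row.name));
```

Sorting by CPU in descending order puts the busiest service first, which is usually the view you want when something is slow.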

System Logs

The system logs page (/system-logs) provides real-time access to service logs.

Available Services

Logs are available for all platform services:

const services = [
  'api',
  'web',
  'game-server-node',
  'hasura',
  'typesense',
  'timescaledb',
  'redis',
  'minio',
];

Log Features

Follow Logs

Enable “Follow Logs” to automatically scroll to new log entries as they appear, similar to tail -f.
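
This page does not show how the log viewer decides when to keep following. One common approach, sketched here as an assumption rather than as 5Stack's implementation, is to keep following only while the view is at (or very near) the bottom of the log. The `ScrollMetrics` type, the `isNearBottom` helper, and the 16-pixel tolerance are all hypothetical.

```typescript
// The three standard scroll measurements of a scrollable element.
interface ScrollMetrics {
  scrollTop: number; // pixels scrolled from the top
  clientHeight: number; // visible height of the log container
  scrollHeight: number; // total height of the log content
}

// True when the visible area ends within `tolerancePx` of the content end.
function isNearBottom(metrics: ScrollMetrics, tolerancePx = 16): boolean {
  const distanceFromBottom =
    metrics.scrollHeight - (metrics.scrollTop + metrics.clientHeight);
  return distanceFromBottom <= tolerancePx;
}

const atBottom = isNearBottom({ scrollTop: 600, clientHeight: 400, scrollHeight: 1000 });
const scrolledUp = isNearBottom({ scrollTop: 300, clientHeight: 400, scrollHeight: 1000 });
console.log(atBottom, scrolledUp);
```

With a rule like this, scrolling up to read older entries pauses following, and returning to the bottom resumes it.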

Timestamps

Toggle timestamp display to show or hide log entry timestamps for cleaner viewing.

Log Interface

<template>
  <Tabs v-model="activeService" default-value="api" orientation="vertical">
    <div class="flex items-center justify-between flex-col lg:flex-row">
      <TabsList class="lg:inline-flex grid grid-cols-1 w-full lg:w-fit">
        <TabsTrigger
          class="capitalize"
          v-for="service in services"
          :key="service"
          :value="service"
        >
          {{ service }}
        </TabsTrigger>
      </TabsList>

      <div class="flex items-center gap-4">
        <div class="flex items-center gap-2">
          <Switch
            :model-value="followLogs"
            @click="followLogs = !followLogs"
          />
          {{ $t("pages.system_logs.follow_logs") }}
        </div>

        <div class="flex items-center gap-2">
          <Switch
            :model-value="timestamps"
            @click="timestamps = !timestamps"
          />
          {{ $t("pages.system_logs.timestamps") }}
        </div>
      </div>
    </div>

    <TabsContent :key="activeService" :value="activeService">
      <ServiceLogs
        :service="activeService"
        :timestamps="timestamps"
        :follow-logs="followLogs"
        @follow-logs-changed="(value: boolean) => (followLogs = value)"
      />
    </TabsContent>
  </Tabs>
</template>

Service Query Parameters

You can link directly to specific service logs using query parameters:

function syncServiceFromRoute() {
  const service = $route?.query?.service as string | undefined;
  if (service && services.includes(service)) {
    activeService = service;
  }
}

Example: /system-logs?service=api will open the API service logs.
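
To generate such links programmatically, for example from an alert or a runbook, a small helper can validate the service name first. This is a sketch: `LOG_SERVICES` repeats the list from Available Services, and the `isLogService` and `buildLogsLink` helpers are hypothetical.

```typescript
const LOG_SERVICES = [
  "api",
  "web",
  "game-server-node",
  "hasura",
  "typesense",
  "timescaledb",
  "redis",
  "minio",
] as const;

type LogService = (typeof LOG_SERVICES)[number];

// Type guard: narrows an arbitrary string to a known service name.
function isLogService(value: string): value is LogService {
  return (LOG_SERVICES as readonly string[]).includes(value);
}

// Builds a deep link; unknown names fall back to the plain logs page,
// matching how the page ignores an unrecognized `service` parameter.
function buildLogsLink(service: string): string {
  if (!isLogService(service)) return "/system-logs";
  const params = new URLSearchParams({ service });
  return `/system-logs?${params.toString()}`;
}

console.log(buildLogsLink("api"));
console.log(buildLogsLink("not-a-service"));
```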

Linking to Logs

From the metrics page, you can quickly jump to service logs:

<Button
  variant="ghost"
  size="icon"
  @click="
    $router.push({
      path: '/system-logs',
      query: { service: service.name },
    })
  "
>
  <Logs class="h-4 w-4" />
</Button>

Monitoring Best Practices

Establish Baselines

Monitor your services during normal operation to understand typical resource usage patterns. This helps identify anomalies quickly.
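
One simple way to turn a baseline into something checkable is to record the mean and standard deviation of usage during normal operation and flag values that fall far outside that range. The sketch below is illustrative only and is not a 5Stack feature; `computeBaseline`, `isAnomalous`, and the three-sigma default are hypothetical choices.

```typescript
interface UsageBaseline {
  mean: number;
  stdDev: number;
}

// Mean and population standard deviation of the recorded percentages.
function computeBaseline(samples: number[]): UsageBaseline {
  if (samples.length === 0) return { mean: 0, stdDev: 0 };
  const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  const variance =
    samples.reduce((sum, v) => sum + (v - mean) ** 2, 0) / samples.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

// Flags a value more than `sigmas` standard deviations from the mean.
function isAnomalous(
  value: number,
  baseline: UsageBaseline,
  sigmas = 3,
): boolean {
  return Math.abs(value - baseline.mean) > sigmas * baseline.stdDev;
}

const cpuBaseline = computeBaseline([20, 22, 18, 20]);
console.log(cpuBaseline.mean, isAnomalous(30, cpuBaseline), isAnomalous(21, cpuBaseline));
```

A baseline built from very few or perfectly flat samples has a standard deviation near zero, so any change looks anomalous; collect enough data to cover normal daily variation first.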

Regular Health Checks

Review system metrics daily to catch gradual performance degradation before it impacts users.

Node Distribution

Ensure game server nodes are distributed appropriately across regions to provide optimal latency for all players.

Log Correlation

When investigating issues, correlate metrics with logs. A CPU spike in the metrics should coincide with matching activity in that service's logs.

Proactive Scaling

Use trending metrics to predict when additional resources or nodes will be needed, rather than reacting to issues.

Service Dependencies

Remember service dependencies when troubleshooting. Issues in timescaledb may manifest as problems in api or hasura.

Performance Thresholds

Key thresholds:

CPU usage ≥ 75%: Warning level, monitor closely
CPU usage ≥ 90%: Critical, performance degradation likely
Memory usage ≥ 95%: Risk of service crashes
Node offline: Matches on that node will fail
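
The percentage thresholds listed above can be kept in one place and applied with a single classifier. This is a sketch under stated assumptions: `classifyUsage` and the two threshold constants are hypothetical names, and since no memory warning level is given here, memory only has a critical level.

```typescript
type UsageStatus = "normal" | "warning" | "critical";

interface UsageThresholds {
  warning?: number; // optional: not every metric has a warning level
  critical: number;
}

// Values taken from the thresholds listed in this section.
const CPU_THRESHOLDS: UsageThresholds = { warning: 75, critical: 90 };
const MEMORY_THRESHOLDS: UsageThresholds = { critical: 95 };

// Checks the critical level first so it wins over the warning level.
function classifyUsage(
  percent: number,
  thresholds: UsageThresholds,
): UsageStatus {
  if (percent >= thresholds.critical) return "critical";
  if (thresholds.warning !== undefined && percent >= thresholds.warning) {
    return "warning";
  }
  return "normal";
}

console.log(classifyUsage(80, CPU_THRESHOLDS));
console.log(classifyUsage(96, MEMORY_THRESHOLDS));
```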

Troubleshooting Common Issues

High CPU Usage

Check logs for the affected service
Identify any long-running operations
Review recent deployments or configuration changes
Consider scaling horizontally if sustained
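
To tell a sustained load from a short spike, one option is to require that the most recent samples all sit at or above a threshold. The sketch below is an illustration, not platform behavior; `isSustainedHigh` is a hypothetical helper, and how much time a run of samples covers depends on the sample spacing, which this page does not state.

```typescript
// True when the last `minSamples` values are all at or above `threshold`.
function isSustainedHigh(
  percentSeries: number[],
  threshold: number,
  minSamples: number,
): boolean {
  if (minSamples <= 0 || percentSeries.length < minSamples) return false;
  return percentSeries
    .slice(-minSamples)
    .every((percent) => percent >= threshold);
}

// The last three samples are all at or above 90%.
const sustained = isSustainedHigh([50, 92, 95, 91], 90, 3);
// A dip in the middle of the last three samples breaks the run.
const spiky = isSustainedHigh([95, 50, 95], 90, 3);
console.log(sustained, spiky);
```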

High Memory Usage

Check for memory leaks in application logs
Review cache sizes (Redis)
Check database connection pools
Restart service if memory leak is suspected

Node Offline

Check network connectivity
Verify node service is running
Review node logs for crash reasons
Check hardware resources on the node

Service Not Responding

Check if service is visible in metrics
Review service logs for errors
Verify dependent services are operational
Check network connectivity between services

Related Topics

Database Management

Monitor and optimize database performance

Game Server Nodes

Configure and manage game server infrastructure

Roles & Permissions

Understand administrator permissions
