Performance Data Collection

Overview

NMIS collects and stores performance metrics as time-series data using RRDtool (Round Robin Database). This provides efficient storage with automatic data aging and aggregation, enabling both real-time monitoring and historical trend analysis.

RRD Storage: Efficient circular-buffer storage with fixed size and automatic aggregation
Data Resolution: Multiple resolution levels, from 5-minute samples to daily averages kept for four years
Polling Engine: Automated collection cycles with configurable intervals
Graphing: Built-in graphing of collected metrics with customizable views

RRDtool Storage Architecture

Round Robin Database Concept

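RRD files use a circular buffer structure with the following properties:

Fixed Size: The file is allocated in full at creation and never grows beyond its configured size
Automatic Aging: The oldest data is overwritten automatically
Consolidation: High-resolution data is aggregated into coarser archives over time
Efficiency: Updates are O(1), so the cost of a write does not depend on how much history the file holds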
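The overwrite behaviour can be illustrated with a little arithmetic. The sketch below is a conceptual model only, not RRDtool's on-disk layout, and rra_slot is a hypothetical helper: it maps a timestamp to a row in a fixed-size archive, so two samples exactly one archive length apart land in the same row.

```shell
# Conceptual sketch: which row of a fixed-size archive a timestamp maps to.
# rra_slot is a hypothetical helper, not part of NMIS or RRDtool.
rra_slot() {
    ts=$1; step=$2; rows=$3
    echo $(( (ts / step) % rows ))
}

# 576 rows of 300 s cover 172800 s (2 days). A sample taken exactly
# 2 days later maps to the same row and replaces the older value.
first=$(rra_slot 1700000100 300 576)
later=$(rra_slot $(( 1700000100 + 576 * 300 )) 300 576)
echo "first sample row: $first, sample two days later row: $later"
```

Because the row count is fixed, the file size stays constant no matter how long the node has been polled.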

# From rrdfunc.pm:29-68
package NMISNG::rrdfunc;

our $VERSION = "9.5.1";

use strict;
use feature 'state';  # require_RRDs below relies on a state variable
use Statistics::Lite;
use POSIX qw();  # for strftime

# Initialize RRDs module
sub require_RRDs {
    my (%args) = @_;
    state $RRD_included = 0;
    
    if( !$RRD_included ) {
        $RRD_included = 1;
        require RRDs;
        RRDs->import;
    }
}

# Track module errors
my $_last_error;
sub getRRDerror {
    return $_last_error;
}

Data Storage Locations

RRD files are organized by node and data type:

# Default RRD storage structure
/var/nmis9/database/
├── nodes/
│   ├── router1/
│   │   ├── health/
│   │   │   ├── cpu.rrd
│   │   │   ├── memory.rrd
│   │   │   └── health.rrd
│   │   ├── interface/
│   │   │   ├── GigabitEthernet0_0-ifInOctets.rrd
│   │   │   ├── GigabitEthernet0_0-ifOutOctets.rrd
│   │   │   └── GigabitEthernet0_1-ifInOctets.rrd
│   │   └── pkts/
│   │       └── GigabitEthernet0_0-pkts.rrd
│   └── switch1/
│       └── ...

Data Collection Process

Polling Cycles

NMIS operates on multiple collection cycles:

Update Cycle (default once per day):

System information (uptime, name, location)
Catchall data (global device metrics)
Interface discovery

Collect Cycle (default 5 minutes):

Interface statistics (octets, errors, discards)
CPU and memory utilization
Environmental sensors
Protocol statistics

Services Cycle (configurable):

Service availability checks
Response time measurements

Polling intervals can be customized per node or group using polling policies. Critical devices may poll every 1-2 minutes, while less important ones every 15-30 minutes.
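To get a feel for what a shorter interval costs, the following sketch counts the samples each node produces per day at a few intervals. samples_per_day is a hypothetical helper used only for illustration.

```shell
# Sketch: samples collected per day for a given polling interval in seconds.
# samples_per_day is a hypothetical helper, not an NMIS command.
samples_per_day() {
    echo $(( 86400 / $1 ))
}

for interval in 60 300 900; do
    echo "interval=${interval}s samples/day=$(samples_per_day "$interval")"
done
```

A 1-minute interval generates five times the SNMP requests and RRD updates of the 5-minute default, which is worth weighing before applying it to many nodes.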

Data Source Types

RRD databases contain multiple data sources (DS), each with a type:

GAUGE: A value that can rise or fall and is stored as read (CPU %, temperature, voltage)
COUNTER: A continuously increasing counter that may wrap; stored as a per-second rate (interface octets, packet counts)
DERIVE: Like COUNTER but without wrap handling, so the stored rate can be negative (values that may also decrease)
ABSOLUTE: A counter that resets to zero after each read
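The COUNTER type is the one that most often surprises people, because the stored value is a rate rather than the raw reading. The sketch below shows the idea for a 32-bit counter: a negative difference is treated as a wrap and corrected by adding 2^32. It is a simplified illustration (integer division, 32-bit wrap only), and counter_rate is a hypothetical helper.

```shell
# Sketch: per-second rate from two readings of a 32-bit COUNTER.
# counter_rate is a hypothetical helper; integer division is used for brevity.
counter_rate() {
    prev=$1; cur=$2; secs=$3
    delta=$(( cur - prev ))
    # A negative delta means the counter wrapped past 2^32 between readings.
    if [ "$delta" -lt 0 ]; then
        delta=$(( delta + 4294967296 ))
    fi
    echo $(( delta / secs ))
}

echo "normal interval: $(counter_rate 1000 301000 300) units/s"
echo "wrapped counter: $(counter_rate 4294967000 2704 300) units/s"
```

A GAUGE, by contrast, would simply store the reading itself with no subtraction or division.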

Round Robin Archives (RRAs)

Each RRD file contains multiple archives with different resolutions:

# From rrdfunc.pm:70-130
sub getRRDasHash {
    my %args = @_;
    my $db = $args{database};
    
    my $minhr = (defined $args{hour_from}? $args{hour_from} : 0);
    my $maxhr = (defined $args{hour_to}? $args{hour_to} :  24) ;
    my $wantedresolution = $args{resolution};
    
    my @rrdargs = ($db, $args{mode});
    my ($bucketsize, $resolution);
    
    if (defined($wantedresolution) && $wantedresolution > 0) {
        # Determine native resolutions available
        my ($error, @available) = getRRDResolutions($db, $args{mode});
        return ({},[], { error => $error }) if ($error);
        
        # Match desired resolution to available RRAs
        if (grep($_ == $wantedresolution, @available)) {
            $resolution = $wantedresolution;
        }
        elsif ( $wantedresolution % $available[0] == 0) {
            # Bucketize ourselves
            $bucketsize = $wantedresolution / $available[0];
            $resolution = $available[0];
        }
    }
    # Excerpt ends here; the fetch and bucketing logic that follows is omitted
}
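The resolution matching in the excerpt above can be restated as a small shell sketch. It assumes, as the Perl code does, that the first available resolution is the finest one. pick_resolution is a hypothetical helper, and unlike the original it prints a bucket size of 1 when a native resolution matches.

```shell
# Sketch: choose a fetch resolution and bucket size.
# Usage: pick_resolution WANTED AVAILABLE...   (finest resolution first)
# Prints "<resolution> <bucketsize>"; returns 1 if nothing fits.
pick_resolution() {
    wanted=$1; shift
    finest=$1
    for res in "$@"; do
        if [ "$res" -eq "$wanted" ]; then
            echo "$res 1"          # an archive has exactly this resolution
            return 0
        fi
    done
    if [ $(( wanted % finest )) -eq 0 ]; then
        # Fetch at the finest resolution and merge N rows per bucket.
        echo "$finest $(( wanted / finest ))"
        return 0
    fi
    return 1
}

pick_resolution 1800 300 1800 7200 86400
pick_resolution 900 300 1800 7200 86400
```

The second call has no native 900-second archive, so it falls back to 300-second rows grouped three at a time.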

Standard NMIS RRA Configuration:

Archive	Step	Rows	Period	Consolidation
RRA:AVERAGE	5 min	576	2 days	Each raw sample stored as-is
RRA:AVERAGE	30 min	672	2 weeks	Average of 6 samples
RRA:AVERAGE	2 hours	744	2 months	Average of 24 samples
RRA:AVERAGE	1 day	1460	4 years	Average of 288 samples
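The Period column follows directly from step x steps x rows. A minimal sketch that reproduces it for a 300-second base step (rra_retention_days is a hypothetical helper):

```shell
# Sketch: retention of one archive in whole days.
# rra_retention_days STEP_SECONDS STEPS_PER_ROW ROWS
rra_retention_days() {
    step=$1; steps=$2; rows=$3
    echo $(( step * steps * rows / 86400 ))
}

# steps and rows of the four standard archives listed above
while read -r steps rows; do
    echo "steps=$steps rows=$rows -> $(rra_retention_days 300 "$steps" "$rows") days"
done <<EOF
1 576
6 672
24 744
288 1460
EOF
```

The same arithmetic is useful when designing a custom archive layout: pick the row duration first, then the number of rows needed to reach the retention you want.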

Data Collection Configuration

Polling Intervals

# Configure node polling interval
nmis-cli act=set node=router1 poll_interval=300

# Set faster polling for critical device
nmis-cli act=set node=core-switch1 poll_interval=60

# Slower polling for edge devices
nmis-cli act=set node=remote-router poll_interval=900

Data Retention

Retention is controlled by RRA configuration in model files:

{
  "database": {
    "type": {
      "ifInOctets": {
        "rrd": {
          "step": 300,
          "heartbeat": 900,
          "ds": {
            "ifInOctets": {
              "type": "COUNTER",
              "min": "0",
              "max": "U"
            }
          },
          "rra": [
            { "cf": "AVERAGE", "xff": 0.5, "steps": 1, "rows": 576 },
            { "cf": "AVERAGE", "xff": 0.5, "steps": 6, "rows": 672 },
            { "cf": "AVERAGE", "xff": 0.5, "steps": 24, "rows": 744 },
            { "cf": "AVERAGE", "xff": 0.5, "steps": 288, "rows": 1460 }
          ]
        }
      }
    }
  }
}

RRA Parameters:

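CF: Consolidation function applied to the samples in each row (AVERAGE, MIN, MAX, LAST)
XFF: Xfiles factor, the fraction of a consolidation interval that may be unknown while the consolidated value still counts as known (0.5 = up to 50% unknown)
Steps: Number of primary data points (step intervals) consolidated into one row
Rows: Number of consolidated rows the archive keeps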
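The effect of XFF is easiest to see with numbers. The sketch below decides whether a consolidated row would be known, given how many of its samples are missing. It models the rule as "unknown samples may not exceed steps x xff", which matches the description above; cdp_is_known is a hypothetical helper, not an RRDtool command.

```shell
# Sketch: is a consolidated row still "known"?
# cdp_is_known STEPS UNKNOWN_COUNT XFF  -> exit status 0 if known
cdp_is_known() {
    steps=$1; unknown=$2; xff=$3
    # awk handles the fractional comparison that sh arithmetic cannot
    awk -v s="$steps" -v u="$unknown" -v x="$xff" 'BEGIN { exit !(u <= s * x) }'
}

for unknown in 2 3 4; do
    if cdp_is_known 6 "$unknown" 0.5; then state=known; else state=unknown; fi
    echo "$unknown of 6 samples missing -> consolidated value is $state"
done
```

With steps=6 and xff=0.5, a 30-minute row survives up to three missed polls; a fourth turns the whole row into unknown data.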

Retrieving Performance Data

Using rrdfunc Module

# From rrdfunc.pm:70-200
sub getRRDasHash {
    my %args = @_;
    my $db = $args{database};
    
    return ({},[], { error => "database required!"}) 
        if (!$db or !-f $db);
    
    my @rrdargs = ($db, $args{mode});
    push @rrdargs, ("--start",$args{start},"--end",$args{end});
    
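    my ($begin,$step,$name,$data) = RRDs::fetch(@rrdargs);
    # RRDs::fetch reports failure through RRDs::error rather than by dying
    if (my $error = RRDs::error()) {
        return ({},[], { error => $error });
    }
    # Avoid "my ... if", whose behaviour is undefined in Perl
    my @dsnames = defined $name ? @$name : ();
    my %s;
    my $time = $begin;
    my $rowswithdata = 0;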
    
    # Loop over readings over time
    for(my $row = 0; $row <= $#{$data}; ++$row, $time += $step) {
        my $thisrow = $data->[$row];
        my $datapresent;
        
        # Loop over datasets per reading
        for(my $dsidx = 0; $dsidx <= $#{$thisrow}; ++$dsidx) {
            $s{$time}->{ $dsnames[$dsidx] } = $thisrow->[$dsidx];
            $datapresent ||= 1 if (defined $thisrow->[$dsidx]);
        }
        
        ++$rowswithdata if ($datapresent);
    }
    
    return (\%s, \@dsnames, { 
        step => $step, 
        start => $begin, 
        end => $time,
        rows => scalar @$data, 
        rows_with_data => $rowswithdata 
    });
}
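The rows and rows_with_data counters returned above have a rough command-line equivalent. The sketch below reads text in the format printed by rrdtool fetch (a header line of DS names, then "timestamp: value ..." rows) and counts rows that contain at least one value that is not NaN. count_rows_with_data is a hypothetical helper and the sample input is made up for illustration.

```shell
# Sketch: count rows and rows holding real data in `rrdtool fetch` output.
count_rows_with_data() {
    awk -F': *' '
        /^ *[0-9]+:/ {
            rows++
            n = split($2, vals, " ")
            for (i = 1; i <= n; i++) {
                # any value that is not nan / -nan means the row has data
                if (tolower(vals[i]) !~ /nan/) { withdata++; break }
            }
        }
        END { printf "rows=%d rows_with_data=%d\n", rows, withdata }
    '
}

# Illustrative input only; real use would pipe in `rrdtool fetch` output.
count_rows_with_data <<'EOF'
                          in                 out

1700000100: 1.2000000000e+01 3.0000000000e+00
1700000400: -nan -nan
1700000700: 4.0000000000e+00 -nan
EOF
```

In practice the function would sit at the end of a pipeline such as rrdtool fetch FILE AVERAGE -s -1hour.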

CLI Data Retrieval

# Get latest values
nmis-cli act=get-rrd node=router1 inventory=cpu

# Get data for time range
nmis-cli act=get-rrd node=router1 \
  inventory=interface \
  index=GigabitEthernet0/0 \
  start="-1day" \
  end="now"

# Export to CSV
nmis-cli act=export-rrd node=router1 \
  inventory=cpu \
  start="-1week" \
  format=csv > cpu-data.csv

RRD File Management

Creating RRD Files

RRD files are created automatically during the first data collection, based on the model definitions. To create one manually:

# Create RRD from command line
rrdtool create /path/to/file.rrd \
  --step 300 \
  DS:metric:GAUGE:900:0:U \
  RRA:AVERAGE:0.5:1:576 \
  RRA:AVERAGE:0.5:6:672 \
  RRA:AVERAGE:0.5:24:744 \
  RRA:AVERAGE:0.5:288:1460
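The command above uses the same values as the model fragment under Data Retention. As a sketch of how the two relate, the function below assembles an equivalent command line from a step, a data source and a list of steps:rows pairs, and only prints it. build_rrd_create is a hypothetical helper, and the heartbeat of three times the step is an assumption taken from the 300/900 pairing used in this document.

```shell
# Sketch: print an `rrdtool create` command for one data source.
# build_rrd_create FILE STEP DS_NAME DS_TYPE STEPS:ROWS...
build_rrd_create() {
    file=$1; step=$2; ds_name=$3; ds_type=$4
    shift 4
    heartbeat=$(( step * 3 ))   # assumption: heartbeat = 3 x step
    cmd="rrdtool create $file --step $step DS:$ds_name:$ds_type:$heartbeat:0:U"
    for rra in "$@"; do
        # each argument is "steps:rows"; split it at the colon
        cmd="$cmd RRA:AVERAGE:0.5:${rra%%:*}:${rra##*:}"
    done
    echo "$cmd"
}

build_rrd_create /tmp/example.rrd 300 ifInOctets COUNTER 1:576 6:672 24:744 288:1460
```

Printing the command instead of running it makes it easy to review the archive layout before any file is created.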

Updating RRD Files

# Manual RRD update
rrdtool update /path/to/file.rrd N:value

# Update with specific timestamp
rrdtool update /path/to/file.rrd 1234567890:42

RRD File Information

# View RRD structure
rrdtool info /var/nmis9/database/nodes/router1/health/cpu.rrd

# Check last update time
rrdtool lastupdate /var/nmis9/database/nodes/router1/health/cpu.rrd

# Dump RRD data to XML
rrdtool dump /path/to/file.rrd > file.xml

# Restore from XML
rrdtool restore file.xml /path/to/file.rrd

Resizing RRD Files

Change retention periods by resizing RRAs:

# Use NMIS resize script
/usr/local/nmis9/admin/rrd_resize.pl \
  node=router1 \
  type=health \
  file=cpu.rrd \
  archive=2 \
  rows=1488

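# Or use rrdtool directly (RRA index 2, add or remove 100 rows).
# rrdtool resize leaves the original file untouched and writes the
# result to resize.rrd in the current directory.
rrdtool resize /path/to/file.rrd 2 GROW 100
rrdtool resize /path/to/file.rrd 2 SHRINK 100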
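Before resizing, it helps to work out how many rows a target retention needs. The sketch below does this for the archive at index 2 of the standard layout (300-second step, 24 steps per row, 744 rows) and derives the amount to pass to GROW. rows_for_retention is a hypothetical helper.

```shell
# Sketch: rows needed to keep DAYS of history in one archive (rounded up).
# rows_for_retention STEP_SECONDS STEPS_PER_ROW DAYS
rows_for_retention() {
    step=$1; steps=$2; days=$3
    row_secs=$(( step * steps ))
    echo $(( (days * 86400 + row_secs - 1) / row_secs ))
}

current=744
target=$(rows_for_retention 300 24 124)
echo "target rows: $target, grow by: $(( target - current ))"
```

Doubling the 2-hour archive from 62 to 124 days therefore means growing it by 744 rows, for a total of 1488.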

Performance Monitoring

Common Metrics Collected

Interface Metrics:

ifInOctets / ifOutOctets (traffic volume)
ifInUcastPkts / ifOutUcastPkts (packets)
ifInErrors / ifOutErrors
ifInDiscards / ifOutDiscards

System Health:

avgBusy (CPU utilization)
MemoryUsedPROC / MemoryFreePROC
bufferFail, bufferSwap (buffer statistics)

Environmental:

Temperature sensors
Fan speeds
Power supply status
Voltage levels

Application:

CBQoS (Cisco class-based QoS)
IP SLA metrics
VPN statistics
Call Manager metrics

Data Aggregation

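RRDtool consolidates data into every archive as updates arrive, so no separate aggregation job is needed. Each archive is built from the 5-minute samples rather than from the next-finer archive: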

Raw (5 min) --> 30 min avg --> 2 hour avg --> 1 day avg
  576 rows       672 rows       744 rows      1460 rows
   (2 days)      (2 weeks)      (2 months)    (4 years)

Aggregation functions:

AVERAGE: Mean of values in period
MIN: Minimum value in period
MAX: Maximum value in period
LAST: Most recent value in period
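As a sketch of what each function yields, the snippet below applies all four to the same six samples, as would happen when six 5-minute readings are consolidated into one 30-minute row. consolidate is a hypothetical helper and the sample values are arbitrary.

```shell
# Sketch: apply one consolidation function to a list of samples.
# consolidate CF VALUE...
consolidate() {
    cf=$1; shift
    printf '%s\n' "$@" | awk -v cf="$cf" '
        NR == 1 { min = $1; max = $1 }
        {
            sum += $1
            last = $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END {
            if (cf == "AVERAGE")   print sum / NR
            else if (cf == "MIN")  print min
            else if (cf == "MAX")  print max
            else if (cf == "LAST") print last
        }'
}

for cf in AVERAGE MIN MAX LAST; do
    echo "$cf: $(consolidate "$cf" 10 20 60 30 40 50)"
done
```

The short spike to 60 survives only in the MAX archive, which is why peak-sensitive metrics are often stored with MAX alongside AVERAGE.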

Graphing Performance Data

Graph Types

Interface Graphs:

Traffic (bits/octets per second)
Packets per second
Errors and discards
Utilization percentage

Health Graphs:

CPU utilization over time
Memory usage trends
Buffer statistics
Response time

Custom Graphs:

Defined in model files
Support for VDEF, CDEF operations
Multiple data sources
Stacked and overlaid views

Accessing Graphs

# Generate graph via CLI
nmis-cli act=graph node=router1 \
  graph=cpu \
  start="-1day" \
  end="now" \
  output=/tmp/cpu-graph.png

# Generate multiple graphs
nmis-cli act=graph-all node=router1 \
  start="-1week" \
  outputdir=/tmp/graphs/

Data Export and Analysis

Export Formats

# Export to XML (rrdtool xport emits XML by default; add --json for JSON)
rrdtool xport \
  --start -1day \
  --end now \
  --step 300 \
  DEF:metric=/path/to/file.rrd:ds:AVERAGE \
  XPORT:metric:"Metric Name" \
  > data.xml

# Export using NMIS
nmis-cli act=export-data node=router1 \
  type=interface \
  format=csv \
  start="-1month" \
  > interface-data.csv
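rrdtool xport writes its data as XML rows of the form <row><t>timestamp</t><v>value</v></row>. When CSV is needed from that output, a small filter is enough. The sketch below is one way to do it with sed; xport_to_csv is a hypothetical helper and the sample input is made up for illustration.

```shell
# Sketch: turn the <row> lines of `rrdtool xport` XML into CSV lines.
xport_to_csv() {
    # keep only <row> lines, strip the wrapper and the <t> tags ...
    sed -n 's:^ *<row><t>\([0-9]*\)</t>\(.*\)</row> *$:\1\2:p' |
        # ... then turn each <v>value</v> into ",value"
        sed -e 's:<v>:,:g' -e 's:</v>::g'
}

# Illustrative input only; real use would pipe in `rrdtool xport` output.
xport_to_csv <<'EOF'
  <data>
    <row><t>1700000100</t><v>1.2000000000e+01</v></row>
    <row><t>1700000400</t><v>NaN</v></row>
  </data>
EOF
```

Each output line starts with the timestamp, followed by one column per exported data source.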

Statistical Analysis

# Get statistics from RRD
rrdtool graph /dev/null \
  --start -1day --end now \
  DEF:metric=/path/to/file.rrd:ds:AVERAGE \
  VDEF:avg=metric,AVERAGE \
  VDEF:min=metric,MINIMUM \
  VDEF:max=metric,MAXIMUM \
  PRINT:avg:"Average: %6.2lf" \
  PRINT:min:"Minimum: %6.2lf" \
  PRINT:max:"Maximum: %6.2lf"

Troubleshooting

Missing Data

# Check last update
rrdtool lastupdate /path/to/file.rrd

# Confirm the file is readable and show its recorded last update time
rrdtool info /path/to/file.rrd | grep last_update

# Check for unknown data
rrdtool fetch /path/to/file.rrd AVERAGE -s -1hour | grep nan
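A quick way to turn the last update time into a yes/no answer is to compare its age with the heartbeat. The sketch below uses fixed example timestamps so it can run anywhere; rrd_is_stale is a hypothetical helper, and 900 seconds is the heartbeat used elsewhere in this document.

```shell
# Sketch: has a file gone longer than one heartbeat without an update?
# rrd_is_stale LAST_UPDATE_EPOCH NOW_EPOCH HEARTBEAT_SECONDS
rrd_is_stale() {
    last=$1; now=$2; heartbeat=$3
    [ $(( now - last )) -gt "$heartbeat" ]
}

# Fixed example timestamps; in practice `rrdtool last FILE` and `date +%s`
# would supply the first two arguments.
if rrd_is_stale 1700000000 1700003600 900; then
    echo "stale: no update for $(( 1700003600 - 1700000000 )) seconds"
else
    echo "fresh"
fi
```

Files that stay stale across several polling cycles usually point to a node that is down, unreachable over SNMP, or no longer being polled.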

RRD Errors

“illegal attempt to update using time X”

Cause: The update carries a timestamp that is not newer than the last update
Fix: Ensure system time is correct and that the same timestamp is not written twice

“expected X data source readings”

Cause: The update supplies the wrong number of values
Fix: Match the number of values to the DS count in the RRD

“unknown RRA”

Cause: The request names an archive that does not exist
Fix: Use rrdtool info to see available RRAs
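The first two errors can be caught before calling rrdtool update. The sketch below checks that the new timestamp is strictly newer than the last update and that the number of values matches the DS count. check_update is a hypothetical helper and its messages are its own, not RRDtool's.

```shell
# Sketch: validate an update before sending it to rrdtool.
# check_update LAST_TS NEW_TS DS_COUNT VALUE...
check_update() {
    last_ts=$1; new_ts=$2; ds_count=$3
    shift 3
    if [ "$new_ts" -le "$last_ts" ]; then
        echo "timestamp $new_ts is not newer than last update $last_ts"
        return 1
    fi
    if [ "$#" -ne "$ds_count" ]; then
        echo "expected $ds_count values, got $#"
        return 1
    fi
    echo "ok"
}

check_update 1700000000 1700000300 2 120 340 || true
check_update 1700000300 1700000300 2 120 340 || true
check_update 1700000000 1700000300 2 120 || true
```

The last update time and the DS count for a real file can be read from rrdtool info.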

Performance Issues

# Check RRD file size
ls -lh /var/nmis9/database/nodes/*/health/*.rrd

# Count RRD files
find /var/nmis9/database -name "*.rrd" | wc -l

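# Monitor I/O (NMIS updates RRD files in-process through the RRDs Perl module,
# so watch every process doing I/O rather than filtering on an rrdtool binary)
iotop -o -P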

# Check for filesystem issues
df -h /var/nmis9/database

Best Practices

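Plan Retention: Balance storage space against how much history you need
Monitor Disk Space: Individual RRD files never grow, but the number of files rises with every node and interface added
Backup Strategically: Configuration is the priority; RRD files are recreated automatically when missing, but their history is not, so include them if historical data matters
Regular Validation: Check for stale RRD files left behind by deleted nodes
Optimize Polling: Don’t over-poll; 5 minutes is usually sufficient
Use Compression: Filesystems with transparent compression can reduce the on-disk size of RRD files
Archive Historical Data: Export old data before RRD ages it out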
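For retention planning, a rough size estimate per file is often enough. Each stored value is an 8-byte double, so the data area is about rows x data sources x 8 bytes per archive. The sketch below sums that over the archives; it ignores the file header, so treat the result as a lower bound. rrd_data_bytes is a hypothetical helper.

```shell
# Sketch: approximate data size of one RRD file in bytes (header excluded).
# rrd_data_bytes DS_COUNT ROWS_PER_ARCHIVE...
rrd_data_bytes() {
    ds_count=$1; shift
    total_rows=0
    for rows in "$@"; do
        total_rows=$(( total_rows + rows ))
    done
    echo $(( total_rows * ds_count * 8 ))   # 8 bytes per stored value
}

# One data source with the four standard archives
echo "approx bytes per file: $(rrd_data_bytes 1 576 672 744 1460)"
```

Multiplying the per-file figure by the output of the find ... | wc -l command under Performance Issues gives a first estimate of total disk usage.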

Advanced Topics

Custom RRD Definitions

Create custom data sources in model files:

{
  "database": {
    "type": {
      "custom-metric": {
        "rrd": {
          "step": 300,
          "ds": {
            "mymetric": {
              "type": "GAUGE",
              "min": 0,
              "max": 100
            }
          }
        }
      }
    }
  }
}

Data Manipulation

Use CDEF and VDEF for calculations:

# Convert an octet rate to bits per second, then average it over the graph period
DEF:bytes=file.rrd:ds:AVERAGE
CDEF:bits=bytes,8,*
VDEF:avgbits=bits,AVERAGE
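CDEF expressions are written in reverse Polish notation: operands come first, then the operator, so bytes,8,* means "bytes times 8". The sketch below is a tiny stack evaluator that makes the order of evaluation visible. It handles only numeric operands and the four basic operators, far less than RRDtool supports, and rpn_eval is a hypothetical helper.

```shell
# Sketch: evaluate a comma-separated RPN expression with + - * / only.
rpn_eval() {
    echo "$1" | awk -F, '{
        sp = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "+" || $i == "-" || $i == "*" || $i == "/") {
                # pop two operands, apply the operator, push the result
                b = stack[sp--]; a = stack[sp--]
                if ($i == "+")      r = a + b
                else if ($i == "-") r = a - b
                else if ($i == "*") r = a * b
                else                r = a / b
                stack[++sp] = r
            } else {
                stack[++sp] = $i + 0   # numeric operand
            }
        }
        print stack[sp]
    }'
}

# 125000 bytes/s expressed in bits/s, as in CDEF:bits=bytes,8,*
rpn_eval "125000,8,*"
```

In a real CDEF the first operand would be the name of a DEF or another CDEF rather than a literal number.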

​Next Steps

Event Management

SNMP Monitoring
