Overview
NMIS collects and stores performance metrics as time-series data using RRDtool (Round Robin Database). This provides efficient storage with automatic data aging and aggregation, enabling both real-time monitoring and historical trend analysis.
RRD Storage : Efficient circular buffer storage with fixed size and automatic aggregation
Data Resolution : Multiple resolution levels from 5-minute to yearly averages
Polling Engine : Automated collection cycles with configurable intervals
Graphing : Built-in graphing of collected metrics with customizable views
Round Robin Database Concept
RRD files use a circular buffer structure that:
Fixed Size : Never grows beyond configured size
Automatic Aging : Oldest data automatically overwritten
Consolidation : High-resolution data aggregated over time
Efficiency : O(1) read/write operations regardless of data age
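The fixed-size, overwrite-the-oldest behaviour listed above can be illustrated with a small Python sketch. The `RingArchive` class and its method names are hypothetical and exist only for this illustration; RRDtool's real on-disk format is more involved.

```python
class RingArchive:
    """Toy model of one round robin archive (hypothetical, not RRDtool's format)."""

    def __init__(self, rows):
        self.slots = [None] * rows   # storage is allocated once and never grows
        self.next = 0                # index of the slot the next write will use
        self.count = 0               # number of slots filled so far

    def write(self, value):
        # Constant-time write: overwrite the oldest slot, then advance the pointer
        self.slots[self.next] = value
        self.next = (self.next + 1) % len(self.slots)
        self.count = min(self.count + 1, len(self.slots))

    def oldest_to_newest(self):
        # Before the buffer has wrapped, only the filled prefix holds data
        if self.count < len(self.slots):
            return self.slots[:self.count]
        return self.slots[self.next:] + self.slots[:self.next]


archive = RingArchive(rows=3)
for sample in [10, 20, 30, 40]:
    archive.write(sample)
print(archive.oldest_to_newest())   # [20, 30, 40]: the first sample was overwritten
```

The list never exceeds three slots no matter how many samples are written, which is the property that keeps RRD files at a constant size.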
# From rrdfunc.pm:29-68
package NMISNG::rrdfunc;
our $VERSION = "9.5.1";
use feature 'state';    # needed for the 'state' variable below
use Statistics::Lite;
use POSIX qw();         # for strftime

# Load the RRDs module on first use only
sub require_RRDs {
    my (%args) = @_;
    state $RRD_included = 0;
    if (!$RRD_included) {
        $RRD_included = 1;
        require RRDs;
        RRDs->import;
    }
}

# Track module errors
my $_last_error;
sub getRRDerror {
    return $_last_error;
}
Data Storage Locations
RRD files are organized by node and data type:
# Default RRD storage structure
/var/nmis9/database/
├── nodes/
│   ├── router1/
│   │   ├── health/
│   │   │   ├── cpu.rrd
│   │   │   ├── memory.rrd
│   │   │   └── health.rrd
│   │   ├── interface/
│   │   │   ├── GigabitEthernet0_0-ifInOctets.rrd
│   │   │   ├── GigabitEthernet0_0-ifOutOctets.rrd
│   │   │   └── GigabitEthernet0_1-ifInOctets.rrd
│   │   └── pkts/
│   │       └── GigabitEthernet0_0-pkts.rrd
│   └── switch1/
│       └── ...
Data Collection Process
Polling Cycles
NMIS operates on multiple collection cycles:
Update Cycle (default 5 minutes):
System information (uptime, name, location)
Catchall data (global device metrics)
Interface discovery
Collect Cycle (default 5 minutes):
Interface statistics (octets, errors, discards)
CPU and memory utilization
Environmental sensors
Protocol statistics
Services Cycle (configurable):
Service availability checks
Response time measurements
Polling intervals can be customized per node or group using polling policies. Critical devices may poll every 1-2 minutes, while less important ones every 15-30 minutes.
Data Source Types
RRD databases contain multiple data sources (DS), each with a type:
GAUGE : Value that can increase or decrease (CPU %, temperature, voltage)
COUNTER : Ever-increasing counter that wraps (interface octets, packet counts)
DERIVE : Counter that can increase or decrease (signed counter)
ABSOLUTE : Counter that resets to zero after each read
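To make the difference between these data source types concrete, the following Python sketch shows how a per-second rate could be derived from raw readings. The function names are hypothetical, and the single-wrap handling for counters is a simplifying assumption rather than a copy of RRDtool's internal logic.

```python
def counter_rate(prev, curr, seconds, bits=32):
    """Rate for an ever-increasing counter, allowing for one wrap (assumed width: `bits`)."""
    delta = curr - prev
    if delta < 0:
        # The counter rolled over between the two readings
        delta += 2 ** bits
    return delta / seconds


def derive_rate(prev, curr, seconds):
    """Rate for a signed counter: the difference may legitimately be negative."""
    return (curr - prev) / seconds


def absolute_rate(curr, seconds):
    """Rate for a counter that resets on every read: the reading is already the delta."""
    return curr / seconds


# A 32-bit octet counter that wrapped during a 300-second interval:
# 296 octets up to the wrap point plus 704 after it gives a delta of 1000
print(counter_rate(prev=4294967000, curr=704, seconds=300))
print(derive_rate(prev=500, curr=200, seconds=300))   # -1.0
```

A gauge needs no such conversion because the reading is stored as-is.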
Round Robin Archives (RRAs)
Each RRD file contains multiple archives with different resolutions:
# From rrdfunc.pm:70-130
sub getRRDasHash {
    my %args = @_;
    my $db = $args{database};
    my $minhr = (defined $args{hour_from} ? $args{hour_from} : 0);
    my $maxhr = (defined $args{hour_to} ? $args{hour_to} : 24);
    my $wantedresolution = $args{resolution};
    my @rrdargs = ($db, $args{mode});
    my ($bucketsize, $resolution);

    if (defined($wantedresolution) && $wantedresolution > 0) {
        # Determine native resolutions available
        my ($error, @available) = getRRDResolutions($db, $args{mode});
        return ({}, [], { error => $error }) if ($error);

        # Match desired resolution to available RRAs
        if (grep($_ == $wantedresolution, @available)) {
            $resolution = $wantedresolution;
        }
        elsif ($wantedresolution % $available[0] == 0) {
            # Bucketize ourselves
            $bucketsize = $wantedresolution / $available[0];
            $resolution = $available[0];
        }
    }
}
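The resolution-matching branch in the excerpt above can be restated as a short Python sketch. The function name `choose_resolution` is hypothetical, and raising an error when nothing matches is an assumption, since the excerpt does not show that case.

```python
def choose_resolution(wanted, available):
    """Return (resolution, bucketsize) for a wanted resolution in seconds.

    `available` lists the native RRA resolutions in seconds, finest first.
    bucketsize is None when an RRA matches the wanted resolution exactly.
    """
    if wanted in available:
        return wanted, None
    finest = available[0]
    if wanted % finest == 0:
        # Read the finest archive and merge `bucketsize` rows per output row
        return finest, wanted // finest
    # Assumption: the excerpt does not show how a non-matching request is handled
    raise ValueError(f"{wanted}s is not a multiple of the finest resolution {finest}s")


native = [300, 1800, 7200, 86400]
print(choose_resolution(1800, native))   # (1800, None): served directly by an RRA
print(choose_resolution(900, native))    # (300, 3): three 5-minute rows per bucket
```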
Standard NMIS RRA Configuration:
Archive      Step     Rows  Period    Consolidation
RRA:AVERAGE  5 min    576   2 days    Average of raw samples
RRA:AVERAGE  30 min   672   2 weeks   Average of 6 samples
RRA:AVERAGE  2 hours  744   2 months  Average of 24 samples
RRA:AVERAGE  1 day    1460  4 years   Average of 288 samples
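The periods in this configuration follow from simple arithmetic: each row covers step × steps seconds, and an archive retains rows × that many seconds. A minimal Python check, assuming the 300-second base step used elsewhere in this page (the helper name `rra_coverage` is made up for illustration):

```python
STEP = 300  # base step in seconds (5 minutes)

# (steps, rows) pairs taken from the standard configuration
RRAS = [(1, 576), (6, 672), (24, 744), (288, 1460)]


def rra_coverage(step, steps, rows):
    """Return (seconds per row, total seconds retained) for one archive."""
    resolution = step * steps
    return resolution, resolution * rows


for steps, rows in RRAS:
    resolution, span = rra_coverage(STEP, steps, rows)
    print(f"{resolution:>6} s per row, {span / 86400:g} days retained")
```

This gives 2, 14, 62, and 1460 days, matching the 2 days, 2 weeks, 2 months, and 4 years quoted for the four archives.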
Data Collection Configuration
Polling Intervals
# Configure node polling interval
nmis-cli act=set node=router1 poll_interval=300
# Set faster polling for critical device
nmis-cli act=set node=core-switch1 poll_interval=60
# Slower polling for edge devices
nmis-cli act=set node=remote-router poll_interval=900
Data Retention
Retention is controlled by RRA configuration in model files:
{
  "database": {
    "type": {
      "ifInOctets": {
        "rrd": {
          "step": 300,
          "heartbeat": 900,
          "ds": {
            "ifInOctets": {
              "type": "COUNTER",
              "min": "0",
              "max": "U"
            }
          },
          "rra": [
            { "cf": "AVERAGE", "xff": 0.5, "steps": 1, "rows": 576 },
            { "cf": "AVERAGE", "xff": 0.5, "steps": 6, "rows": 672 },
            { "cf": "AVERAGE", "xff": 0.5, "steps": 24, "rows": 744 },
            { "cf": "AVERAGE", "xff": 0.5, "steps": 288, "rows": 1460 }
          ]
        }
      }
    }
  }
}
RRA Parameters:
CF : Consolidation Function (AVERAGE, MIN, MAX, LAST)
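XFF : Xfiles factor, the fraction of a consolidation interval that may be unknown while the consolidated value still counts as known (0.5 = up to 50% unknown data)
Steps : Number of primary data points (one per step interval) consolidated into each row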
Rows : Number of consolidated rows to keep
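How XFF interacts with consolidation is easiest to see in a small example. The Python sketch below models an AVERAGE archive; the function name is hypothetical, unknown samples are represented as `None`, and the exact boundary behaviour (a value stays known when the unknown share equals XFF exactly) is an assumption to verify against your RRDtool version.

```python
def consolidate_average(points, xff=0.5):
    """Average one consolidation interval, or return None if too much is unknown.

    `points` holds one value per step; None marks an unknown sample.
    """
    unknown = sum(1 for p in points if p is None)
    if unknown > xff * len(points):
        return None                      # too many gaps: the row is stored as unknown
    known = [p for p in points if p is not None]
    if not known:
        return None                      # nothing to average (only possible with xff=1)
    return sum(known) / len(known)


# steps=6: six 5-minute samples feed one 30-minute row
print(consolidate_average([10, None, 20, None, 30, 30]))       # 22.5 (2 of 6 unknown)
print(consolidate_average([10, None, None, None, None, 30]))   # None (4 of 6 unknown)
```

A lower XFF makes archives stricter about gaps; a higher one hides short polling outages in long-term graphs.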
Using rrdfunc Module
# From rrdfunc.pm:70-200
sub getRRDasHash {
    my %args = @_;
    my $db = $args{database};
    return ({}, [], { error => "database required!" })
        if (!$db or !-f $db);

    my @rrdargs = ($db, $args{mode});
    push @rrdargs, ("--start", $args{start}, "--end", $args{end});
    my ($begin, $step, $name, $data) = RRDs::fetch(@rrdargs);
    my @dsnames = @$name if (defined $name);
    my %s;
    my $time = $begin;
    my $rowswithdata;

    # Loop over readings over time
    for (my $row = 0; $row <= $#{$data}; ++$row, $time += $step) {
        my $thisrow = $data->[$row];
        my $datapresent;
        # Loop over datasets per reading
        for (my $dsidx = 0; $dsidx <= $#{$thisrow}; ++$dsidx) {
            $s{$time}->{ $dsnames[$dsidx] } = $thisrow->[$dsidx];
            $datapresent ||= 1 if (defined $thisrow->[$dsidx]);
        }
        ++$rowswithdata if ($datapresent);
    }

    return (\%s, \@dsnames, {
        step           => $step,
        start          => $begin,
        end            => $time,
        rows           => scalar @$data,
        rows_with_data => $rowswithdata
    });
}
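For readers less familiar with Perl, the row loop above can be restated in Python. This is only a sketch of the same transformation, not part of NMIS: the function name `rows_to_hash` is hypothetical, and the inputs imitate the shape returned by `RRDs::fetch` (start time, step, data source names, and a list of rows), with `None` standing in for unknown values.

```python
def rows_to_hash(begin, step, names, data):
    """Turn fetch-style rows into {timestamp: {ds_name: value}} plus metadata."""
    result = {}
    rows_with_data = 0
    time = begin
    for row in data:
        # Pair every data source name with its value for this timestamp
        result[time] = dict(zip(names, row))
        # Count the row if at least one data source has a known value
        if any(value is not None for value in row):
            rows_with_data += 1
        time += step
    meta = {
        "step": step,
        "start": begin,
        "end": time,
        "rows": len(data),
        "rows_with_data": rows_with_data,
    }
    return result, names, meta


series, names, meta = rows_to_hash(
    begin=1000, step=300,
    names=["ifInOctets", "ifOutOctets"],
    data=[[1.0, 2.0], [None, None], [3.0, None]],
)
print(sorted(series))            # [1000, 1300, 1600]
print(meta["rows_with_data"])    # 2
```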
CLI Data Retrieval
# Get latest values
nmis-cli act=get-rrd node=router1 inventory=cpu
# Get data for time range
nmis-cli act=get-rrd node=router1 \
inventory=interface \
index=GigabitEthernet0/0 \
start="-1day" \
end="now"
# Export to CSV
nmis-cli act=export-rrd node=router1 \
inventory=cpu \
start="-1week" \
format=csv > cpu-data.csv
RRD File Management
Creating RRD Files
RRD files are automatically created during first data collection based on model definitions. Manual creation:
# Create RRD from command line
rrdtool create /path/to/file.rrd \
--step 300 \
DS:metric:GAUGE:900:0:U \
RRA:AVERAGE:0.5:1:576 \
RRA:AVERAGE:0.5:6:672 \
RRA:AVERAGE:0.5:24:744 \
RRA:AVERAGE:0.5:288:1460
Updating RRD Files
# Manual RRD update
rrdtool update /path/to/file.rrd N:value
# Update with specific timestamp
rrdtool update /path/to/file.rrd 1234567890:42
# View RRD structure
rrdtool info /var/nmis9/database/nodes/router1/health/cpu.rrd
# Check last update time
rrdtool lastupdate /var/nmis9/database/nodes/router1/health/cpu.rrd
# Dump RRD data to XML
rrdtool dump /path/to/file.rrd > file.xml
# Restore from XML
rrdtool restore file.xml /path/to/file.rrd
Resizing RRD Files
Change retention periods by resizing RRAs:
# Use NMIS resize script
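/usr/local/nmis9/admin/rrd_resize.pl \
node=router1 \
type=health \
file=cpu.rrd \
archive=2 \
rows=1488
# Or use rrdtool directly (RRA indexes start at 0, so 2 is the third archive)
# Note: rrdtool resize writes its result to resize.rrd in the current
# directory and leaves the original file unchanged
rrdtool resize /path/to/file.rrd 2 GROW 100
rrdtool resize /path/to/file.rrd 2 SHRINK 100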
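Because `rrdtool resize` takes the number of rows to add or remove rather than a target size, it can help to work out the difference first. The Python sketch below does that arithmetic; both helper names are hypothetical, and the 300-second step and 24-step archive are taken from the standard configuration shown earlier.

```python
import math


def rows_for_days(days, step, steps):
    """Rows needed for an archive to cover `days`, given its step and steps."""
    return math.ceil(days * 86400 / (step * steps))


def resize_args(current_rows, target_rows):
    """Return the (direction, row count) pair for rrdtool resize, or None if unchanged."""
    delta = target_rows - current_rows
    if delta == 0:
        return None
    return ("GROW" if delta > 0 else "SHRINK", abs(delta))


# Doubling the 2-hour archive from 62 to 124 days of retention
target = rows_for_days(124, step=300, steps=24)
print(target)                      # 1488
print(resize_args(744, target))    # ('GROW', 744)
```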
Common Metrics Collected
Interface Metrics:
ifInOctets / ifOutOctets (traffic volume)
ifInUcastPkts / ifOutUcastPkts (packets)
ifInErrors / ifOutErrors
ifInDiscards / ifOutDiscards
System Health:
avgBusy (CPU utilization)
MemoryUsedPROC / MemoryFreePROC
bufferFail, bufferSwap (buffer statistics)
Environmental:
Temperature sensors
Fan speeds
Power supply status
Voltage levels
Application:
CBQoS (Cisco class-based QoS)
IP SLA metrics
VPN statistics
Call Manager metrics
Data Aggregation
NMIS automatically aggregates high-resolution data:
Raw (5 min) --> 30 min avg --> 2 hour avg --> 1 day avg
576 rows 672 rows 744 rows 1460 rows
(2 days) (2 weeks) (2 months) (4 years)
Aggregation functions:
AVERAGE : Mean of values in period
MIN : Minimum value in period
MAX : Maximum value in period
LAST : Most recent value in period
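The choice of aggregation function matters most for short spikes. In the following Python sketch (function and variable names are illustrative only), six 5-minute samples are consolidated into one 30-minute row with each function; unknown values are ignored here to keep the example short.

```python
CONSOLIDATORS = {
    "AVERAGE": lambda values: sum(values) / len(values),
    "MIN": min,
    "MAX": max,
    "LAST": lambda values: values[-1],
}


def consolidate(values, steps, cf):
    """Collapse each complete group of `steps` samples into one value using `cf`."""
    fn = CONSOLIDATORS[cf]
    return [fn(values[i:i + steps]) for i in range(0, len(values) - steps + 1, steps)]


# CPU % over 30 minutes with one short spike to 70%
samples = [10, 10, 10, 10, 70, 10]
for cf in ("AVERAGE", "MIN", "MAX", "LAST"):
    print(cf, consolidate(samples, steps=6, cf=cf))
```

AVERAGE reports 20.0 and hides the spike, while MAX keeps the 70. With only AVERAGE archives configured, peaks flatten progressively as data ages into coarser archives.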
Graph Types
Interface Graphs:
Traffic (bits/octets per second)
Packets per second
Errors and discards
Utilization percentage
Health Graphs:
CPU utilization over time
Memory usage trends
Buffer statistics
Response time
Custom Graphs:
Defined in model files
Support for VDEF, CDEF operations
Multiple data sources
Stacked and overlaid views
Accessing Graphs
# Generate graph via CLI
nmis-cli act=graph node=router1 \
graph=cpu \
start="-1day" \
end="now" \
output=/tmp/cpu-graph.png
# Generate multiple graphs
nmis-cli act=graph-all node=router1 \
start="-1week" \
outputdir=/tmp/graphs/
Data Export and Analysis
# Export with rrdtool xport (output is XML by default; add --json for JSON)
rrdtool xport \
--start -1day \
--end now \
--step 300 \
DEF:metric=/path/to/file.rrd:ds:AVERAGE \
XPORT:metric:"Metric Name" \
> data.xml
# Export using NMIS
nmis-cli act=export-data node=router1 \
type=interface \
format=csv \
start="-1month" \
> interface-data.csv
Statistical Analysis
# Get statistics from RRD
rrdtool graph /dev/null \
--start -1day --end now \
DEF:metric=/path/to/file.rrd:ds:AVERAGE \
VDEF:avg=metric,AVERAGE \
VDEF:min=metric,MINIMUM \
VDEF:max=metric,MAXIMUM \
PRINT:avg:"Average: %6.2lf" \
PRINT:min:"Minimum: %6.2lf" \
PRINT:max:"Maximum: %6.2lf"
Troubleshooting
Missing Data
# Check last update
rrdtool lastupdate /path/to/file.rrd
# Confirm the file is readable and show its last update time
rrdtool info /path/to/file.rrd | grep last_update
# Check for unknown data
rrdtool fetch /path/to/file.rrd AVERAGE -s -1hour | grep nan
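When `grep nan` returns many lines, it can be easier to post-process the `rrdtool fetch` output. The Python sketch below parses that text format (a header line of data source names followed by `timestamp: value ...` rows) and lists the timestamps where every data source is unknown. The function names and the sample text are made up for illustration, and the parser assumes the default C-locale number format.

```python
import math


def parse_fetch(text):
    """Parse `rrdtool fetch` text output into (names, [(timestamp, {name: value})])."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        stamp, _, rest = line.partition(":")
        # float() accepts "nan" and "-nan", which rrdtool prints for unknown values
        values = [float(v) for v in rest.split()]
        rows.append((int(stamp), dict(zip(names, values))))
    return names, rows


def unknown_rows(rows):
    """Timestamps at which every data source is unknown."""
    return [ts for ts, values in rows if all(math.isnan(v) for v in values.values())]


SAMPLE = """\
                    avgBusy

1700000100: 1.2000000000e+01
1700000400: -nan
1700000700: 1.5000000000e+01
"""

names, rows = parse_fetch(SAMPLE)
print(names)                # ['avgBusy']
print(unknown_rows(rows))   # [1700000400]
```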
RRD Errors
“illegal attempt to update using time X”
Cause: the update's timestamp is not later than the file's last update
Fix: Ensure system time is correct and has not jumped backwards
“expected X data source readings”
Wrong number of values provided
Fix: Match update to DS count in RRD
“unknown RRA”
Requesting non-existent archive
Fix: Use rrdtool info to see available RRAs
# Check RRD file size
ls -lh /var/nmis9/database/nodes/*/health/*.rrd
# Count RRD files
find /var/nmis9/database -name "*.rrd" | wc -l
# Monitor I/O
iotop -o -P -p $( pgrep rrdtool )
# Check for filesystem issues
df -h /var/nmis9/database
Best Practices
Plan Retention : Balance storage space vs historical needs
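Monitor Disk Space : Individual RRD files never grow, but the number of files increases as nodes and interfaces are added
Backup Strategically : Prioritize configuration; RRD files are recreated automatically, but their collected history is lost unless the files themselves are backed up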
Regular Validation : Check for stale RRD files from deleted nodes
Optimize Polling : Don’t over-poll; 5 minutes is usually sufficient
Use Compression : Modern filesystems with compression help
Archive Historical Data : Export old data before RRD ages it out
Advanced Topics
Custom RRD Definitions
Create custom data sources in model files:
{
  "database": {
    "type": {
      "custom-metric": {
        "rrd": {
          "step": 300,
          "ds": {
            "mymetric": {
              "type": "GAUGE",
              "min": 0,
              "max": 100
            }
          }
        }
      }
    }
  }
}
Data Manipulation
Use CDEF and VDEF for calculations:
# Calculate average rate
DEF:bytes=file.rrd:ds:AVERAGE
CDEF:bits=bytes,8,*
VDEF:avgbits=bits,AVERAGE
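CDEF expressions use reverse Polish notation (RPN): operands are pushed first and the operator follows, so `bytes,8,*` means "bytes multiplied by 8". The Python sketch below evaluates such an expression for a single sample; it is a deliberately tiny evaluator covering only the four arithmetic operators, not a reimplementation of RRDtool's RPN engine, and the function name is hypothetical.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, variables):
    """Evaluate a comma-separated RPN expression against one sample's variables."""
    stack = []
    for token in expression.split(","):
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))   # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# 125000 octets per second converted to bits per second
print(eval_rpn("bytes,8,*", {"bytes": 125000.0}))   # 1000000.0
```

A CDEF applies the expression to every point in the series, while a VDEF such as `bits,AVERAGE` reduces the whole series to a single value.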
Next Steps
Event Management : Configure thresholds and alerts based on performance data
SNMP Monitoring : Understand how data is collected via SNMP