Event Management

Overview

NMIS provides a comprehensive event management system that detects problems, tracks their lifecycle, escalates based on severity and duration, and delivers notifications through multiple channels. The event system is the core of NMIS’s alerting and notification capabilities.

Event Detection

Automatic detection of down and degraded conditions and of threshold violations

Escalation

Multi-level escalation based on severity and event duration

Notifications

Email, syslog, SNMP traps, and custom notification methods

Event Lifecycle

Track events from creation through acknowledgment to closure

Event Object Model

Events are first-class objects in NMIS (the NMISNG::Event class), with a defined set of properties and a lifecycle:

# From Event.pm:30-83
package NMISNG::Event;

use strict;
use warnings;
use Carp qw(confess);    # new() dies with confess() on bad arguments

our $VERSION = "9.6.5";

# Known event attributes
my %known_attrs = (
    _id            => 1,
    ack            => 1,
    active         => 1,
    cluster_id     => 1,
    context        => 1,
    details        => 1,
    element        => 1,
    escalate       => 1,
    event_previous => 1,
    expire_at      => 1,
    historic       => 1,
    inventory_id   => 1,
    lastupdate     => 1,
    level_previous => 1,
    logged         => 1,
    node_name      => 1,
    node_uuid      => 1,
    notify         => 1,
    startdate      => 1,
    stateless      => 1,
    user           => 1,
    configuration  => {"group" => 1}
);

sub new {
    my ( $class, %args ) = @_;
    confess "nmisng required" if ( ref( $args{nmisng} ) ne "NMISNG" );
    
    # Need enough data to find in DB
    if ( !$args{_id} && ( !$args{node_uuid} && !$args{event} ) ) {
        confess "not enough info to create event";
    }
    
    my $self = bless({
        _nmisng => $args{nmisng},
        data    => \%args
    }, $class);
    
    return $self;
}

Event Properties

event (string, required): Event name (e.g., “Node Down”, “Interface Down”, “Proactive CPU”)
node_name (string, required): Name of the affected node
node_uuid (string, required): UUID of the affected node
level (string, required): Event severity: Fatal, Critical, Major, Minor, Warning, or Normal
element (string): Affected element (interface name, disk, process, etc.)
details (string): Additional event information and context
active (boolean, default: true): Whether the event is currently active (not resolved)
ack (boolean, default: false): Whether an operator has acknowledged the event
escalate (number, default: -1): Current escalation level (-1 = not escalated)
stateless (boolean, default: false): If true, the event is re-armed once its dampening period has passed (used for traps and alerts)

Event Types

System Events

Node Availability:

Node Down: Device unreachable via ping
Node Up: Device becomes reachable
SNMP Down: SNMP polling failed
WMI Down: WMI collection failed
Node Polling Failover: Failover to backup host

Interface Events:

Interface Down: Interface operStatus changed to down
Interface Up: Interface recovered

Service Events:

Service Down: Service check failed
Service Up: Service recovered
Service Degraded: Service slow but responding

Threshold Events (Proactive)

Generated when metrics exceed defined thresholds:

Proactive CPU: CPU utilization threshold exceeded
Proactive Memory: Memory usage threshold exceeded
Proactive Interface: Interface utilization threshold exceeded
Proactive Disk: Disk space threshold exceeded
Proactive [Custom]: Custom threshold violations

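# From Event.pm:368-396 (excerpt from check(); $S is the Sys object passed
# as the "sys" argument, $C is the NMIS configuration)
my $C = $self->nmisng->config;
if ( $self->event =~ /Proactive/ ) {
    my ( $value, $reset ) = @args{"value", "reset"};
    if ( defined $value and defined $reset ) {
        # Only clear once the value has moved past the reset level by the
        # dampening margin (10% with the default factors):
        # - thresholds where high = good use threshold_falling_reset_dampening (default 1.1)
        # - thresholds where low = good use threshold_rising_reset_dampening (default 0.9)
        my $cutoff = $reset * (
              $value >= $reset
            ? $C->{'threshold_falling_reset_dampening'}
            : $C->{'threshold_rising_reset_dampening'}
        );

        if ( $value >= $reset && $value <= $cutoff ) {
            $S->nmisng->log->debug(
                "Proactive Event value $value too low for dampening limit $cutoff. Not closing.");
            return;
        }
        elsif ( $value < $reset && $value >= $cutoff ) {
            $S->nmisng->log->debug(
                "Proactive Event value $value too high for dampening limit $cutoff. Not closing.");
            return;
        }
    }
    $new_event = $self->event . " Closed";
}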
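The dampening rule is easier to follow as a standalone function. The sketch below is not NMIS code. `should_close` is a hypothetical name, and it hard-codes the default factors 1.1 and 0.9 from the comments rather than reading threshold_falling_reset_dampening and threshold_rising_reset_dampening from the configuration. The comments describe two threshold directions, so the sketch keeps the event open while the value is still inside the dampening margin on either side of the reset level.

```perl
use strict;
use warnings;

# Hypothetical helper for the proactive dampening rule. Factors default to
# 1.1 (falling/high-is-good) and 0.9 (rising/low-is-good), as in the comments.
sub should_close {
    my ( $value, $reset, $falling, $rising ) = @_;
    $falling //= 1.1;
    $rising  //= 0.9;

    # Direction is inferred from which side of $reset the value lies on
    my $cutoff = $reset * ( $value >= $reset ? $falling : $rising );

    return 0 if $value >= $reset && $value <= $cutoff;    # not far enough above
    return 0 if $value < $reset  && $value >= $cutoff;    # not far enough below
    return 1;
}

for my $case ( [ 85, 90 ], [ 80, 90 ], [ 105, 100 ], [ 115, 100 ] ) {
    my ( $value, $reset ) = @$case;
    printf "value=%d reset=%d: %s\n", $value, $reset,
        should_close( $value, $reset ) ? "close" : "keep open";
}
```

With a reset level of 90 and the 0.9 factor, a value of 85 keeps the event open and 80 closes it. With a reset level of 100 and the 1.1 factor, 105 keeps it open and 115 closes it.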

Alert Events

Custom alerts generated by external systems or scripts:

Alert: [Custom]: Custom alert conditions
Alert names are prefixed with “Alert:” or “Alert :”

SNMP Trap Events

Generated from received SNMP traps:

TRAP: Generic SNMP trap event
Traps are typically stateless, and acknowledging a TRAP event deletes it rather than flagging it as acknowledged

Event Lifecycle

Event States

Events progress through defined states:

Created: Event detected and added to database
Active: Event is current and unresolved (active=1, historic=0)
Acknowledged: Operator acknowledged (ack=1)
Resolved: Condition cleared (active=0, historic=0)
Historic: Event archived for history (historic=1)

# From Event.pm:282-326 (excerpt)
sub check {
    my ( $self, %args ) = @_;
    my $S         = $args{sys};
    my $details   = $args{details} // "";    # details/level argument names assumed
    my $level     = $args{level};
    my $new_event = $self->event;            # default: keep the current name

    my $exists = $self->exists();

    # Event exists and is active - create UP event
    if ( $exists && $self->active ) {
        # Compute outage period
        my $outage = NMISNG::Util::convertSecsHours( time() - $self->startdate );

        # Determine UP event name
        if ( $self->event eq "Node Down" ) {
            $new_event = "Node Up";
        }
        elsif ( $self->event eq "Interface Down" ) {
            $new_event = "Interface Up";
        }
        elsif ( $self->event =~ /Proactive/ ) {
            $new_event = $self->event . " Closed";
        }
        elsif ( $self->event =~ /down/i ) {
            $new_event =~ s/down/Up/i;    # e.g. "Service Down" -> "Service Up"
        }

        $details .= ( $details ? " " : "" ) . "Time=$outage";

        # Mark event inactive, notify, then delete
        $self->active(0);
        $self->event($new_event);
        $self->details($details);
        $self->level($level);

        my $error = $self->save();
        return $error if $error;
    }
}
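The naming rules in check() can be tried out on their own. In this sketch, `up_event_name` is a hypothetical helper. Returning names that contain no "down" unchanged is an assumption, because the excerpt does not show that case.

```perl
use strict;
use warnings;

# Hypothetical helper: the UP/Closed name given to a cleared event.
sub up_event_name {
    my ($event) = @_;
    return "Node Up"       if $event eq "Node Down";
    return "Interface Up"  if $event eq "Interface Down";
    return "$event Closed" if $event =~ /Proactive/;
    ( my $up = $event ) =~ s/down/Up/i;    # e.g. "Service Down" -> "Service Up"
    return $up;                            # assumption: no "down" means no change
}

print up_event_name($_), "\n"
    for "Node Down", "Service Down", "SNMP Down", "Proactive CPU";
```

This prints Node Up, Service Up, SNMP Up and Proactive CPU Closed.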

State Transitions
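The transitions between the states listed above follow from the flag changes made by save(), acknowledge() and check(). The sketch below summarizes them. `event_state` and the transition table are an illustrative reconstruction based on that reading and are not part of NMIS.

```perl
use strict;
use warnings;

# Illustrative only: map the active/ack/historic flags to a state name.
sub event_state {
    my (%flags) = @_;
    return "Historic"     if $flags{historic};
    return "Resolved"     if !$flags{active};
    return "Acknowledged" if $flags{ack};
    return "Active";
}

# Assumed transitions, pieced together from the code excerpts on this page
my @transitions = (
    [ "Created",      "Active",       "save() of a new event: active=1, ack=0" ],
    [ "Active",       "Acknowledged", "acknowledge(ack => 1)" ],
    [ "Acknowledged", "Active",       "acknowledge(ack => 0)" ],
    [ "Active",       "Resolved",     "check() sees condition cleared: active=0" ],
    [ "Acknowledged", "Resolved",     "check() sees condition cleared: active=0" ],
    [ "Resolved",     "Historic",     "event archived: historic=1" ],
);

printf "%-12s -> %-12s  %s\n", @$_ for @transitions;
print event_state( active => 1, ack => 1, historic => 0 ), "\n";    # Acknowledged
```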

Event Creation and Management

Creating Events

# From Events.pm:125-153
sub eventAdd {
    my ( $self, %args ) = @_;
    
    my $node = $args{node};
    return "Cannot create event without node object" 
        if ( ref($node) ne 'NMISNG::Node' );
        
    $args{node_name} = $node->name;
    $args{node_uuid} = $node->uuid;
    
    my $event_obj = $self->event(%args);
    return $event_obj->save();
}

Using Events API

# Create event via node object
$node->eventAdd(
    event   => "Proactive CPU",
    level   => "Major",
    element => "cpu",
    details => "CPU utilization 95% exceeds threshold 90%"
);

# Check if event exists
if ($node->eventExist("Node Down")) {
    # Node is down
}

# Load specific event
my ($error, $event) = $node->eventLoad(
    event   => "Interface Down",
    element => "GigabitEthernet0/0"
);

CLI Event Operations

# List active events
nmis-cli act=list-events

# List events for node
nmis-cli act=list-events node=router1

# Show event details
nmis-cli act=show-event id=507f1f77bcf86cd799439011

# Acknowledge event
nmis-cli act=ack-event id=507f1f77bcf86cd799439011 \
  user="admin" \
  ack=true

# Clear event manually
nmis-cli act=clear-event node=router1 event="Node Down"

Event Escalation

Escalation Levels

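Events escalate based on duration and severity. Every new event starts unescalated: the first time save() stores an event, it fills in defaults, including escalate = -1: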

# From Event.pm:897-948
sub save {
    my ( $self, %args ) = @_;
    my $update = $args{update};    # true when re-saving an existing event

    # Set defaults for new events
    if ( !$update ) {
        $self->{data}{active}    //= 1;
        $self->{data}{historic}  //= 0;
        $self->{data}{startdate} //= time;
        $self->{data}{ack}       //= 0;
        $self->{data}{escalate}  //= -1;
        $self->{data}{notify}    //= "";
        $self->{data}{stateless} //= 0;
    }
}

Escalation Levels:

-1: Not yet escalated
0: Initial escalation
1-N: Subsequent escalation levels

Escalation Process:

Event created with escalate=-1
First escalation after initial delay: escalate=0
Subsequent escalations: escalate++
Different contacts notified at each level
Stops at maximum configured level

Escalation Configuration

Configure in escalation policies:

{
  "escalate": {
    "0": {
      "level": "Fatal,Critical,Major",
      "start": 5,
      "frequency": 60,
      "contact": "admin"
    },
    "1": {
      "level": "Fatal,Critical",
      "start": 30,
      "frequency": 120,
      "contact": "oncall"
    },
    "2": {
      "level": "Fatal",
      "start": 120,
      "frequency": 240,
      "contact": "manager"
    }
  }
}

Parameters:

level: Event severity levels this escalation applies to
start: Minutes after event creation before first notification
frequency: Minutes between repeat notifications
contact: Contact group to notify
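Below is a minimal sketch of how the example policy might be evaluated. It assumes that an escalation level is reached once its start time has elapsed, provided the event's severity appears in that level's list. `escalation_level` is a hypothetical helper. The sketch does not model repeat notifications (frequency) or the actual NMIS escalation loop.

```perl
use strict;
use warnings;

# The example escalation policy above as a Perl hash (times in minutes).
my %escalate = (
    0 => { level => "Fatal,Critical,Major", start => 5,   frequency => 60,  contact => "admin" },
    1 => { level => "Fatal,Critical",       start => 30,  frequency => 120, contact => "oncall" },
    2 => { level => "Fatal",                start => 120, frequency => 240, contact => "manager" },
);

# Hypothetical helper: highest escalation level whose severity list includes
# $level and whose start time has passed; -1 means "not yet escalated".
sub escalation_level {
    my ( $policy, $level, $minutes ) = @_;
    my $current = -1;
    for my $n ( sort { $a <=> $b } keys %$policy ) {
        my %levels = map { $_ => 1 } split /\s*,\s*/, $policy->{$n}{level};
        next unless $levels{$level};
        $current = $n if $minutes >= $policy->{$n}{start};
    }
    return $current;
}

for my $case ( [ "Critical", 3 ], [ "Critical", 45 ], [ "Major", 200 ], [ "Fatal", 200 ] ) {
    my ( $level, $minutes ) = @$case;
    my $n = escalation_level( \%escalate, $level, $minutes );
    printf "%-8s after %3d min: escalate=%d contact=%s\n",
        $level, $minutes, $n, $n < 0 ? "(none)" : $escalate{$n}{contact};
}
```

Under this reading, a Major event never goes past level 0 ("admin"), while a Fatal event reaches "manager" after two hours.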

Event Acknowledgment

# From Event.pm:206-280
sub acknowledge {
    my ( $self, %args ) = @_;
    my $ack  = $args{ack};
    my $user = $args{user};
    # $wantlog (whether to write an event log entry) is set in code elided from this excerpt
    
    $ack = NMISNG::Util::getbool($ack);
    
    # Load current event state
    if ( my $error = $self->load() ) {
        $self->nmisng->log->error( "cannot find event id:" . $self->_id );
        return "cannot find event id:" . $self->_id;
    }
    return if ( !$self->active );
    
    # TRAP events are deleted when acknowledged
    if ( $ack and !$self->ack and $self->event eq "TRAP" ) {
        if ( my $error = $self->delete() ) {
            $self->nmisng->log->error("failed to delete event: $error");
        }
        $self->log(
            event => "deleted event: " . $self->event,
            level => "Normal",
        ) if ($wantlog);
    }
    else {
        # Nothing to do if ack state unchanged
        if ( $ack != $self->ack ) {
            $self->ack($ack);
            $self->user($user);
            my $error = $self->save( update => 1 );
            
            $self->log(
                level   => "Normal",
                details => "acknowledge=$ack ($user)"
            ) if $wantlog;
        }
    }
}
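The branches in acknowledge() reduce to a small decision table. This sketch works on a plain hash instead of an NMISNG::Event object, and `ack_action` is a hypothetical name. The string-to-boolean parsing is a simplified stand-in for NMISNG::Util::getbool, and the set of accepted values is an assumption.

```perl
use strict;
use warnings;

# Sketch of the decision acknowledge() makes, on a plain hash.
# Returns "ignore", "delete", "update" or "noop".
sub ack_action {
    my ( $event, $ack ) = @_;

    # Simplified stand-in for NMISNG::Util::getbool (accepted values assumed)
    $ack = ( defined $ack && $ack =~ /^(?:1|true|yes|y|on)$/i ) ? 1 : 0;
    my $current = $event->{ack} ? 1 : 0;

    return "ignore" if !$event->{active};    # inactive events are left alone
    return "delete" if $ack && !$current && $event->{event} eq "TRAP";
    return $ack != $current ? "update" : "noop";    # save only on change
}

print ack_action( { event => "TRAP",      active => 1, ack => 0 }, "true" ),  "\n";    # delete
print ack_action( { event => "Node Down", active => 1, ack => 1 }, "false" ), "\n";    # update
```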

Acknowledgment via CLI

# Acknowledge single event
nmis-cli act=ack-event node=router1 \
  event="Node Down" \
  user="john.doe" \
  ack=true

# Acknowledge by event ID
nmis-cli act=ack-event id=507f1f77bcf86cd799439011 \
  user="admin" \
  ack=true

# Unacknowledge (re-enable notifications)
nmis-cli act=ack-event node=router1 \
  event="Interface Down" \
  element="GigabitEthernet0/0" \
  user="admin" \
  ack=false

# Acknowledge all events for node
nmis-cli act=ack-all-events node=router1 user="admin"

Event Logging

Event Log File

Events whose logging is enabled (the Log flag in Events.nmis) are appended to the event log file by logEvent():

# From Events.pm:288-337 (excerpt)
use Fcntl qw(:DEFAULT :flock);    # O_WRONLY, O_APPEND, O_CREAT, LOCK_EX

sub logEvent {
    my ( $self, %args ) = @_;

    my $node_name = $args{node_name};
    my $event     = $args{event};
    my $element   = $args{element} // "";
    my $level     = $args{level};
    my $details   = $args{details} // "";
    $details =~ s/,//g;  # Strip commas so each line keeps six fields

    if ( !$node_name or !$event or !$level ) {
        return "cannot log event, required argument missing";
    }

    my $time = time();
    my $C    = NMISNG::Util::loadConfTable();
    my @problems;

    sysopen( my $datafile, $C->{event_log}, O_WRONLY | O_APPEND | O_CREAT )
        or return "Cannot open $C->{event_log}: $!";
    flock( $datafile, LOCK_EX )
        or push( @problems, "Cannot lock $C->{event_log}: $!" );

    my $message = "$time,$node_name,$event,$level,$element,$details\n";
    print $datafile $message
        or push( @problems, "Cannot write $C->{event_log}: $!" );
    close($datafile)
        or push( @problems, "Cannot close $C->{event_log}: $!" );

    return @problems ? join( "\n", @problems ) : undef;
}

Event Log Location:

Default: /usr/local/nmis9/logs/event.log
Format: one comma-separated line per event: Unix timestamp (seconds), node, event, level, element, details
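The line format is simple enough to produce and parse by hand, for example when feeding event.log into other tools. `format_event_line` and `parse_event_line` are hypothetical helpers that follow the field order written by logEvent(). Commas are stripped from details, so splitting on commas with a limit of six fields recovers each record, assuming node and element names contain no commas.

```perl
use strict;
use warnings;

# Hypothetical helpers for the event.log line format:
# time,node_name,event,level,element,details
sub format_event_line {
    my (%args) = @_;
    ( my $details = $args{details} // "" ) =~ s/,//g;    # commas would add fields
    return join( ",",
        $args{time}, $args{node_name}, $args{event}, $args{level},
        $args{element} // "", $details ) . "\n";
}

sub parse_event_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    # A limit of 6 keeps empty trailing fields (e.g. no element or details)
    @rec{qw(time node_name event level element details)} = split /,/, $line, 6;
    return \%rec;
}

my $line = format_event_line(
    time      => 1700000000,
    node_name => "router1",
    event     => "Proactive CPU",
    level     => "Major",
    element   => "cpu",
    details   => "CPU utilization 95%, threshold 90%",
);
print $line;    # 1700000000,router1,Proactive CPU,Major,cpu,CPU utilization 95% threshold 90%
print parse_event_line($line)->{details}, "\n";
```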

Viewing Event Logs

# Tail event log
tail -f /usr/local/nmis9/logs/event.log

# Search for specific events
grep "Node Down" /usr/local/nmis9/logs/event.log

# Events in the last hour (event.log is comma-separated; needs GNU date)
awk -F, -v since="$(date -d '1 hour ago' +%s)" '$1 > since' \
  /usr/local/nmis9/logs/event.log

# Count events by type
cut -d, -f3 /usr/local/nmis9/logs/event.log | sort | uniq -c
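The same questions can be answered in Perl without depending on GNU date. The sketch below counts events per type over the last hour. `count_recent` is a hypothetical helper, and the log path defaults to the location given above.

```perl
use strict;
use warnings;

# Hypothetical helper: count events per type with a timestamp after $since.
sub count_recent {
    my ( $lines, $since ) = @_;
    my %count;
    for my $line (@$lines) {
        my ( $time, undef, $event ) = split /,/, $line, 4;
        next unless defined $event && $time =~ /^\d+$/ && $time > $since;
        $count{$event}++;
    }
    return \%count;
}

my $log = shift(@ARGV) // "/usr/local/nmis9/logs/event.log";
if ( open( my $fh, "<", $log ) ) {
    my @lines = <$fh>;
    close($fh);
    my $count = count_recent( \@lines, time() - 3600 );
    printf "%6d  %s\n", $count->{$_}, $_
        for sort { $count->{$b} <=> $count->{$a} } keys %$count;
}
else {
    warn "cannot open $log: $!\n";
}
```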

Notifications

Notification Methods

NMIS supports multiple notification channels:

Email Notifications:

HTML or plain text
Customizable templates
Multiple recipients per contact
Attachment support

Syslog:

RFC 3164/5424 format
Configurable facility and severity
Remote syslog servers

SNMP Traps:

SNMPv2c traps
Custom trap definitions
Includes event details as varbinds

Custom Scripts:

Execute external programs
Pass event data as arguments
Integrate with ticketing systems

Notification Configuration

Configure in Contacts.nmis:

{
  "admin": {
    "email": "admin@example.com",
    "mobile": "+1-555-0100",
    "notify": {
      "node_down": true,
      "interface_down": true,
      "proactive": true
    },
    "escalate": {
      "0": true,
      "1": true
    }
  }
}

Event Filtering and Control

Events.nmis Configuration

Control which events are active, logged, and notified:

{
  "Node Down": {
    "Status": "true",
    "Log": "true",
    "Notify": "true"
  },
  "Interface Down": {
    "Status": "true",
    "Log": "true",
    "Notify": "true",
    "Filter": "ifAdminStatus eq 'up'"
  },
  "Proactive CPU": {
    "Status": "true",
    "Log": "true",
    "Notify": "true"
  }
}
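Below is a sketch of how these flags might be read, assuming the string values are interpreted like NMISNG::Util::getbool. `event_flag` is hypothetical and does not evaluate the Filter expression. How NMIS treats events missing from the file is not shown here, so the sketch leaves that to a caller-supplied default (enabled unless told otherwise).

```perl
use strict;
use warnings;

# The Events.nmis example above as a Perl hash.
my %events_conf = (
    "Node Down"      => { Status => "true", Log => "true", Notify => "true" },
    "Interface Down" => {
        Status => "true", Log => "true", Notify => "true",
        Filter => "ifAdminStatus eq 'up'",    # not evaluated by this sketch
    },
    "Proactive CPU"  => { Status => "true", Log => "true", Notify => "true" },
);

# Hypothetical helper: is $flag ("Status", "Log" or "Notify") enabled for $event?
# Events or flags missing from the table get $default (1 unless given).
sub event_flag {
    my ( $conf, $event, $flag, $default ) = @_;
    $default //= 1;
    my $entry = $conf->{$event};
    return $default if !$entry || !defined $entry->{$flag};
    return $entry->{$flag} =~ /^(?:1|true|yes|y|on)$/i ? 1 : 0;
}

for my $event ( "Node Down", "Interface Down" ) {
    printf "%s: log=%d notify=%d\n", $event,
        event_flag( \%events_conf, $event, "Log" ),
        event_flag( \%events_conf, $event, "Notify" );
}
```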

Event Status Levels

Severity Mapping:

# From Event.pm:545-619 (excerpt)
sub getLogLevel {
    my ( $self, %args ) = @_;
    my ( $S, $event, $level ) = @args{'sys', 'event', 'level'};

    # $node (the node object), $M (the node's loaded model) and $pol_event
    # (the event name used for the policy lookup) are set up in code elided here
    my ( $mdl_level, $log, $syslog );

    my $role = $node->configuration->{roleType} || 'access';
    my $type = $node->configuration->{nodeType} || 'router';
    my $default_event_level = $self->nmisng->config->{default_event_level} // 'Major';

    # Get level from model: event-specific policy first, then the model default
    if ( $mdl_level = $M->{event}{event}{lc $pol_event}{lc $role}{level} ) {
        $log    = $M->{event}{event}{lc $pol_event}{lc $role}{logging};
        $syslog = $M->{event}{event}{lc $pol_event}{lc $role}{syslog};
    }
    elsif ( $mdl_level = $M->{event}{event}{default}{lc $role}{level} ) {
        $log    = $M->{event}{event}{default}{lc $role}{logging};
        $syslog = $M->{event}{event}{default}{lc $role}{syslog};
    }
    else {
        $mdl_level = $default_event_level;
    }

    # Simplified: fall back to the model level when the caller gave none
    $level ||= $mdl_level;

    return ( $level, $log, $syslog );
}
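The model lookup in getLogLevel() falls back in three steps. It tries the event's own policy for the node's role first, then the model's default policy for that role, and finally default_event_level (Major unless configured). The sketch reproduces that order against a made-up model fragment. `model_level` is a hypothetical helper, and the model contents are for illustration only.

```perl
use strict;
use warnings;

# Made-up model fragment, shaped like $M->{event}{event}{<event>}{<role>}.
my $model = {
    event => {
        event => {
            "node down" => {
                core => { level => "Critical", logging => "true", syslog => "true" },
            },
            default => {
                core   => { level => "Major", logging => "true", syslog => "false" },
                access => { level => "Minor", logging => "true", syslog => "false" },
            },
        },
    },
};

# Hypothetical helper: event-specific policy, then "default", then the
# configured default_event_level (Major when unset).
sub model_level {
    my ( $M, $event, $role, $default_level ) = @_;
    my $policies = $M->{event}{event};
    for my $key ( lc $event, "default" ) {
        my $p = $policies->{$key} && $policies->{$key}{ lc $role };
        return ( $p->{level}, $p->{logging}, $p->{syslog} ) if $p && $p->{level};
    }
    return ( $default_level // "Major", undef, undef );
}

printf "%-15s %-7s -> %s\n", @$_, ( model_level( $model, @$_ ) )[0]
    for [ "Node Down", "core" ], [ "Interface Down", "core" ], [ "Node Down", "access" ];
```

With this fragment, Node Down on a core node maps to Critical, any other event on a core node to Major, and Node Down on an access node to Minor.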

Standard Levels:

Fatal: System failure, immediate action required
Critical: Critical condition, escalate immediately
Major: Major problem affecting service
Minor: Minor issue, may not need immediate action
Warning: Warning condition, informational
Normal: Normal operation, UP events

Stateless Events

Stateless events re-arm automatically. When a stateless event already exists and its dampening period has passed, it is reset as if it were new:

# From Event.pm:897-919
if ( $exists && $self->stateless ) {
    my $stateless_event_dampening = $self->nmisng->config->{stateless_event_dampening} || 900;
    
    # Dampening period has passed: re-arm the event (active again, escalation and ack cleared)
    if ( time() > $self->startdate + $stateless_event_dampening ) {
        $self->active(1);
        $self->historic(0);
        $self->startdate(time);
        $self->escalate(-1);
        $self->ack(0);
    }
}

Stateless Event Types:

SNMP Traps
Custom Alerts
Some threshold events

Dampening Period:

Default: 900 seconds (15 minutes)
Configurable via stateless_event_dampening
Prevents notification flooding
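The reset rule can be checked with fixed timestamps. `rearm_stateless` is a hypothetical helper that operates on a plain hash. It takes the current time as an argument so the example is deterministic, and it uses the same 900-second fallback as stateless_event_dampening.

```perl
use strict;
use warnings;

# Hypothetical helper: re-arm a stateless event once the dampening period
# has passed since its start date. Returns 1 if the event was reset.
sub rearm_stateless {
    my ( $event, $now, $dampening ) = @_;
    $dampening ||= 900;    # same fallback as stateless_event_dampening
    return 0 unless $event->{stateless};
    return 0 unless $now > $event->{startdate} + $dampening;
    @$event{qw(active historic startdate escalate ack)} = ( 1, 0, $now, -1, 0 );
    return 1;
}

my %trap = ( event => "TRAP", stateless => 1, startdate => 1000,
             escalate => 2, ack => 1, active => 1, historic => 0 );
print rearm_stateless( \%trap, 1600 ) ? "reset\n" : "unchanged\n";    # 600 s later: unchanged
print rearm_stateless( \%trap, 2000 ) ? "reset\n" : "unchanged\n";    # 1000 s later: reset
print "escalate=$trap{escalate} ack=$trap{ack}\n";                    # escalate=-1 ack=0
```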

Troubleshooting

Events Not Created

# Check event configuration
grep "Node Down" /usr/local/nmis9/conf/Events.nmis

# Verify node is active
nmis-cli act=show node=router1 property=active

# Check polling logs
tail -f /usr/local/nmis9/logs/nmis.log | grep Event

# Force update to trigger event check
nmis-cli act=update node=router1

Notifications Not Sent

# Check contact configuration
cat /usr/local/nmis9/conf/Contacts.nmis

# Verify email settings
nmis-cli act=test-email admin@example.com

# Check escalation log
tail -f /usr/local/nmis9/logs/escalate.log

# Test notification manually
nmis-cli act=test-notify contact=admin event="Test Event"

Event Not Clearing

# Check if condition actually cleared
nmis-cli act=update node=router1

# View event details
nmis-cli act=show-event node=router1 event="Node Down"

# Manually clear if needed
nmis-cli act=clear-event node=router1 event="Node Down"

Best Practices

Configure Escalations: Set appropriate escalation levels and timing
Use Acknowledgment: Acknowledge events to suppress notifications
Review Event Logs: Regularly review event.log for patterns
Tune Thresholds: Adjust to reduce false positives
Document Events: Use details field for troubleshooting context
Test Notifications: Regularly test notification delivery
Monitor Event Count: Excessive events may indicate configuration issues
