Overview
NMIS provides a comprehensive event management system that detects problems, tracks their lifecycle, escalates based on severity and duration, and delivers notifications through multiple channels. The event system is the core of NMIS’s alerting and notification capabilities.
Event Detection : Automatic detection of down and degraded conditions and threshold violations
Escalation : Multi-level escalation based on severity and event duration
Notifications : Email, syslog, SNMP traps, and custom notification methods
Event Lifecycle : Track events from creation through acknowledgment to closure
Event Object Model
Events are first-class objects in NMIS with defined properties and lifecycle:
# From Event.pm:30-83
package NMISNG::Event;
our $VERSION = "9.6.5";

# Known event attributes
my %known_attrs = (
    _id            => 1,
    ack            => 1,
    active         => 1,
    cluster_id     => 1,
    context        => 1,
    details        => 1,
    element        => 1,
    escalate       => 1,
    event_previous => 1,
    expire_at      => 1,
    historic       => 1,
    inventory_id   => 1,
    lastupdate     => 1,
    level_previous => 1,
    logged         => 1,
    node_name      => 1,
    node_uuid      => 1,
    notify         => 1,
    startdate      => 1,
    stateless      => 1,
    user           => 1,
    configuration  => { "group" => 1 }
);

sub new {
    my ( $class, %args ) = @_;
    confess "nmisng required" if ( ref( $args{nmisng} ) ne "NMISNG" );

    # Need enough data to find in DB
    if ( !$args{_id} && ( !$args{node_uuid} && !$args{event} ) ) {
        confess "not enough info to create event";
    }
    my $self = bless( {
        _nmisng => $args{nmisng},
        data    => \%args
    }, $class );
    return $self;
}
Event Properties
event : Event name (e.g., “Node Down”, “Interface Down”, “Proactive CPU”)
level : Event severity: Fatal, Critical, Major, Minor, Warning, Normal
element : Affected element (interface name, disk, process, etc.)
details : Additional event information and context
active : Whether the event is currently active (not resolved)
ack : Whether the event has been acknowledged by an operator
escalate : Current escalation level (-1 = not escalated)
stateless : Whether the event resets after the dampening period (used for traps and alerts)
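As a rough illustration of how the attribute list can be used, the sketch below checks a candidate event record against the names in %known_attrs. The helper name unknown_event_attrs is hypothetical and not part of NMIS, and adding event and level to the allowed set is an assumption based on the $self->event and $self->level accessors used in the excerpts.

```perl
use strict;
use warnings;

# Attribute names copied from %known_attrs. 'event' and 'level' are added
# as an assumption, because the excerpts call $self->event and $self->level.
my %allowed = map { $_ => 1 } qw(
    _id ack active cluster_id context details element escalate
    event event_previous expire_at historic inventory_id lastupdate
    level level_previous logged node_name node_uuid notify startdate
    stateless user configuration
);

# Hypothetical helper: return the sorted names that are not known attributes.
sub unknown_event_attrs {
    my (%record) = @_;
    my @unknown = grep { !$allowed{$_} } keys %record;
    my @sorted  = sort @unknown;
    return @sorted;
}

my @bad = unknown_event_attrs(
    event    => "Proactive CPU",
    level    => "Major",
    element  => "cpu",
    severity => "high",    # not a known attribute
);
print "unknown attributes: @bad\n";    # prints: unknown attributes: severity
```

A check like this is only a convenience for catching misspelled keys before an event is saved; it does not replace the validation done inside NMISNG::Event.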
Event Types
System Events
Node Availability:
Node Down : Device unreachable via ping
Node Up : Device becomes reachable
SNMP Down : SNMP polling failed
WMI Down : WMI collection failed
Node Polling Failover : Failover to backup host
Interface Events:
Interface Down : Interface operStatus changed to down
Interface Up : Interface recovered
Service Events:
Service Down : Service check failed
Service Up : Service recovered
Service Degraded : Service slow but responding
Threshold Events (Proactive)
Generated when metrics exceed defined thresholds:
Proactive CPU : CPU utilization threshold exceeded
Proactive Memory : Memory usage threshold exceeded
Proactive Interface : Interface utilization threshold
Proactive Disk : Disk space threshold
Proactive [Custom] : Custom threshold violations
# From Event.pm:368-396
if ( $self->event =~ /Proactive/ ) {
    my ( $value, $reset ) = @args{ "value", "reset" };
    if ( defined $value and defined $reset ) {
        # Only clear if threshold cleared by 10%
        # For thresholds where high = good (default 1.1)
        # For thresholds where low = good (default 0.9)
        my $cutoff = $reset * (
            $value >= $reset
            ? $C->{'threshold_falling_reset_dampening'}
            : $C->{'threshold_rising_reset_dampening'}
        );
        if ( $value >= $reset && $value <= $cutoff ) {
            $S->nmisng->log->debug(
                "Proactive Event value $value too low for dampening limit $cutoff. Not closing." );
            return;
        }
    }
    $new_event = $self->event . " Closed";
}
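The dampening rule in the excerpt above can be tried in isolation. The following is a minimal sketch, not the NMIS implementation: may_close_proactive is a hypothetical helper, the two factors use the defaults named in the comments (1.1 and 0.9), and the mirror-image check for rising thresholds is an assumption, since the excerpt only shows the falling case.

```perl
use strict;
use warnings;

# Assumed defaults, taken from the comments in the excerpt.
my %C = (
    threshold_falling_reset_dampening => 1.1,
    threshold_rising_reset_dampening  => 0.9,
);

# Hypothetical helper: returns 1 when a Proactive event may be closed.
sub may_close_proactive {
    my ( $value, $reset ) = @_;

    # Without both numbers there is nothing to dampen, as in the excerpt.
    return 1 if !defined $value or !defined $reset;

    my $cutoff = $reset * (
        $value >= $reset
        ? $C{threshold_falling_reset_dampening}
        : $C{threshold_rising_reset_dampening}
    );

    # Shown in the excerpt: value is above the reset level but not yet past the margin.
    return 0 if $value >= $reset && $value <= $cutoff;

    # Assumption (not in the excerpt): the same margin applied in the other direction.
    return 0 if $value < $reset && $value >= $cutoff;

    return 1;
}

# CPU-style threshold with reset level 80: 75 is still inside the margin, 70 is clear.
printf "value=%d close=%d\n", $_, may_close_proactive( $_, 80 ) for ( 75, 70 );
```

The margin keeps an event from flapping between open and closed when a metric hovers around its reset level.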
Alert Events
Custom alerts generated by external systems or scripts:
Alert: [Custom] : Custom alert conditions
Alert names are prefixed with “Alert:” or “Alert :”
SNMP Trap Events
Generated from received SNMP traps:
TRAP : Generic SNMP trap event
Traps are typically stateless; an active TRAP event is deleted when an operator acknowledges it
Event Lifecycle
Event States
Events progress through defined states:
Created : Event detected and added to database
Active : Event is current and unresolved (active=1, historic=0)
Acknowledged : Operator acknowledged (ack=1)
Resolved : Condition cleared (active=0, historic=0)
Historic : Event archived for history (historic=1)
# From Event.pm:282-326 (abridged)
sub check {
    my ( $self, %args ) = @_;
    my $S      = $args{sys};
    my $exists = $self->exists();
    my ( $new_event, $details, $level );    # declared here because the excerpt is abridged

    # Event exists and is active - create UP event
    if ( $exists && $self->active ) {
        # Compute outage period
        my $outage = NMISNG::Util::convertSecsHours( time() - $self->startdate );

        # Determine UP event name
        if ( $self->event eq "Node Down" ) {
            $new_event = "Node Up";
        }
        elsif ( $self->event eq "Interface Down" ) {
            $new_event = "Interface Up";
        }
        elsif ( $self->event =~ /Proactive/ ) {
            $new_event = $self->event . " Closed";
        }
        elsif ( $self->event =~ /down/i ) {
            # Start from the current name, then swap "down" for "Up"
            $new_event = $self->event;
            $new_event =~ s/down/Up/i;
        }
        $details .= ( $details ? " " : "" ) . "Time=$outage";

        # Mark event inactive, notify, then delete
        $self->active(0);
        $self->event($new_event);
        $self->details($details);
        $self->level($level);
        my $error = $self->save();
    }
}
State Transitions
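The states listed under Event States map onto the stored flags active, ack, and historic. The sketch below derives a state name from those flags; event_state is a hypothetical helper, and the order in which the flags are checked is an assumption made for illustration (the Created state is not represented, because a saved event is already Active).

```perl
use strict;
use warnings;

# Hypothetical helper: derive a lifecycle state name from an event's flags.
sub event_state {
    my ($e) = @_;
    return "Historic"     if $e->{historic};    # historic=1
    return "Resolved"     if !$e->{active};     # active=0, historic=0
    return "Acknowledged" if $e->{ack};         # active=1, ack=1
    return "Active";                            # active=1, ack=0, historic=0
}

# Walk one event through the lifecycle by changing its flags.
my %event = ( active => 1, ack => 0, historic => 0 );
print event_state( \%event ), "\n";    # Active
$event{ack} = 1;
print event_state( \%event ), "\n";    # Acknowledged
$event{active} = 0;
print event_state( \%event ), "\n";    # Resolved
$event{historic} = 1;
print event_state( \%event ), "\n";    # Historic
```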
Event Creation and Management
Creating Events
# From Events.pm:125-153
sub eventAdd {
    my ( $self, %args ) = @_;
    my $node = $args{node};
    return "Cannot create event without node object"
        if ( ref($node) ne 'NMISNG::Node' );
    $args{node_name} = $node->name;
    $args{node_uuid} = $node->uuid;
    my $event_obj = $self->event(%args);
    return $event_obj->save();
}
Using Events API
# Create event via node object
$node->eventAdd(
    event   => "Proactive CPU",
    level   => "Major",
    element => "cpu",
    details => "CPU utilization 95% exceeds threshold 90%"
);

# Check if event exists
if ( $node->eventExist("Node Down") ) {
    # Node is down
}

# Load specific event
my ( $error, $event ) = $node->eventLoad(
    event   => "Interface Down",
    element => "GigabitEthernet0/0"
);
CLI Event Operations
# List active events
nmis-cli act=list-events
# List events for node
nmis-cli act=list-events node=router1
# Show event details
nmis-cli act=show-event id=507f1f77bcf86cd799439011
# Acknowledge event
nmis-cli act=ack-event id=507f1f77bcf86cd799439011 \
    user="admin" \
    ack=true
# Clear event manually
nmis-cli act=clear-event node=router1 event="Node Down"
Event Escalation
Escalation Levels
Events escalate based on duration and severity:
# From Event.pm:897-948 (abridged)
sub save {
    my ( $self, %args ) = @_;
    my $update = $args{update};    # true when an existing event is being updated

    # Set defaults for new events
    if ( !$update ) {
        $self->{data}{active}    //= 1;
        $self->{data}{historic}  //= 0;
        $self->{data}{startdate} //= time;
        $self->{data}{ack}       //= 0;
        $self->{data}{escalate}  //= -1;
        $self->{data}{notify}    //= "";
        $self->{data}{stateless} //= 0;
    }
}
Escalation Levels:
-1 : Not yet escalated
0 : Initial escalation
1-N : Subsequent escalation levels
Escalation Process:
Event created with escalate=-1
First escalation after initial delay: escalate=0
Subsequent escalations: escalate++
Different contacts notified at each level
Stops at maximum configured level
Escalation Configuration
Configure in escalation policies:
{
    "escalate": {
        "0": {
            "level": "Fatal,Critical,Major",
            "start": 5,
            "frequency": 60,
            "contact": "admin"
        },
        "1": {
            "level": "Fatal,Critical",
            "start": 30,
            "frequency": 120,
            "contact": "oncall"
        },
        "2": {
            "level": "Fatal",
            "start": 120,
            "frequency": 240,
            "contact": "manager"
        }
    }
}
Parameters:
level : Event severity levels this escalation applies to
start : Minutes after event creation before the first notification at this escalation level
frequency : Minutes between repeat notifications
contact : Contact group to notify
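To make the interaction of level and start concrete, the sketch below computes which escalation level is due for an event of a given severity and age. It assumes a policy shaped like the example above, held as a Perl hash; due_escalation_level is a hypothetical helper, and treating the highest matching level whose start time has elapsed as the due level is an assumption about how the parameters combine, not a description of the NMIS escalation code.

```perl
use strict;
use warnings;

# The example policy from above, written as a Perl data structure.
my %policy = (
    escalate => {
        0 => { level => "Fatal,Critical,Major", start => 5,   frequency => 60,  contact => "admin" },
        1 => { level => "Fatal,Critical",       start => 30,  frequency => 120, contact => "oncall" },
        2 => { level => "Fatal",                start => 120, frequency => 240, contact => "manager" },
    },
);

# Hypothetical helper: return the highest escalation level that applies to
# $severity and whose start delay (minutes) has elapsed, or -1 if none has.
sub due_escalation_level {
    my ( $policy, $severity, $age_minutes ) = @_;
    my $due = -1;
    for my $n ( sort { $a <=> $b } keys %{ $policy->{escalate} } ) {
        my $step    = $policy->{escalate}{$n};
        my %applies = map { $_ => 1 } split /\s*,\s*/, $step->{level};
        next unless $applies{$severity};
        $due = $n if $age_minutes >= $step->{start};
    }
    return $due;
}

for my $case ( [ "Major", 10 ], [ "Fatal", 45 ], [ "Fatal", 120 ], [ "Minor", 500 ] ) {
    my ( $severity, $age ) = @$case;
    my $n = due_escalation_level( \%policy, $severity, $age );
    my $who = $n >= 0 ? $policy{escalate}{$n}{contact} : "nobody";
    print "$severity event, $age min old: level $n, notify $who\n";
}
```

With this policy a Major event never goes past level 0, because levels 1 and 2 only list Fatal and Critical.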
Event Acknowledgment
# From Event.pm:206-280
sub acknowledge {
    my ( $self, %args ) = @_;
    my $ack  = $args{ack};
    my $user = $args{user};
    $ack = NMISNG::Util::getbool($ack);

    # Load current event state
    if ( my $error = $self->load() ) {
        $self->nmisng->log->error( "cannot find event id:" . $self->_id );
        return "cannot find event id:" . $self->_id;
    }
    return if ( !$self->active );

    # TRAP events are deleted when acknowledged
    if ( $ack and !$self->ack and $self->event eq "TRAP" ) {
        if ( my $error = $self->delete() ) {
            $self->nmisng->log->error("failed to delete event: $error");
        }
        $self->log(
            event => "deleted event: " . $self->event,
            level => "Normal",
        ) if ($wantlog);
    }
    else {
        # Nothing to do if ack state unchanged
        if ( $ack != $self->ack ) {
            $self->ack($ack);
            $self->user($user);
            my $error = $self->save( update => 1 );
            $self->log(
                level   => "Normal",
                details => "acknowledge=$ack ($user)"
            ) if $wantlog;
        }
    }
}
Acknowledgment via CLI
# Acknowledge single event
nmis-cli act=ack-event node=router1 \
    event="Node Down" \
    user="john.doe" \
    ack=true
# Acknowledge by event ID
nmis-cli act=ack-event id=507f1f77bcf86cd799439011 \
    user="admin" \
    ack=true
# Unacknowledge (re-enable notifications)
nmis-cli act=ack-event node=router1 \
    event="Interface Down" \
    element="GigabitEthernet0/0" \
    user="admin" \
    ack=false
# Acknowledge all events for node
nmis-cli act=ack-all-events node=router1 user="admin"
Event Logging
Event Log File
All events are logged to the event log file:
# From Events.pm:288-337 (abridged)
# Needs: use Fcntl qw(:DEFAULT :flock);    # O_WRONLY, O_APPEND, O_CREAT, LOCK_EX
sub logEvent {
    my ( $self, %args ) = @_;
    my $node_name = $args{node_name};
    my $event     = $args{event};
    my $element   = $args{element};
    my $level     = $args{level};
    my $details   = $args{details};
    $details =~ s/,//g;    # Strip commas, which would break the CSV format
    if ( !$node_name or !$event or !$level ) {
        return "cannot log event, required argument missing";
    }
    my $time = time();
    my $C    = NMISNG::Util::loadConfTable();
    my @problems;          # collects open/lock errors
    sysopen( DATAFILE, "$C->{event_log}", O_WRONLY | O_APPEND | O_CREAT )
        or push( @problems, "Cannot open $C->{event_log}: $!" );
    flock( DATAFILE, LOCK_EX )
        or push( @problems, "Cannot lock $C->{event_log}: $!" );
    my $message = "$time,$node_name,$event,$level,$element,$details\n";
    print DATAFILE $message;
    close(DATAFILE);
}
Event Log Location:
Default: /usr/local/nmis9/logs/event.log
Format: CSV with timestamp, node, event, level, element, details
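Because the log is plain CSV with a fixed field order, it can be parsed with a few lines of Perl. The sketch below is an illustration only: parse_event_line is a hypothetical helper, the three sample lines are made up for the example, and it assumes no field other than details could contain a comma (logEvent strips commas from details).

```perl
use strict;
use warnings;

# Field order as described above: timestamp, node, event, level, element, details.
my @FIELDS = qw(time node event level element details);

# Hypothetical helper: split one event.log line into a hash reference.
sub parse_event_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    # A positive limit keeps trailing empty fields such as an empty details column.
    @rec{@FIELDS} = split /,/, $line, scalar @FIELDS;
    return \%rec;
}

# Made-up sample lines in the documented format.
my @sample = (
    "1700000000,router1,Node Down,Major,,Ping failed",
    "1700000300,router1,Node Up,Normal,,Time=00:05:00",
    "1700000400,switch2,Interface Down,Major,GigabitEthernet0/0,",
);

# Count events by type, like the cut | sort | uniq pipeline does on the real file.
my %count;
$count{ parse_event_line($_)->{event} }++ for @sample;
printf "%-16s %d\n", $_, $count{$_} for sort keys %count;
```

To run it against the real file, read lines from the path configured as event_log instead of @sample.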
Viewing Event Logs
# Tail event log
tail -f /usr/local/nmis9/logs/event.log
# Search for specific events
grep "Node Down" /usr/local/nmis9/logs/event.log
# Events in last hour
awk -F, -v since="$(date -d '1 hour ago' +%s)" '$1 > since' \
    /usr/local/nmis9/logs/event.log
# Count events by type
cut -d, -f3 /usr/local/nmis9/logs/event.log | sort | uniq -c
Notifications
Notification Methods
NMIS supports multiple notification channels:
Email Notifications:
HTML or plain text
Customizable templates
Multiple recipients per contact
Attachment support
Syslog:
RFC 3164/5424 format
Configurable facility and severity
Remote syslog servers
SNMP Traps:
SNMPv2c traps
Custom trap definitions
Includes event details as varbinds
Custom Scripts:
Execute external programs
Pass event data as arguments
Integrate with ticketing systems
Notification Configuration
Configure in Contacts.nmis:
{
    "admin": {
        "email": "[email protected]",
        "mobile": "+1-555-0100",
        "notify": {
            "node_down": true,
            "interface_down": true,
            "proactive": true
        },
        "escalate": {
            "0": true,
            "1": true
        }
    }
}
Event Filtering and Control
Events.nmis Configuration
Control which events are active, logged, and notified:
{
    "Node Down": {
        "Status": "true",
        "Log": "true",
        "Notify": "true"
    },
    "Interface Down": {
        "Status": "true",
        "Log": "true",
        "Notify": "true",
        "Filter": "ifAdminStatus eq 'up'"
    },
    "Proactive CPU": {
        "Status": "true",
        "Log": "true",
        "Notify": "true"
    }
}
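As an illustration of how such an entry might be consulted, the sketch below turns the string flags into booleans for one event name. It assumes the configuration has already been loaded into a Perl hash of the shape shown above; event_controls and is_true are hypothetical helpers (is_true is a simplified stand-in for a boolean parser such as NMISNG::Util::getbool), and the Filter key is ignored here.

```perl
use strict;
use warnings;

# Configuration in the shape shown above, already loaded into a hash.
# "Log" => "false" on Proactive CPU is changed here to show a mixed result.
my %events_conf = (
    "Node Down"     => { Status => "true", Log => "true",  Notify => "true" },
    "Proactive CPU" => { Status => "true", Log => "false", Notify => "true" },
);

# Simplified stand-in for a boolean parser; accepts a few common spellings.
sub is_true {
    my ($v) = @_;
    return ( defined $v && $v =~ /^(?:true|yes|t|y|1)$/i ) ? 1 : 0;
}

# Hypothetical helper: return { status, log, notify } booleans for an event,
# or undef when the event has no entry in the configuration.
sub event_controls {
    my ( $conf, $event ) = @_;
    my $entry = $conf->{$event} or return;
    my %flags;
    $flags{ lc $_ } = is_true( $entry->{$_} ) for qw(Status Log Notify);
    return \%flags;
}

my $c = event_controls( \%events_conf, "Proactive CPU" );
print "Proactive CPU: status=$c->{status} log=$c->{log} notify=$c->{notify}\n";
```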
Event Status Levels
Severity Mapping:
# From Event.pm:545-619 (abridged)
# $node, $M (the loaded model) and $pol_event are set up in lines omitted from this excerpt.
sub getLogLevel {
    my ( $self, %args ) = @_;
    my ( $S, $event, $level ) = @args{ 'sys', 'event', 'level' };
    my ( $mdl_level, $log, $syslog );
    my $role = $node->configuration->{roleType} || 'access';
    my $type = $node->configuration->{nodeType} || 'router';
    my $default_event_level = $self->nmisng->config->{default_event_level} // 'Major';

    # Get level from model: event-specific entry first, then the role's default entry
    if ( $mdl_level = $M->{event}{event}{ lc $pol_event }{ lc $role }{level} ) {
        $log    = $M->{event}{event}{ lc $pol_event }{ lc $role }{logging};
        $syslog = $M->{event}{event}{ lc $pol_event }{ lc $role }{syslog};
    }
    elsif ( $mdl_level = $M->{event}{event}{default}{ lc $role }{level} ) {
        $log    = $M->{event}{event}{default}{ lc $role }{logging};
        $syslog = $M->{event}{event}{default}{ lc $role }{syslog};
    }
    else {
        # Nothing in the model: fall back to the configured default
        $mdl_level = $default_event_level;
    }
    return ( $level, $log, $syslog );
}
Standard Levels:
Fatal : System failure, immediate action required
Critical : Critical condition, escalate immediately
Major : Major problem affecting service
Minor : Minor issue, may not need immediate action
Warning : Warning condition, informational
Normal : Normal operation, UP events
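The lookup order in getLogLevel (event-specific model entry, then the role's default entry, then default_event_level) can be tried out on a small data structure. This is a sketch under assumptions: lookup_level is a hypothetical helper, and the sample model contents are invented for the example rather than taken from a shipped NMIS model.

```perl
use strict;
use warnings;

# Invented sample model, shaped like $M->{event}{event}{<event>}{<role>}.
my %model = (
    event => {
        event => {
            'node down' => { core   => { level => 'Critical', logging => 'true', syslog => 'true' } },
            default     => { access => { level => 'Minor',    logging => 'true', syslog => 'false' } },
        },
    },
);

# Hypothetical helper: return (level, logging, syslog) using the same
# fallback order as the excerpt. $fallback plays the role of default_event_level.
sub lookup_level {
    my ( $M, $event, $role, $fallback ) = @_;
    my $events = $M->{event}{event};
    for my $key ( lc $event, 'default' ) {
        # The "// {}" avoids creating empty entries in the model while probing it.
        my $entry = ( $events->{$key} // {} )->{ lc $role };
        return @{$entry}{qw(level logging syslog)} if $entry && $entry->{level};
    }
    return ( $fallback, undef, undef );
}

for my $case ( [ 'Node Down', 'core' ], [ 'Node Down', 'access' ], [ 'Interface Down', 'distribution' ] ) {
    my ( $level, $log, $syslog ) = lookup_level( \%model, @$case, 'Major' );
    print "$case->[0] / $case->[1]: $level\n";
}
```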
Stateless Events
Some events are stateless and reset automatically:
# From Event.pm:897-919
if ( $exists && $self->stateless ) {
    my $stateless_event_dampening = $self->nmisng->config->{stateless_event_dampening} || 900;

    # If stateless time exceeds dampening, reset escalation
    if ( time() > $self->startdate + $stateless_event_dampening ) {
        $self->active(1);
        $self->historic(0);
        $self->startdate(time);
        $self->escalate(-1);
        $self->ack(0);
    }
}
Stateless Event Types:
SNMP Traps
Custom Alerts
Some threshold events
Dampening Period:
Default: 900 seconds (15 minutes)
Configurable via stateless_event_dampening
Prevents notification flooding
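The reset rule for stateless events can be exercised without a database by operating on a plain hash. The sketch below is illustrative only: refresh_stateless is a hypothetical helper, and it takes the current time as an argument so that the behavior is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper: reset a stateless event once the dampening period has
# passed. Returns 1 if the event was reset, 0 if it was left untouched.
sub refresh_stateless {
    my ( $event, $now, $dampening ) = @_;
    $dampening ||= 900;    # same default as stateless_event_dampening
    return 0 unless $event->{stateless};
    return 0 unless $now > $event->{startdate} + $dampening;

    # Same fields the excerpt resets: the event starts over as new and unescalated.
    @{$event}{qw(active historic startdate escalate ack)} = ( 1, 0, $now, -1, 0 );
    return 1;
}

my %trap = ( stateless => 1, startdate => 1_000, escalate => 2, ack => 1, active => 1, historic => 0 );

# 600 seconds later: still inside the 900 second window, nothing changes.
print "after 600s: reset=", refresh_stateless( \%trap, 1_600 ), " escalate=$trap{escalate}\n";

# 901 seconds later: window has passed, escalation and ack are cleared.
print "after 901s: reset=", refresh_stateless( \%trap, 1_901 ), " escalate=$trap{escalate}\n";
```

A repeat of the same trap inside the window therefore does not restart escalation, which is what limits notification floods.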
Troubleshooting
Events Not Created
# Check event configuration
grep "Node Down" /usr/local/nmis9/conf/Events.nmis
# Verify node is active
nmis-cli act=show node=router1 property=active
# Check polling logs
tail -f /usr/local/nmis9/logs/nmis.log | grep Event
# Force update to trigger event check
nmis-cli act=update node=router1
Notifications Not Sent
# Check contact configuration
cat /usr/local/nmis9/conf/Contacts.nmis
# Verify email settings
nmis-cli act=test-email [email protected]
# Check escalation log
tail -f /usr/local/nmis9/logs/escalate.log
# Test notification manually
nmis-cli act=test-notify contact=admin event="Test Event"
Event Not Clearing
# Check if condition actually cleared
nmis-cli act=update node=router1
# View event details
nmis-cli act=show-event node=router1 event="Node Down"
# Manually clear if needed
nmis-cli act=clear-event node=router1 event="Node Down"
Best Practices
Configure Escalations : Set appropriate escalation levels and timing
Use Acknowledgment : Acknowledge events to suppress notifications
Review Event Logs : Regularly review event.log for patterns
Tune Thresholds : Adjust to reduce false positives
Document Events : Use details field for troubleshooting context
Test Notifications : Regularly test notification delivery
Monitor Event Count : Excessive events may indicate configuration issues
Next Steps
Performance Data : Configure thresholds based on collected performance metrics
Device Management : Manage nodes and configure event-related properties