
Re: NPM Alert on Node UP (Ping) SNMP DOWN (unknown / unresponsive)

While I am excited for the new "up/down via SNMP" feature, I don't think it will ultimately resolve anything. Determine up/down via SNMP and inevitably some group will say that it isn't as clear as ping whether a device is up or down. Do it by ping and you get the opposite complaint. You need both, and you can have both, even on versions before 10.7.

 

Let everything continue to give status via PING, and add this alert:

 

Type of property: Custom SQL

Trigger Query: Node

 

(after the initial "Select Nodes.NodeID as NetObjectID..." stuff)

left join (select CPULoad_Detail.NodeID, MAX(CPULoad_Detail.DateTime) as LastCPU
      from CPULoad_Detail
      group by CPULoad_Detail.NodeID) c1
      on Nodes.NodeID = c1.NodeID
where
  Nodes.Status = 1
and Nodes.UnManaged = 0
and DATEDIFF(mi,c1.LastCPU, getdate()) > 30
and DATEDIFF(mi,c1.LastCPU, getdate()) < 120

 

For those who don't read SQL, what this is saying is:

  • Grab the LAST (i.e., most recent) CPULoad collection for each node
  • Trigger the alert for any node where
    • the node is UP
    • AND the node is MANAGED
    • AND the last CPU collection is older than 30 min
    • AND the last CPU collection is younger than 120 min
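
For anyone who would rather see that condition outside SQL, here is a minimal Python sketch of the same test. The function name should_trigger and its parameters are made up for illustration; only the Status = 1 check, the UnManaged flag, and the 30 and 120 minute thresholds come from the query above.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_trigger(status: int, unmanaged: bool,
                   last_cpu: Optional[datetime], now: datetime) -> bool:
    """Hypothetical mirror of the WHERE clause in the alert query."""
    if last_cpu is None:
        # No CPU rows at all: the left join yields NULL, and the DATEDIFF
        # comparisons on NULL are not true, so the node is not returned.
        return False
    # DATEDIFF(mi, ...) counts minute boundaries crossed, so this
    # elapsed-time version is only approximate right at the thresholds.
    age_min = (now - last_cpu).total_seconds() / 60
    return status == 1 and not unmanaged and 30 < age_min < 120


now = datetime(2024, 1, 1, 12, 0)
stale = should_trigger(1, False, now - timedelta(minutes=45), now)
fresh = should_trigger(1, False, now - timedelta(minutes=10), now)
ancient = should_trigger(1, False, now - timedelta(minutes=180), now)
```

In this sketch only the node whose last CPU sample is 45 minutes old matches; the fresh one and the one that is three hours out of date do not.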

 

That last bullet point is there to avoid re-triggering alerts for stuff that is anciently out of date. You can build a report for that.

 

The key here is to ensure enough of a delay. Otherwise, a device that has been down for more than 30 minutes would trigger an alert as soon as it came back (i.e., ping shows it's UP, but in the first 2-5 minutes it may very well not have SNMP data collected). We set up a 15 minute delay on ours, so that we don't cut a ticket unless a device has been out of date for 45 minutes.
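
To illustrate why the delay helps, here is a rough Python sketch. It assumes the alert only fires once the trigger condition has held continuously for the configured delay; the function fires_after_delay and the minute-by-minute list of true/false values are hypothetical, not how NPM stores anything internally.

```python
def fires_after_delay(condition_by_minute, delay_min=15):
    """Return True once the condition has held for delay_min consecutive minutes.

    condition_by_minute is a hypothetical timeline: one boolean per minute,
    True when the trigger query would have matched the node.
    """
    streak = 0
    for cond in condition_by_minute:
        streak = streak + 1 if cond else 0  # any False resets the streak
        if streak >= delay_min:
            return True
    return False


# Node back from a 40 minute outage: ping is UP, but CPU data stays stale
# for about 5 minutes until the next SNMP collection lands.
recovering = [True] * 5 + [False] * 20

# Node that answers ping but has really stopped answering SNMP.
snmp_dead = [True] * 25

recovering_fires = fires_after_delay(recovering)
snmp_dead_fires = fires_after_delay(snmp_dead)
```

Under those assumptions the recovering node never reaches 15 consecutive minutes and stays quiet, while the node with dead SNMP does fire. With the 30 minute threshold in the query plus the 15 minute delay, that works out to the 45 minutes mentioned above.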

 

THIS alert, along with regular ping, lets you know the true status of devices.

 

Hope it helps.

