
Service Now data from the stock Service Desk Connector

http://yalesandbox.service-now.com/nav_to.do?uri=ecc_queue.do?sys_id=f89666ae71a42000a8d1496cf964454c

<notification><priority>1</priority><short_description>HTTPS - 443 - ADDR1 - Dumb is CRITICAL on host trapeze3.its.yale.edu</short_description><comments>Service: HTTPS - 443 - ADDR1 - Dumb
Host: trapeze3.its.yale.edu
Address: trapeze3.its.yale.edu
State: CRITICAL
Date/Time: Tue Mar 6 14:26:18 EST 2012

Additional Info:

CRITICAL - Socket timeout after 10 seconds</comments><category>Network</category><checktime>1331061975</checktime><correlation_id>trapeze3.its.yale.edu;HTTPS - 443 - ADDR1 - Dumb;1331061544</correlation_id><state>CRITICAL</state><servicename>HTTPS - 443 - ADDR1 - Dumb</servicename><hostname>trapeze3.its.yale.edu</hostname><contact_type>Opsview</contact_type></notification>

http://yalesandbox.service-now.com/nav_to.do?uri=ecc_queue.do?sys_id=549626ae71a42000a8d1496cf96445d0

<notification><priority>1</priority><short_description>physical2.virtual.yale.edu is DOWN</short_description><comments>Host: physical2.virtual.yale.edu
Address: physical2.virtual.yale.edu
State: DOWN
Date/Time: Tue Mar 6 14:25:44 EST 2012

Additional Info:

CRITICAL - physical2.virtual.yale.edu: rta nan, lost 100%</comments><category>Network</category><checktime>1331061942</checktime><correlation_id>physical2.virtual.yale.edu;;1330856341</correlation_id><state>DOWN</state><hostname>physical2.virtual.yale.edu</hostname><contact_type>Opsview</contact_type></notification>

http://yalesandbox.service-now.com/nav_to.do?uri=ecc_queue.do?sys_id=e8966eead4a0200054d3e703207fe9e4

<notification><priority>1</priority><short_description>Load Average is CRITICAL on host meg.its.yale.edu</short_description><comments>Service: Load Average
Host: meg.its.yale.edu
Address: meg.its.yale.edu
State: CRITICAL
Date/Time: Tue Mar 6 14:23:59 EST 2012

Additional Info:

CRITICAL - load average: 13.19, 12.05, 10.57</comments><category>Network</category><checktime>1331061769</checktime><correlation_id>meg.its.yale.edu;Load Average;1331061048</correlation_id><state>CRITICAL</state><servicename>Load Average</servicename><hostname>meg.its.yale.edu</hostname><contact_type>Opsview</contact_type></notification>
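
For anyone inspecting these ECC queue payloads by hand, the sketch below shows one way to pull the fields out of a notification. It is only an illustration: the function name parse_notification and the correlation_* keys are made up for this example, the sample is the host-down payload above with the comments element left out for brevity, and the reading of correlation_id as host;service;timestamp is inferred from the three samples rather than from any connector documentation.

```python
import xml.etree.ElementTree as ET

# Host-down sample from above, with the <comments> element omitted for brevity.
SAMPLE = (
    "<notification><priority>1</priority>"
    "<short_description>physical2.virtual.yale.edu is DOWN</short_description>"
    "<category>Network</category><checktime>1331061942</checktime>"
    "<correlation_id>physical2.virtual.yale.edu;;1330856341</correlation_id>"
    "<state>DOWN</state><hostname>physical2.virtual.yale.edu</hostname>"
    "<contact_type>Opsview</contact_type></notification>"
)


def parse_notification(xml_text):
    """Return the notification's child elements as a flat dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "") for child in root}

    # Assumption: correlation_id looks like "host;service;epoch", where the
    # service part is empty for host-level alerts (as in the sample above).
    host, rest = fields["correlation_id"].split(";", 1)
    service, stamp = rest.rsplit(";", 1)
    fields["correlation_host"] = host
    fields["correlation_service"] = service
    fields["correlation_epoch"] = int(stamp)
    return fields


if __name__ == "__main__":
    for key, value in parse_notification(SAMPLE).items():
        print(f"{key}: {value}")
```

Note that host-level alerts carry no servicename element, so code consuming the result should not assume that key is present.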

Concerns

There is a lot of state change within Opsview. On human inspection some of these state changes turn out to be spurious, but Opsview always treats them as genuine issues. Yale should tread carefully when opening this Opsview data flow up to Service Now, so as to avoid a firehose condition.

Moreover, this integration should be used as a lightning rod to push for amending the monitoring stack, where appropriate, to dial back the amount of state change.
-nick, 20120305
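
As a rough illustration of the firehose concern, the sketch below shows one possible hold-down filter that could sit between Opsview and Service Now. Nothing like this exists in the stock connector as far as this page shows: the class name NotificationThrottle, the should_forward method, and the 300 second interval are all hypothetical, and the right policy would need to be agreed with the monitoring team.

```python
class NotificationThrottle:
    """Hypothetical hold-down filter keyed on (hostname, servicename)."""

    def __init__(self, min_interval_s=300):
        # Assumed value: minimum seconds between forwarded state changes per key.
        self.min_interval_s = min_interval_s
        self._last = {}  # (hostname, servicename) -> (state, checktime)

    def should_forward(self, hostname, servicename, state, checktime):
        key = (hostname, servicename)
        previous = self._last.get(key)
        if previous is not None:
            prev_state, prev_time = previous
            if state == prev_state:
                return False  # same state as the last forwarded notification
            if checktime - prev_time < self.min_interval_s:
                return False  # state flipped too soon after the last forward
        self._last[key] = (state, checktime)
        return True


if __name__ == "__main__":
    throttle = NotificationThrottle(min_interval_s=300)
    events = [
        ("meg.its.yale.edu", "Load Average", "CRITICAL", 1331061769),
        ("meg.its.yale.edu", "Load Average", "CRITICAL", 1331061800),
        ("meg.its.yale.edu", "Load Average", "OK", 1331061900),
        ("meg.its.yale.edu", "Load Average", "OK", 1331062100),
    ]
    for event in events:
        print(event[2], event[3], throttle.should_forward(*event))
```

One limitation of this approach: a quick recovery is dropped and only reaches Service Now if Opsview notifies again after the interval, so it trades completeness for a quieter queue. It does not replace tuning the checks themselves.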