Event processing issues in ServiceNow

This is a happy day in ServiceNow

Background

ServiceNow has four event processors (at least as of the Berlin release).

To find them, log in as admin, go to System Scheduler -> Today's Scheduled Jobs, and search for jobs whose name contains "event".

Here's a URL to jump to the result: https://yale.service-now.com/sys_trigger_list.do?sysparm_query=next_actionONToday%40javascript%3Ags.daysAgoStart(0)%40javascript%3Ags.daysAgoEnd(0)%5EGOTOnameLIKEevent
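The links on this page are ordinary list URLs with a percent-encoded sysparm_query. If you need to rebuild one (say, for a sub-production instance or a tweaked filter), here is a minimal Python sketch using only the standard library. The helper name list_url and the constant names are made up for illustration; the query strings are copied from the links on this page.

```python
from urllib.parse import quote

# Filter fragment meaning "today", copied from the links on this page.
TODAY = "ONToday@javascript:gs.daysAgoStart(0)@javascript:gs.daysAgoEnd(0)"

# Scheduled jobs due today whose name contains "event".
EVENT_JOBS = "next_action" + TODAY + "^GOTOnameLIKEevent"

# Events created today that have not been processed.
UNPROCESSED_EVENTS = "sys_created_on" + TODAY + "^processed=NULL"


def list_url(instance: str, table: str, encoded_query: str) -> str:
    """Build a ServiceNow list-view URL (e.g. sys_trigger_list.do) for an encoded query."""
    # Keep parentheses literal, as in the links on this page; '@', ':', '^'
    # and '=' are percent-encoded.
    return (
        f"https://{instance}.service-now.com/{table}_list.do"
        f"?sysparm_query={quote(encoded_query, safe='()')}"
    )


if __name__ == "__main__":
    print(list_url("yale", "sys_trigger", EVENT_JOBS))
    print(list_url("yale", "sysevent", UNPROCESSED_EVENTS))
```

The first call reproduces the link above; the second reproduces the sysevent link used in the unprocessed-events section.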

 

A stuck processor may show a state of "Queued", or anything other than "Ready" or "Running". Toggle the state of the stuck processor(s), then refresh the list; the processor should start up shortly. In this screenshot, the text index events process was chewing through a backlog.
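The rule above (anything other than Ready or Running is suspect) is easy to apply to a list of jobs. A minimal Python sketch, assuming you have the rows as name/state pairs with the state as the label shown in the list; the function name is hypothetical, and apart from "text index events process" the sample job names are made up:

```python
# "Ready" and "Running" are the healthy states described above; anything
# else (for example "Queued") suggests a processor whose state needs toggling.
HEALTHY_STATES = {"Ready", "Running"}


def stuck_processors(jobs):
    """Return the names of event jobs whose state is neither Ready nor Running.

    `jobs` is a list of dicts with "name" and "state" keys, where "state" is
    the label shown in the Today's Scheduled Jobs list.
    """
    return [
        job["name"]
        for job in jobs
        if "event" in job["name"].lower() and job["state"] not in HEALTHY_STATES
    ]


if __name__ == "__main__":
    # Sample rows; only "text index events process" is a name from this page.
    sample = [
        {"name": "events process", "state": "Running"},
        {"name": "text index events process", "state": "Queued"},
        {"name": "some other job", "state": "Queued"},
    ]
    print(stuck_processors(sample))  # ['text index events process']
```

Jobs without "event" in the name are ignored, mirroring the name filter used in the list search.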

 

Nagios monitor for event processor

We also have Nagios monitoring the main event processor for ServiceNow.

When this check fails, it reports "MonitorEventProcessor CRITICAL - event processor missing or sick."

The resulting page looks like:

The fix is to log in to ServiceNow as admin, go to System Scheduler -> Today's Scheduled Jobs, and search for the keyword 'event'. You should get a listing like the one farther up the page. Toggle the stuck processor to "Ready", and it should soon switch to "Running". Log an incident as well.
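The exact logic of the Nagios check isn't documented here, but its message suggests it asks "is the event processor job present, and is it Ready or Running?" A rough Python sketch of that idea, not the actual monitor: the job name "events process", the function name, and the OK message are all assumptions, while the CRITICAL text is the one quoted above.

```python
# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2
HEALTHY_STATES = {"Ready", "Running"}
CRITICAL_MESSAGE = "MonitorEventProcessor CRITICAL - event processor missing or sick."


def check_event_processor(jobs, processor_name="events process"):
    """Return (exit_code, message) for the main event processor.

    processor_name is an assumed job name; substitute the one shown in
    Today's Scheduled Jobs. The result is CRITICAL if the job is absent
    ("missing") or any matching row is in a state other than Ready/Running
    ("sick").
    """
    states = [job["state"] for job in jobs if job["name"] == processor_name]
    if not states or any(state not in HEALTHY_STATES for state in states):
        return CRITICAL, CRITICAL_MESSAGE
    # Hypothetical OK wording; the real monitor's text is not recorded here.
    return OK, "MonitorEventProcessor OK - event processor ready or running."


if __name__ == "__main__":
    print(check_event_processor([{"name": "events process", "state": "Queued"}]))
    print(check_event_processor([]))
```

A real Nagios plugin would print the message and pass the code to sys.exit; the sketch returns both so the logic is easy to test.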

Nagios monitor for unprocessed events

We also have Nagios monitoring the MID servers for unprocessed events.

When this check fails, it reports something like "SERVICENOW EVENTS" / "unprocessed events". It may also appear as:

Procs check_sn_events

You may also see the name monitor_event_processor. I don't know where that name comes from, but Operations forwarded it to us once.

This monitor queries the event table directly through a URL, looking for events that haven't been processed. It is another proxy for one or more event processors being down.
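As a rough illustration of checking the event table directly, here is a Python sketch that asks the ServiceNow REST Table API for today's unprocessed events and reads the total from the X-Total-Count response header. The filter mirrors the sysevent link in this section, but it is an assumption that the real monitor uses the same query, and that the instance evaluates the javascript: date filters for REST queries as it does for list views. The function names and the service account are hypothetical, and the account needs read access to sysevent.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Events created today that no processor has picked up yet
# (same filter as the sysevent list link on this page).
UNPROCESSED_TODAY = (
    "sys_created_onONToday@javascript:gs.daysAgoStart(0)"
    "@javascript:gs.daysAgoEnd(0)^processed=NULL"
)


def unprocessed_count_request(instance, user, password):
    """Build a Table API GET on sysevent that returns at most one row.

    Only the total matters, so sysparm_limit=1 keeps the response small;
    the count is assumed to come back in the X-Total-Count header.
    """
    params = urlencode({
        "sysparm_query": UNPROCESSED_TODAY,
        "sysparm_fields": "sys_id",
        "sysparm_limit": "1",
    })
    url = f"https://{instance}.service-now.com/api/now/table/sysevent?{params}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Accept": "application/json",
        "Authorization": f"Basic {token}",
    })


def unprocessed_count(instance, user, password, timeout=30):
    """Send the request and return the total number of matching events."""
    request = unprocessed_count_request(instance, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(response.headers["X-Total-Count"])
```

Building the request separately from sending it makes the query easy to inspect without touching PROD.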

When an event processor has failed, unprocessed events stack up and you'll get a notice like this (the image is dated; the check runs on host vm-snprdmid-01.its.yale.edu):

You can verify these unprocessed events in PROD by querying the event table:

https://yale.service-now.com/sysevent_list.do?sysparm_query=sys_created_onONToday%40javascript%3Ags.daysAgoStart(0)%40javascript%3Ags.daysAgoEnd(0)%5Eprocessed%3DNULL

That link should show a count of unprocessed events similar to the number Nagios reports, unless the day has just rolled over, since the query only covers events created today.

Again, the fix is the same as in the scenarios farther up the page: track down the event processors and toggle the state of any that are "Queued". They should quickly flip to "Running", and the backlog of unprocessed events should start going down. Once the processors catch up (how long that takes depends on the size of the backlog), the Nagios monitor should go green.
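To see why the monitor stays red until the backlog drains, here is a sketch of threshold logic in the usual Nagios style (0 = OK, 1 = WARNING, 2 = CRITICAL). The warn/crit thresholds, the function name, and the message wording are placeholders, not the real monitor's configuration:

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def backlog_status(count, warn=100, crit=500):
    """Map an unprocessed-event count to a Nagios state and message.

    warn and crit are placeholder thresholds; the real check's values
    are not recorded on this page.
    """
    if count >= crit:
        return CRITICAL, f"SERVICENOW EVENTS CRITICAL - {count} unprocessed events"
    if count >= warn:
        return WARNING, f"SERVICENOW EVENTS WARNING - {count} unprocessed events"
    return OK, f"SERVICENOW EVENTS OK - {count} unprocessed events"


if __name__ == "__main__":
    # A backlog draining after the processors are toggled back to running.
    for count in (1200, 300, 40):
        print(backlog_status(count))
```

As the count falls below each threshold, the state steps down from CRITICAL to WARNING to OK, which is the "go green" moment described above.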

When the monitor doesn't go green

In the list of unprocessed events (from the sysevent link above), you may see events with status "Transferred". I don't know what that means: did a different event handler process them, or were they marked as unprocessable?

I don't know, and I'm asking ServiceNow.