Background:
Early in the ServiceNow implementation we somehow used some bad sources of user data; bad in that they contain netids that collide with newer netids.
We have received at least five ServiceNow tickets with this kind of complaint since go-live. The basic pattern is:
- A brand-new netid conflicts with a mystery netid already in ServiceNow
- The legitimate netid is the one that was just created
- The bad netid is the old one, and it has a create date earlier than Feb. 24, 2012
- The legitimate netid is active=true
- The bad netid is active=true
- Often both accounts have employee numbers
An easy solution is to look for the duplicates and sanitize those out. Backeberg is concerned that there are lurking netid collisions still to come, because the original imports had bad accounts.
We may need a better algorithmic approach to clean those out.
Here is one way to export a subset of the user table...
- Go to ServiceNow (ideally not the PROD instance) and log in as an admin user.
- Navigate to the application filter (top left corner) and type sys_user.list to dump the user table.
- This will find potentially 200k or so entries, and you cannot export that many accounts.
- Define a filter that shows you a subset of users. Here's one way: "NetID starts with b"
- Apply that filter; in my case it narrowed the results to 7715 hits.
- Right-click along the column headers on the results page.
- You will get a context menu with an Export submenu.
- Pick CSV from that list.
- Choose to download the export.
How to parse the exported CSV with a subset of the user table...
zuse:Downloads db692$ head sys_user_from_test_starts_with_b.csv
"user_name","employee_number","name","title","source","u_facit","u_organization.u_department.u_division.u_id","u_organization.u_department.id","u_organization.u_id","active","mobile_phone","sys_created_on"
"b1912a","10793883","Brian Marshall","","","false","","","","false","","2012-02-23 11:43:23"
"ba2","10089420","Beatrice Abetti","","","false","D00609","G00806","872108","false","","2012-02-23 09:23:43"
"ba22","10097002","Barbara Amato-Kuslan","","","false","D03474","G01951","960003","true","","2012-02-23 09:24:22"
You want to look for the subset with colliding netids...
zuse:Downloads db692$ cat sys_user_from_test_starts_with_b.csv | awk -F, '{print $1}' | uniq -d | wc
      34      34     271
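One caveat: uniq -d only reports *adjacent* duplicates, so this count relies on the export happening to be sorted by user_name. Inserting a sort makes the check order-independent. A minimal sketch with a made-up sample file (the file name and rows are hypothetical, not real export data):

```shell
# Hypothetical stand-in for the real export; rows are invented for illustration.
cat > sample_export.csv <<'EOF'
"user_name","employee_number","active"
"ba22","10097002","true"
"ba22","10793883","false"
"bm99","10011111","true"
EOF

# sort first so uniq -d sees duplicate netids on adjacent lines;
# awk's $1 keeps the surrounding quotes, which is harmless for counting.
awk -F, '{print $1}' sample_export.csv | sort | uniq -d
# -> "ba22"
```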
My concerns about this approach
If you go through the set of colliding netids, they seem to be sequential. This makes me concerned that there are lingering bad accounts in our sys_user table that have not yet collided, but will collide when new netids are created as new people develop relationships with Yale.
Here's some work I did...
cat sys_user_from_test_starts_with_b.csv | awk -F, '{print $1}' | uniq -d > duplicate_netids_starts_with_b.csv
cat duplicate_netids_starts_with_b.csv | while read line ; do grep $line sys_user_from_test_starts_with_b.csv ; done
I can redirect that last output with >>, and I end up with a file that seems to show what I suspected. Forwarding this to Bill and having a discussion.
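One wrinkle in that grep loop: an unquoted, unanchored $line matches anywhere in the record, so a short netid could also hit longer netids or other columns. Anchoring the pattern to the start of the record tightens it. A sketch with hypothetical sample files (note the duplicates file carries the leading quote that awk's $1 preserved):

```shell
# Invented sample data standing in for the real export and duplicates list.
cat > export_sample.csv <<'EOF'
"ba22","10097002","true"
"ba22","10793883","false"
"ba223","10555555","true"
EOF
printf '"ba22\n' > dups_sample.csv

# Quote "$line" and anchor the match so "ba22 cannot also match "ba223.
while read -r line; do
  grep "^${line}\"," export_sample.csv
done < dups_sample.csv
# -> the two "ba22" records, but not "ba223"
```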
Moving on to the process of determining not-yet-collided junk netid accounts
Bill has gathered the legitimate set of netids from the authoritative source. I need to do complicated set theory operations on the set of lists of real netids versus what I have in my dumps from service now.
So Backeberg installed MySQL on his Mac and is loading the data into it so he can use SQL against these sets to isolate records and find patterns more readily.
Downloaded the 64-bit MySQL dmg, installed the two packages, and started up the service after reading the readme.
I had to clean up Bill's dump from the prod NetID source, because it was chock full of whitespace...
sed "s/\ //g" valid-IST1-netids.csv > valid-IST1-netids_stripped_whitespace.csv
cat valid-IST1-netids_stripped_whitespace.csv | sed "2,50d" > valid-IST1-netids_stripped_whitespace_start_with_a_through_z.csv
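The first sed deletes every space character (the escaped space in s/\ //g is equivalent to a plain space); tabs and other whitespace would survive. A tiny sketch with a hypothetical input file:

```shell
# Invented sample standing in for the whitespace-laden NetID dump.
printf ' ab123 \n cd456 \n' > sample_netids_raw.csv

# Strip all space characters from every line.
sed 's/ //g' sample_netids_raw.csv
# -> ab123
#    cd456
```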
I had to clean up my dump from ServiceNow to get just the fields I wanted...
cat a_through_z.csv | awk -F'","' '{print $1","$10",\""$12}' > a_through_z_netids_active_createdate_with_garbage.csv
cat a_through_z_netids_active_createdate_with_garbage.csv | sed 's/^"//g' > a_through_z_netids_active_createdate.csv
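Why the follow-up sed: splitting on the three-character separator "," leaves a stray leading quote on the first field and keeps the closing quote on the last field, so the netid column starts with a quote until it is stripped (the datetime stays quoted, which the later MySQL load tolerates via OPTIONALLY ENCLOSED BY). A sketch against a hypothetical single-row file:

```shell
# One invented row shaped like the 12-column export (user_name is $1,
# active is $10, sys_created_on is $12 when splitting on ",").
cat > sample_row.csv <<'EOF'
"ba22","10097002","Barbara Amato-Kuslan","","","false","D03474","G01951","960003","true","","2012-02-23 09:24:22"
EOF

# Extract netid, active flag, and create date, then strip the leading quote.
awk -F'","' '{print $1","$10",\""$12}' sample_row.csv | sed 's/^"//'
# -> ba22,true,"2012-02-23 09:24:22"
```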
mysql -uroot (no password initially)

CREATE DATABASE service_now_duplicates;
USE service_now_duplicates;
CREATE TABLE legit_netids ( netid VARCHAR(15) );
CREATE TABLE netids_from_service_now ( netid VARCHAR(15), active ENUM('true','false'), creation DATETIME );
LOAD DATA INFILE '/Users/db692/service_now_duplicates/round_four/valid-IST1-netids_stripped_whitespace_start_with_a_through_z.csv' INTO TABLE service_now_duplicates.legit_netids;
LOAD DATA INFILE '/Users/db692/service_now_duplicates/round_four/a_through_z_netids_active_createdate.csv' INTO TABLE service_now_duplicates.netids_from_service_now FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"';
DELETE FROM netids_from_service_now WHERE netid="user_name";
DELETE FROM legit_netids WHERE netid="USERNAME";
Now we're loaded and ready to start doing some set logic.
This shows me the duplicate netids...
select netid from netids_from_service_now group by netid HAVING count(1)>1;
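The GROUP BY ... HAVING query finds netids that have already collided. The not-yet-collided junk accounts are the complement: netids present in the ServiceNow dump but absent from the legit list, i.e. an anti-join. A quick way to sanity-check that outside MySQL is comm against sorted files; the sample lists below are invented for illustration:

```shell
# Hypothetical stand-ins for the two MySQL tables, one netid per line.
printf 'ba22\nbm99\n' | sort > legit.txt
printf 'ba22\nbm99\nbx01\n' | sort > from_snow.txt

# Lines only in the ServiceNow dump are the not-yet-collided suspects.
# The SQL equivalent of this anti-join would be something like:
#   SELECT s.netid FROM netids_from_service_now s
#   LEFT JOIN legit_netids l ON s.netid = l.netid
#   WHERE l.netid IS NULL;
comm -13 legit.txt from_snow.txt
# -> bx01
```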
Status
As of the end of June 1, 2012, Backeberg has isolated just under 1000 duplicate netid entries. It's fairly straightforward to wipe these out, but we're going to put that off until after I'm out of ServiceNow training next week. I've built a script to do the wipeout; Bill has asked that I make a no-op version that collects a log, and run that in pre-prod. That's running June 12, 2012.