Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migration of unmigrated content due to installation of a new plugin
Table of Contents

Background:

Early on in ServiceNow implementation, somehow we used some bad sources of userdata. Bad in that there are netids in that data that are colliding with newer netids.

...

  1. Go to ServiceNow, ideally not PROD instance, and login as admin user.
  2. Navigate to application filter (top left corner), type in sys_user.list to dump the user table.
  3. This will find potentially 200k or so entries, and you cannot export that many accounts.
  4. Define a filter, which shows you a subset of users. Here's a way: "NetID starts with b"
  5. Apply that filter, which limited me down to 7715 hits
  6. Right click on the results page, along the column headers.
  7. You will get a new menu, with an Export submenu
  8. Pick CSV out of that list
  9. Choose to download the export

How to

...

rebuild a user_sys locally

I had a collection of .csv files with fewer than 50k entries, but it's nicer to work with the full set when performing queries.

Code Block
zuse:Downloads db692$ head sys_user_from_test_cat users_starts_numeric.csv users_starts_with_ba-c.csv 
"user_name","employee_number","name","title","source","u_facit","u_organization.u_department.u_division.u_id","u_organization.u_department.id","u_organization.u_id","active","mobile_phone","sys_created_on"
"b1912a","10793883","Brian Marshall","","","false","","","","false","","2012-02-23 11:43:23"
"ba2","10089420","Beatrice Abetti","","","false","D00609","G00806","872108","false","","2012-02-23 09:23:43"
"ba22","10097002","Barbara Amato-Kuslan","","","false","D03474","G01951","960003","true","","2012-02-23 09:24:22"

You want to look for the subset with colliding netids...

Code Block

zuse:Downloads db692$ cat sys_user_from_test_starts_with_b.csv | awk -F, '{print $1}' | uniq -users_starts_with_d-i.csv users_starts_with_j-l.csv users_starts_with_m-r.csv users_starts_with_s-z.csv > all_users.csv
# catch the header lines, and delete all but the first one
grep -n user_name all_users.csv
sed "175824d;127985d;81587d;41395d;125d" all_users.csv > all_users_one_header.csv 
cut -d, -f1 all_users_one_header.csv > all_users_just_usernames.csv

How to parse the export csv with a subset of the user table...

Code Block

zuse:Downloads db692$ head sys_user_from_test_starts_with_b.csv 
"user_name","employee_number","name","title","source","u_facit","u_organization.u_department.u_division.u_id","u_organization.u_department.id","u_organization.u_id","active","mobile_phone","sys_created_on"
"b1912a","10793883","Brian Marshall","","","false","","","","false","","2012-02-23 11:43:23"
"ba2","10089420","Beatrice Abetti","","","false","D00609","G00806","872108","false","","2012-02-23 09:23:43"
"ba22","10097002","Barbara Amato-Kuslan","","","false","D03474","G01951","960003","true","","2012-02-23 09:24:22"

You want to look for the subset with colliding netids...

Code Block

zuse:Downloads db692$ cat sys_user_from_test_starts_with_b.csv | awk -F, '{print $1}' | uniq -d | wc
      34      34     271

My concerns about this approach

...

Code Block
mysql> UPDATE uniq_between_both_sets, legit_netids SET source="AUTH" WHERE uniq_between_both_sets.netid = legit_netids.netid AND uniq_between_both_sets.netid LIKE "a%";
Query OK, 23 rows affected (1 min 57.18 sec)
Rows matched: 30  Changed: 23  Warnings: 0

Trying it again with a broader query.

Code Block

mysql> UPDATE uniq_between_both_sets, legit_netids SET source="AUTH" WHERE uniq_between_both_sets.netid = legit_netids.netid;

Status

...

 matched: 30  Changed: 23  Warnings: 0

Trying it again with a broader query. That was 1.2 seconds per row in uniq_between_both_sets, and there are 6233 rows in the table. I estimate the answer will come back in a little more than two hours of chunking through the tables.

Code Block

mysql> UPDATE uniq_between_both_sets, legit_netids SET source="AUTH" WHERE uniq_between_both_sets.netid = legit_netids.netid;

This completed in less time than I expected, and got interesting results...

Code Block

mysql> select DATE(netids_from_service_now.creation),count(1) FROM netids_from_service_now, uniq_between_both_sets where netids_from_service_now.netid = uniq_between_both_sets.netid GROUP BY DATE(netids_from_service_now.creation) WITH ROLLUP;
+----------------------------------------+----------+
| DATE(netids_from_service_now.creation) | count(1) |
+----------------------------------------+----------+
| 2004-05-01                             |        1 |
| 2012-02-19                             |        1 |
| 2012-02-22                             |       15 |
| 2012-02-23                             |     2266 |
| 2012-02-28                             |        1 |
| 2012-02-29                             |       12 |
| 2012-03-02                             |        1 |
| 2012-03-07                             |        2 |
| 2012-03-19                             |        1 |
| 2012-03-22                             |        3 |
| 2012-03-29                             |        4 |
| 2012-03-30                             |        1 |
| 2012-04-16                             |        1 |
| 2012-04-18                             |       22 |
| 2012-04-21                             |       16 |
| 2012-04-22                             |     3514 |
| 2012-04-24                             |        1 |
| 2012-04-25                             |        1 |
| 2012-04-26                             |       15 |
| 2012-04-27                             |        3 |
| 2012-05-01                             |        1 |
| 2012-05-02                             |        2 |
| 2012-05-03                             |        2 |
| 2012-05-04                             |        5 |
| 2012-05-07                             |        1 |
| 2012-05-09                             |        1 |
| 2012-05-14                             |        1 |
| 2012-05-16                             |        1 |
| 2012-05-17                             |        1 |
| 2012-05-18                             |        3 |
| 2012-05-22                             |       14 |
| 2012-05-23                             |        1 |
| 2012-05-24                             |        2 |
| 2012-05-25                             |        4 |
| 2012-05-28                             |        4 |
| 2012-05-31                             |       29 |
| NULL                                   |     5953 |
+----------------------------------------+----------+
37 rows in set (1 min 24.52 sec)

The two big days; in February and April, I already knew about from intermediate spot checking. What I'm surprised about is how broad of a collection of hits I have on accounts that are in Service Now, but are not valid in Central Auth. These are small enough collections that I should be able to pick a few of them by hand and see if I can figure out what is going on.

Status

As of the end of June 1, 2012, Backeberg has isolated just under 1000 duplicate netid entries. It's fairly straightforward to wipe these out. I've built a script to do the wipeout; Bill has asked that I make a no-op version that collects a log, and run that in pre-prod. That's running June 12, 2012. June 15, 2012 I checked on the job; still running. Bill helped me find a bug in my transform (I didn't coalesce on common rows; he said he makes the same mistake occasionally). I tried stopping the job, but it wouldn't stop. Based on the burn rate, it should finish on its own Monday or Tuesday, by which time I'll probably be out on leave.

I've also approached the problem of finding the bad account entries in Service Now that have not YET collided with a legitimate account, but will in the future as new NetIDs are created. Said differently, these are the difference between the overall set of users in Service Now and the set of users in Central Auth. I've been using a collection of MySQL and command line tricks to tease out these entries, and I think my fifth round results are both accurate and my data analysis approach will get a result in a reasonable amount of time. Then we will probably put those results through the same scrutiny as my original set of duplicates data set.

The correct Service Now script to wipe bad user entries

Code Block

/**
 * For variables go to: http://wiki.service-now.com/index.php?title=Import_Sets
 **/ 

var netid = source.netid;

var rec = new GlideRecord('sys_user');
/*rec.addQuery('user_name','=',source.u_netid);*/
rec.addQuery('user_name',source.u_netid);
rec.addQuery('sys_created_on','<','2012-02-24 12:00:00');
rec.addInactiveQuery();
rec.query();
if (rec.next()){
  rec.deleteRecord();
}