...
Code Block |
---|
mysql> UPDATE uniq_between_both_sets, legit_netids SET source="AUTH" WHERE uniq_between_both_sets.netid = legit_netids.netid; |
This completed in less time than I expected, and got interesting results...
Code Block |
---|
mysql> select DATE(netids_from_service_now.creation),count(1) FROM netids_from_service_now, uniq_between_both_sets where netids_from_service_now.netid = uniq_between_both_sets.netid GROUP BY DATE(netids_from_service_now.creation) WITH ROLLUP;
+----------------------------------------+----------+
| DATE(netids_from_service_now.creation) | count(1) |
+----------------------------------------+----------+
| 2004-05-01 | 1 |
| 2012-02-19 | 1 |
| 2012-02-22 | 15 |
| 2012-02-23 | 2266 |
| 2012-02-28 | 1 |
| 2012-02-29 | 12 |
| 2012-03-02 | 1 |
| 2012-03-07 | 2 |
| 2012-03-19 | 1 |
| 2012-03-22 | 3 |
| 2012-03-29 | 4 |
| 2012-03-30 | 1 |
| 2012-04-16 | 1 |
| 2012-04-18 | 22 |
| 2012-04-21 | 16 |
| 2012-04-22 | 3514 |
| 2012-04-24 | 1 |
| 2012-04-25 | 1 |
| 2012-04-26 | 15 |
| 2012-04-27 | 3 |
| 2012-05-01 | 1 |
| 2012-05-02 | 2 |
| 2012-05-03 | 2 |
| 2012-05-04 | 5 |
| 2012-05-07 | 1 |
| 2012-05-09 | 1 |
| 2012-05-14 | 1 |
| 2012-05-16 | 1 |
| 2012-05-17 | 1 |
| 2012-05-18 | 3 |
| 2012-05-22 | 14 |
| 2012-05-23 | 1 |
| 2012-05-24 | 2 |
| 2012-05-25 | 4 |
| 2012-05-28 | 4 |
| 2012-05-31 | 29 |
| NULL | 5953 |
+----------------------------------------+----------+
37 rows in set (1 min 24.52 sec)
|
The two big days; in February and April, I already knew about from intermediate spot checking. What I'm surprised about is how broad of a collection of hits I have on accounts that are in Service Now, but are not valid in Central Auth. These are small enough collections that I should be able to pick a few of them by hand and see if I can figure out what is going on.
Status
As of the end of June 1, 2012, Backeberg has isolated just under 1000 duplicate netid entries. It's fairly straightforward to wipe these out. I've built a script to do the wipeout; Bill has asked that I make a no-op version that collects a log, and run that in pre-prod. That's running June 12, 2012. June 15, 2012 I checked on the job; still running. Bill helped me find a bug in my transform (I didn't coalesce on common rows; he said he makes the same mistake occasionally). I tried stopping the job, but it wouldn't stop. Based on the burn rate, it should finish on its own Monday or Tuesday, by which time I'll probably be out on leave.
...