...
I can redirect with >> that last output, and I end up with a file that seems to show what I suspected. Forwarding this to Bill and having a discussion.
Moving onto the process of determining not-yet-collided junk netid accounts
Bill has gathered the legitimate set of netids from the authoritative source. I need to do complicated set theory operations on the set of lists of real netids versus what I have in my dumps from service now.
So Backeberg installed MySQL on his Mac, he's loading the data onto MySQL so he can use SQL against these sets and isolate records and find patterns more readily.
Downloaded MySQL 64-bit dmg. Installed the two packages. Started up the service after reading the readme.
I had to cleanup the Bill dump from prod NetID source, because it was chock full of whitespace...
Code Block |
---|
sed "s/\ //g" valid-IST1-netids.csv > valid-IST1-netids_stripped_whitespace.csv
cat valid-IST1-netids_stripped_whitespace.csv | sed "2,50d" > valid-IST1-netids_stripped_whitespace_start_with_a_through_z.csv
|
Code Block |
---|
mysql -uroot (no password initially).
CREATE DATABASE service_now_duplicates;
use service_now_duplicates;
create table legit_netids ( netid VARCHAR(15) );
create table netids_from_service_now ( netid VARCHAR(15), active BOOLEAN);
LOAD DATA INFILE '/Users/db692/service_now_duplicates/round_four/valid-IST1-netids_stripped_whitespace_start_with_a_through_z.csv' INTO TABLE service_now_duplicates.legit_netids;
|
Status
As of the end of June 1, 2012, Backeberg has isolated just under 1000 duplicate netid entries. It's fairly straightforward to wipe these out, but we're going to put that off until after I'm out of ServiceNow training next week. I've built a script to do the wipeout; Bill has asked that I make a no-op version that collects a log, and run that in pre-prod. That's running June 12, 2012.