The purpose of this project is to replace the old AD Daily Updater with newer, better code residing in an IIQ instance. However, the future plan to support Exchange Online routing of Email (Mail Relay Replacement) adds an additional set of requirements.
This project replaces
To accomplish this
The purpose of this code is to provision fields in the Azure AD. That is where Outlook goes to look for O365 accounts and directories. That is where Exchange Online goes to deliver mail to mailboxes or forward mail to Eliapps. However, because we run a Local AD and an Azure AD synchronized to each other, the current AD Daily Updater and this replacement code populate data only in the Local AD and then expect it to be copied through the Azure AD Connector synchronization tool within the next half hour. Except for one additional feature added to this new code (directly changing the UserPrincipalName in Azure AD if it is incorrect), the specification of this project is to change the Local AD and, except for enumerated differences, to put the same data in Local AD that the existing Java code put there. There is no need for an Azure AD to develop the code or to test that it functions correctly.
When this project goes live, it will begin to provide features that the old AD Updater did not have, in preparation for changes that are anticipated but do not currently have a target date.
The Email Alias and Mail Relay system implements a "goes to" methodology looking up email addresses and selecting a target mail system for that address. Every Alias has a row in the table, and aliases to the same Inbox are represented by multiple rows with the same destination MAILBOX value. The Exchange Online system depends on AD maintaining a "comes from" list, where the AD User object associated with the Inbox has a list of email addresses that get delivered to the mailbox.
For example, the current system is based on a table that says:
"john.doe@yale.edu" goes to the mail account for AD User jd234
"jack.doe@yale.edu" goes to the mail account for AD User jd234
"jane.doe@yale.edu" goes to the mail account for AD User jd235
While the Exchange model is:
The AD User object jd234 has a list with "john.doe@yale.edu" and "jack.doe@yale.edu" as names for the mail account.
The AD User object jd235 has a list with just "jane.doe@yale.edu".
This is an oversimplification because Exchange also forces us to add several built-in required email names on top of the email address published for each user, but the important concept is that the Mail Relay and Exchange models express the same information from two opposite directions. This would not be a problem, except that over time the rules have not been followed and the data in the tables is not clean.
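The relationship between the two models can be sketched as a simple inversion. This is a minimal illustration in Python, not the project's implementation; the function name `to_comes_from` and the sample rows are hypothetical, and the extra built-in names Exchange requires are left out.

```python
from collections import defaultdict

# Hypothetical sample of the "goes to" table: (alias address, AD User).
goes_to = [
    ("john.doe@yale.edu", "jd234"),
    ("jack.doe@yale.edu", "jd234"),
    ("jane.doe@yale.edu", "jd235"),
]


def to_comes_from(rows):
    """Invert alias -> user rows into user -> list of alias addresses."""
    comes_from = defaultdict(list)
    for alias, user in rows:
        comes_from[user].append(alias)
    return dict(comes_from)


comes_from = to_comes_from(goes_to)
```

The inversion is only this clean when every alias appears in exactly one row, which is why dirty data in the table matters.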
There is also a compatibility issue because the Mail Relays treat O365 and Eliapps equally, while Exchange will deliver mail to the O365 Inbox if at all possible. When information about the Email Aliases is ambiguous, there is mail that the Mail Relays will reliably send to Eliapps but that Exchange will instead send to O365 for a user who has both types of accounts. This is not, however, an issue for the AD Updater replacement project but will have to be addressed later on with the follow-on "Mail Relay replacement" project.
There is a set of Yale rules and a set of Exchange rules. We can't change the Microsoft rules, and we are not prepared to change Yale rules established long ago. Enumerating the rules explains exactly how and why certain mail routing will change.
Yale policy is to give users either O365 or Eliapps mail but not both, but exceptions are allowed. Yale policy is that when a user has both O365 and Eliapps mail, the O365 account is Primary. So if we followed our own rules there would be no problem. Unfortunately, either because accounts were "grandfathered" in from previous systems or because someone important asked for an exception, there are about 5 cases where the Primary Email Alias points to Eliapps and there is a secondary alias pointing to an existing O365 account. We will have to remediate these users.
This is not an AD Updater problem. It is a Mail System restriction that we cannot program around and will be ignored by the coding. We will do what we can do. There are ways to change the Email Aliases with Dependent Netids that can work around the problem, but that again is an Email System trick and not part of this project. However, understanding how mail delivery is done and how it changes is important to understanding the design, and this particular glitch is a useful example to motivate a description of the environment to which this code must be designed.
The big design feature that derives from this explanation is that when the Alias table is read in by IIQ (is "aggregated" in IIQ terms) it is presented as one of four different configurations of object data by a database view that converts the flat raw table into a usable set of database "rows" (which become program objects):
There is a fifth category of unprocessed aliases. Any Alias with a MAILBOX of "xxxx@connect.yale.edu" where the prefix xxxx is not a Netid must be a resource (a room, distribution list, etc.) and will not be processed. Exchange Online knows these things and handles them natively, so we don't have to create objects to inform Exchange about its own native stuff.
Any user can have Junk, so while each mail user Netid is either Case 1, 2, or 3, they can also have any number of Type 4 entries. Type 4 is grouped by MAILBOX and correlated to a Contact identified by the MAILBOX. It is not connected in any way to the Netid that owns the Alias. The junk aliases are spread across mailman, elilists, panlists, cs, som, aya, invest, med, physics, chem, geology, cmp, math, astro (all .yale.edu). Some of this is obsolete, but cleaning it up is future scope.
Users with an O365 or Eliapps account can delete it, and users with one type of account can get the other. So you can transition between Type 1 and 3, or 2 and 3. This is a delicate process because IIQ cannot directly move a field like ProxyAddresses from a User object to a Contact object in a single transaction (as would be possible with database SQL). This is, however, a manual process and becomes a function of the Mail System people. This code does not provide transition services but will provide documentation about how to make the transition correctly.
Once you understand the four possible "packages" of mail routing data, the mechanism for gathering the data and passing it to IIQ is relatively simple. A SQL query of the Alias table (DIR_ONLINE_INFO) groups rows by MAILBOX. The AliasNames of the same MAILBOX are concatenated together in a string field. If the MAILBOX is "@connect.yale.edu" it is O365, if it is "@bulldogs.yale.edu" it is Eliapps, and if it is something else it is junk. The trick is then to create SQL clauses that flag primary accounts from secondary accounts, and distinguish Eliapps accounts that exist alone from Eliapps accounts belonging to someone who also has an O365 mailbox, which then have to be assigned to a Contact. If we don't do this processing in SQL where it is easy, it becomes much more difficult in IIQ to know whether an incoming Eliapps account record belongs to a Netid who also has an O365 record, and therefore whether to correlate it to (or create) a Contact Identity. If the SQL view has already done that analysis, IIQ can implement the correlation that the SQL has pre-processed.
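The grouping and classification step can be illustrated outside the database. The following is a rough Python sketch of the logic only (the real work is described as a SQL view); the function names and the `(alias_name, mailbox)` row shape are assumptions, and it does not handle the exclusion of non-Netid resource mailboxes under "@connect.yale.edu".

```python
from collections import defaultdict


def classify_mailbox(mailbox):
    """Return 'O365', 'Eliapps', or 'junk' based on the MAILBOX domain."""
    domain = mailbox.rsplit("@", 1)[-1].lower()
    if domain == "connect.yale.edu":
        return "O365"
    if domain == "bulldogs.yale.edu":
        return "Eliapps"
    return "junk"


def group_by_mailbox(rows):
    """rows: iterable of (alias_name, mailbox) pairs (hypothetical shape).

    Returns mailbox -> (kind, sorted alias names), one entry per MAILBOX.
    """
    grouped = defaultdict(list)
    for alias_name, mailbox in rows:
        grouped[mailbox.lower()].append(alias_name)
    return {
        mailbox: (classify_mailbox(mailbox), sorted(names))
        for mailbox, names in grouped.items()
    }
```

Flagging primary versus secondary accounts would need an additional column from the Alias table that this sketch does not model.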
The Alias table has bad data. We will remediate some problems and ignore issues that cause no problem for data delivery. The following list includes all the things that need to be remediated, and some other interesting problems:
We synchronize the on-premises Local AD to the Azure AD. IIQ must provision changes to the Local AD and wait for the synchronization to occur within the next half hour. The only field in Azure AD that can be provisioned directly is UPN.
There are existing Contacts created in AD to make specific Eliapps users visible to Outlook. These Contacts tend to be in the Users container, they are named by the Primary Email Alias, they have "biographical" data, and they do not have anything in the three mail routing fields. The new Contacts we create will be in a special OU, will be named by the MAILBOX, will not have "biographical" fields, and they will have mail routing properties. A given Eliapps user may have one of each type of Contact, but this code only generates the Mail Routing Contact and ignores the other type.
The update of biographical fields (firstname, lastname) will occur in the Identity Management instance of IIQ and will be driven by changes to Identity variables. The update of mail routing fields will be done in the Email Provisioning instance of IIQ.
For sanity, we try to ensure that both instances of IIQ share a common set of defined Identity Variables, but we do not guarantee that these Variables are actually populated with values in an instance that does not use that particular variable in any code. The two instances will source the variable value from different places, and they MUST in general partition the Target (Application and field) where values are provisioned. We do not want to get into even temporary race conditions where one instance sets a field to one value and the other instance sets it back to the previous value because their aggregations run on different schedules.
There is a UserPrincipalName in both Local AD and Azure AD. It is identical to the Primary Email Alias, and the rules constrain it to interact with Mail Routing (it must be in the ProxyAddresses list of an O365 User), but used as a UPN it is a string used to log you in and is therefore like the "biographical" fields. However, it is more closely tied to the other fields used by Mail Routing (it is sourced, for example, from the Alias table), and so we regard it as a Mail Routing field and provision it along with the real Mail Routing fields.
Additional Applications (data sources) required:
As is currently done in Email Provisioning, Identity Management will have a Role with an Entitlement to membership in the Provisioned Users group of LocalAD. Assigning an Identity to the Role will create a new AD User object for that Netid if one does not already exist. The User object is initially created with minimal information, and is then filled in with the full set of biographical data in the next cycle.
This application adds a set of Identity Variables named ADXXXX (example: ADgivenName) that are filled in by a Rule run after an Identity Refresh task. Normally Identity Variables are filled in after new data is read in from a source system, but the ADXXXX variables are assigned values from other Identity Variables that were themselves populated from source systems. For example, your Job Title comes in from Workday Worker and is used to populate a previously configured Identity Variable, which then supplies the raw title to various Yale systems. However, the ADtitle variable is constrained to be a maximum of 128 characters because that is the maximum field length in AD. So at Yale there are currently 51 people with longer Job Titles in Workday Worker and other Yale systems, but we truncate that to 128 when we create the ADtitle variable that provisions the Title property in AD.
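The derivation of an ADXXXX variable from an already populated Identity Variable can be pictured as follows. This is a minimal sketch in Python rather than an IIQ Rule; the function name `ad_title` and the constant name are hypothetical, and only the 128-character limit comes from the text above.

```python
AD_TITLE_MAX = 128  # maximum Title length in AD, as stated above


def ad_title(raw_title):
    """Derive the ADtitle value from the raw Job Title Identity Variable."""
    if raw_title is None:
        return None  # no source value, so nothing is provisioned
    return raw_title[:AD_TITLE_MAX]
```

The raw title stays untouched for other Yale systems; only the value destined for AD is shortened.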
There are also restrictions due to Privacy rules for certain people, and in a few cases there are multiple possible source fields for the same AD variable, and the RefreshRule will select the right source based on Affiliation and other attributes.
The Netid System is the source for Identities created for Dependent Netids. It does not have any biographical data, so the source fields are all null, and the ADXXXX variables are all null, and the AD gets no biographical information.
Biographical ADXXXX variables have a LocalAD Target field only in the Identity Management instance. These Identity Variables may be filled in by the RefreshRule running in the Email Provisioning instance of IIQ, but they will not provision AD User objects from that instance.
While the Email Alias table is usually viewed by the Netid of the alias owner, provisioning has to go into an AD object associated with the mail account (the MAILBOX value). So this code develops views that group aliases and produce one row for each MAILBOX value. In this approach, the O365 account for a Netid is the row with MAILBOX value "Netid@connect.yale.edu". Rows with a non-Netid before "@connect.yale.edu" are distribution lists, rooms, and other resources that are currently out of scope.
The Eliapps accounts do not have a Netid, so here there is a different rule. If a Netid owns a Primary Email alias pointing to an Eliapps account ("@bulldogs.yale.edu") and there are no other aliases pointing to a "Netid@connect.yale.edu" account, then this is a pure Eliapps user and the mail routing fields can go in the AD User object associated with the Primary Alias owner. Here we have a bit more freedom, because while routing for an O365 account MUST go into the AD User object to which the account is tied, an Eliapps account can be described by either an AD User or Contact object, and our decision here to use the otherwise unused User object is simply an optimization.
Therefore, any Eliapps account that fails the tests to put it in the User object will end up in a Contact, along with Departmental Eliapps accounts and Legacy "junk" aliases that point to neither O365 nor Eliapps.
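The placement rules in the preceding paragraphs can be summarized in a small decision function. This is a hedged Python sketch of the rule as described, not the SQL views themselves; the function name `placement` and its arguments are assumptions made for illustration.

```python
def placement(netid, primary_mailbox, mailboxes):
    """Decide which AD object type holds routing data for each MAILBOX.

    netid: owner of the aliases; primary_mailbox: MAILBOX of the Primary
    alias (or None); mailboxes: every MAILBOX the Netid's aliases point to.
    Returns mailbox -> 'User' or 'Contact'. Resource mailboxes are omitted.
    """
    o365 = f"{netid}@connect.yale.edu".lower()
    owned = {m.lower() for m in mailboxes}
    primary = (primary_mailbox or "").lower()
    has_o365 = o365 in owned
    result = {}
    for mailbox in sorted(owned):
        if mailbox == o365:
            result[mailbox] = "User"  # O365 routing MUST go in the User object
        elif mailbox.endswith("@connect.yale.edu"):
            continue  # non-Netid prefix: a resource, out of scope
        elif (mailbox.endswith("@bulldogs.yale.edu")
              and mailbox == primary and not has_o365):
            result[mailbox] = "User"  # pure Eliapps user (the optimization)
        else:
            result[mailbox] = "Contact"
    return result
```

For example, a Netid with both an O365 mailbox and an Eliapps mailbox gets "User" for the former and "Contact" for the latter.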
The SQL converts the entries for all Aliases that point to the same mailbox to a string of AliasNames. By sorting the Primary Alias first (if it exists) and setting the first name to have an "SMTP:" prefix and the subsequent aliases to have an "smtp:" prefix, we satisfy the rule that every ProxyAddresses string must have one primary (capitalized) entry and it should be the Primary if one exists. By doing this work in the database, where sorting rows is easy, we avoid the problem of dealing with multiple account records from the same application, which IIQ can handle but which greatly complicates the coding.
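The ordering and prefix rule can be sketched as below. This Python version only illustrates what the SQL view is described as producing; the function name `proxy_addresses` is hypothetical, and it returns a list rather than the concatenated string field the view emits.

```python
def proxy_addresses(aliases, primary=None):
    """Build the mail entries of ProxyAddresses for one MAILBOX.

    aliases: alias names pointing to the MAILBOX; primary: the Primary
    alias name, if it is one of them. The first entry gets "SMTP:", the
    rest get "smtp:" in alphabetical order.
    """
    names = set(aliases)
    rest = sorted(name for name in names if name != primary)
    ordered = ([primary] if primary in names else []) + rest
    return [
        ("SMTP:" if index == 0 else "smtp:") + name + "@yale.edu"
        for index, name in enumerate(ordered)
    ]
```

When no Primary alias belongs to the group, the alphabetically first alias takes the capitalized "SMTP:" prefix so the list still has exactly one primary entry.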
There are three database views. One returns the O365 Netid mail routing data. A second returns the Primary Eliapps mail routing for non-O365 Users (which also goes in the User object). The last view returns routing data for Contact objects. The first two views provide the SAMAccountName of the User object to which the data should be attached, while the last returns only the MAILBOX and not a Netid, because Contacts are generated and managed relative to the MAILBOX name.
The MailNickName is used by Exchange Online to route mail, but we do not set it based on its routing properties. Rather, the MailNickName is by Yale Policy set to the Primary ALIAS_NAME, which is also the prefix (before "@yale.edu") of the UserPrincipalName. That policy effectively makes MailNickName a "biographical" field because it is set by biographical rules. However, because of Exchange algorithms, we only want to set MailNickName in the User objects of Email users. So while the value is calculated biographically, the biographical value is only copied to the ADMailNickName field if we have O365 or Eliapps-in-User-Object mail routing view data for that SAMAccountName.
Contact objects are created and deleted for the sole purpose of holding Mail Routing data. So the Application that reads from the Contact view is "authoritative", creating new Contact Identities when a new Contact MAILBOX value appears, and deleting Contacts that are no longer supported by any linked Account. However, User objects are already created by other processes, and since the User object must exist for an O365 account to exist, it makes no sense to create a User object for the purpose of putting in Mail Routing for an O365 account that cannot exist because there is no User object. So the application that feeds Mail Routing information (for both O365 and Eliapps) is not authoritative and waits for the User object to be created by the Identity Management IIQ instance before the data is linked to the Netid Identity object and the fields are provisioned. The difference between authoritative and non-authoritative behavior is the main reason for splitting the Email Routing data into two IIQ Applications.
So in summary:
Netid Identities are created by the existing Identity Management IIQ process from SOI or in Email Provisioning from IDR Identities.
In Identity Management, the LCM "Joiner" workflow assigns a Netid and UPI to the identity, and a Role Assignment Rule grants the Identity an Entitlement that will cause the AD User object to be created. When the LocalAD is next aggregated, the AD User object is read in and linked to the Identity.
The Email Provisioning IIQ instance creates the Identity when it gets a new Netid from IDRIdentities. At that point it can correlate the LocalAD User object and the MailRouting-In-UserObject account with a matching SAMAccountName.
It doesn't matter if the MailRouting data arrives before or after the LocalAD User object is aggregated. IIQ will wait until there is a LocalAD object before provisioning the ADXXXX Identity Variables to their target properties.
Meanwhile, Contact Identities will be created based on the aggregation of the Contact Mail Routing Application. A Contact Identity automatically gets an Entitlement that creates the Contact Object in AD, and then the Mail Routing variables are synched in.
The view presents ProxyAddresses in a fixed order: Primary Alias first, then Secondary Aliases in alphabetical order. We will test for, and write code to prevent, any unnecessary reprovisioning of the ProxyAddresses when the calculated and aggregated lists contain the same values but the "smtp:" values are in a different order (unless IIQ variable synchronization already does this).
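An order-insensitive comparison along these lines might look like the following. This is a sketch in Python under stated assumptions: the function names are hypothetical, the address part is compared case-insensitively (an assumption), and non-mail entries such as X500 or sip are left out of the comparison because they pass through unchanged.

```python
def split_mail_entries(entries):
    """Return (primary addresses, secondary addresses) as two sets."""
    primary, secondary = set(), set()
    for entry in entries:
        prefix, sep, address = entry.partition(":")
        if not sep:
            continue  # not a prefixed entry
        if prefix == "SMTP":
            primary.add(address.lower())
        elif prefix == "smtp":
            secondary.add(address.lower())
        # any other prefix (X500, sip, ...) is ignored here
    return primary, secondary


def needs_reprovision(calculated, aggregated):
    """True only if the mail entries differ by more than their order."""
    return split_mail_entries(calculated) != split_mail_entries(aggregated)
```

Keeping the primary and secondary entries in separate sets means a swap of which address carries "SMTP:" still counts as a real change.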
Currently all mail with an address ending in "@yale.edu" is processed by the Mail Relay machines at Yale. They are configured (once an hour) from the data in the Mail Alias system, where each unique MailAliasName+"@yale.edu" is resolved to a native Mail system address (the Mailbox). If the Mailbox is "netid@connect.yale.edu" then this is an O365 address to be processed by Exchange Online. If it is "@bulldogs.yale.edu" then it is an Eliapps account to be processed by Google. In all other cases, the Mailbox address substitutes for the original "@yale.edu" and is processed through SMTP.
The Mail Relay function is to be retired and all mail will now pass through Exchange Online. Essentially, we don't have to do anything differently for O365 accounts because that mail already passes through Exchange and is delivered correctly. So the change is for Eliapps and other addresses.
Exchange Online is configured by data in the Azure AD. So the problem is to use the current contents of the Email Alias table (DIR_ONLINE_INFO) to configure (Azure) AD objects so the mail will be delivered the correct way. Exchange has a concept called MailNickName which is not standard and can be ignored at this point. Otherwise, Exchange Online looks at any "xxxx@yale.edu" address and tries to match it to exactly one "smtp:xxxx@yale.edu" entry in the ProxyAddresses list of an AD User or Contact object. If it matches and this is not an O365 account, then it delivers mail based on the TargetAddress field of the matched object.
So the requirement is simple:
Every AliasName in the Alias table (prefixed with "smtp:" and suffixed by "@yale.edu") must be in exactly one AD User or Contact ProxyAddresses list, and
The TargetAddress of the matched User or Contact must be set to the Mailbox value of that entry in the Alias table.
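The two-part requirement lends itself to a consistency check. The following is a hypothetical Python sketch, not part of the planned code; the function name `check_routing` and the dictionary shapes are assumptions. It skips the TargetAddress comparison for O365 mailboxes, since the text notes that Exchange only uses TargetAddress when the match is not an O365 account.

```python
def check_routing(alias_table, ad_objects):
    """Report aliases that violate the routing requirement.

    alias_table: alias name -> Mailbox.
    ad_objects: object name -> {"proxyAddresses": [...], "targetAddress": str}.
    Returns a list of (alias name, problem description).
    """
    problems = []
    for alias, mailbox in alias_table.items():
        wanted = f"smtp:{alias}@yale.edu".lower()
        holders = [
            name for name, obj in ad_objects.items()
            if any(p.lower() == wanted for p in obj["proxyAddresses"])
        ]
        if len(holders) != 1:
            problems.append((alias, f"found in {len(holders)} ProxyAddresses lists"))
            continue
        if mailbox.lower().endswith("@connect.yale.edu"):
            continue  # O365: TargetAddress is not used for delivery
        target = ad_objects[holders[0]]["targetAddress"]
        if target.lower() != f"smtp:{mailbox}".lower():
            problems.append((alias, "TargetAddress does not match Mailbox"))
    return problems
```

An empty result means every alias resolves to exactly one object with a consistent target.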
Today we populate the ProxyAddresses list of the User object of everyone who has an O365 account with an entry for every Alias Table row that has a Mailbox value of "netid@connect.yale.edu".
So the new step is to do essentially the same thing for Eliapps mail users. If they do not have an O365 account, then we can set the TargetAddress of their AD User object to "EliappsAccount@bulldogs.yale.edu" and then populate their ProxyAddresses with every Email Alias with a Mailbox that points to that account (exactly what we do now for O365 users). If someone has both an O365 and an Eliapps account, then the O365 account has to go in the User object, so we have to create a second Contact object for the Eliapps ProxyAddresses list and TargetAddress.
Then for any other Alias (not O365 or Eliapps) the one Contact object per row in the Alias table works fine.
We accomplish this by aggregating Alias entries by Netid, and then for each Netid
Unfortunately, the existing Alias table has some Netids that have a mixture of personal and departmental Eliapps accounts. We can look for Google Accounts with a name like PrimaryAliasName, or PrimaryAliasName.eliapps, or Netid. After that, it is difficult to write a program that can reliably distinguish an alias name of "Alfred.E.Neuman" as personal but "Mad.Magazine" as departmental (and therefore an Alias entry that should have been moved to a Dependent Netid). For now we will follow the Shibboleth login rule that regards a PrimaryAlias as personal, but if the Primary is O365 then the first Secondary Eliapps alias is Personal (where it becomes the "special" Eliapps Contact where multiple Aliases are aggregated) and any subsequent Eliapps alias with a different MAILBOX is treated as Departmental and generates Contacts where we do not attempt to aggregate more than one Alias into the ProxyAddresses list. The Mail Routing will be done correctly no matter what. If we confuse personal and departmental then the number of Contacts can increase and the Outlook directory may not be optimal. Users can fix this by moving their damn departmental Eliapps aliases to Dependent Netids like they were supposed to.
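The name-matching step mentioned above (and only that step, not the Shibboleth-based rule that follows it) could be sketched like this. The function name `looks_personal` and its arguments are hypothetical, and the case-insensitive comparison is an assumption.

```python
def looks_personal(google_account, netid, primary_alias):
    """True if the Google account name matches one of the personal patterns.

    Patterns from the text: PrimaryAliasName, PrimaryAliasName.eliapps, Netid.
    """
    local_part = google_account.split("@", 1)[0].lower()
    candidates = {
        primary_alias.lower(),
        primary_alias.lower() + ".eliapps",
        netid.lower(),
    }
    return local_part in candidates
```

A name that fails this test is not necessarily departmental, which is exactly why the text falls back on a simpler positional rule.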
Deprovisioning Departmental Contacts is not optional. Any Contact not supported by a current Email Alias entry has to be deleted. In fact, because the Azure AD will complain if any entries violate the rule that the same alias name cannot appear in the ProxyAddresses list of two objects at the same time, deprovisioning should occur before provisioning to handle the cases where Aliases are transferred from one Netid to another. Unfortunately, this is exactly the sort of thing IIQ does not handle well. So while we could continue the current model of building tools that update the Alias Table and then expect routing information to be generated in the background as is done for the current Mail Relays, future tooling should be designed to change the AD at the same time that the Alias table is changed.
It is proposed that Users are created as they are currently (CN=Netid, CN=Users, DC=yu, DC=yale, DC=edu), that Eliapps personal Contacts be created as (CN=Netid, OU=EliappsUsers,...) and that the remaining individual Aliases generate Contacts in (CN=AliasName, OU=Aliases, ...).
There are 300K Netids, but only ~50K are active.
There is a big difference in performance between frequently aggregating all the users, or just the active users. An inactive user with no mail account does not change information frequently and there is no SLA on updating the last name in AD if an ALUM or retiree changes it. Therefore, it would be vastly better if we can agree to aggregate only the active users most of the time, and to update the inactive users (SOI=IDR and no Email Address) less frequently. Currently the TEST AD has such a scheme, but the PROD AD does not.
If inactive users can be deprovisioned from AD (which means they will need a PIN to reactivate), this would solve the problem.
It is a requirement that the code not create AD objects for inactive accounts (SOI=IDR) with no mail.
The current AD Updater is driven by values generated by the YUONLINEDIR.PUBLISHED_AD_ACCOUNTS_V view in ACS1. It has logic of the form:
DECODE (r.flag, 'G', ph.grad_student_phone, 'U', ph.campus_phone, ph.office_phone ) telephonenumber
Which basically says that the variable 'telephonenumber' will be taken from the grad_student_phone if the identity is a Grad Student, the campus_phone if the identity is an Undergrad, or else the office_phone for everyone else (Employees).
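The same selection logic, written procedurally, is shown below as a rough guide to what a rule-based translation would compute. It is a Python sketch for illustration only (an IIQ Rule would be written in Java or BeanShell); the function name `telephone_number` is hypothetical and the parameters mirror the columns in the DECODE expression.

```python
def telephone_number(flag, grad_student_phone, campus_phone, office_phone):
    """Mirror DECODE(flag, 'G', grad, 'U', campus, office)."""
    if flag == "G":  # Grad Student
        return grad_student_phone
    if flag == "U":  # Undergrad
        return campus_phone
    return office_phone  # default branch: everyone else, including no flag
```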
There are two ways this can be translated. We can create a comparable Database View only in IDR instead of ACS1 (which means translating Oracle SQL into SQL Server SQL) or we can create a Global Rule in IIQ that populates an Identity Variable named 'telephonenumber' from one of three IDR source fields (which means translating Oracle SQL into Java).
The primary difference is staffing (Java or SQL programmer) and maintenance (Change to the application or to the database).
We have to decide this before coding can begin.
For the moment, the management of the Google Directory (GADS) is out of scope. We expect this might change in a Phase 2 of the development rather than being added as initial work. Once Eliapps users have to have Contacts in the AD in order to be Email routed by Exchange Online, we have all the information needed to provision the Google Directory.
Service Accounts will not be managed. They have no interesting attributes, no source of identity, and we have nothing extra to offer.
Computer objects are currently handled by the Microsoft Domain.
Groups may be managed by Grouper.
We are interested in the Users container and the SOM OU. Anything outside those two areas will be ignored if it is well behaved. Generally speaking, if an object has a SAMAccountName outside the range of valid Netids and if it does not own ITS managed Email accounts, then we can leave it alone.
User and Contact objects not related to a Netid are out of scope. We may choose to declare that they are illegal in the Users container, but migrating them elsewhere is a separate cleanup.
The Email Alias system will remain in its current form in ACS1. In some sense the information in AD duplicates the content of DIR_ONLINE_INFO, but AD is not able to provide the required function for generating unique Email Aliases.
There will be an Identity object for every Netid (personal and dependent).
An Account object is created when aggregation correlates data from any of the six sources to an Identity. Correlation is by Netid, except for Azure AD where we will use the existing Email Provisioning correlation logic based on ImmutableID (because Azure doesn't store Netid).
Personal identities will have an IDR Account object.
Dependent Netids will have a Netid System Account object.
At this time we believe that Contacts can be maintained as an Account under the Netid that owns the Alias that created the Contact. If that proves unworkable, then Contacts become a separate Identity.
We may choose to ignore personal Netid Accounts because they provide no useful information (the Netid is in IDR).
Each Email Alias is an Account that correlates to the Netid (personal or dependent) that owns it. Email Aliases are used to generate the proxyAddresses lists and drive Contact creation for a Netid with both O365 and Eliapps accounts.
Although it is possible to provision fields from the IDR Account record directly into the AD object, the simplest approach is to create identity variables for each item that must be provisioned into the AD and Azure AD.
If we do not create Contact Identities, then when a Netid has both an O365 mail account and an Eliapps mail account, requiring the generation of a Contact to contain the routing information for the Eliapps account, then the Contact name, mail, mailNickName, proxyAddresses, and targetAddress will be separate Identity variables from the corresponding variables that generate the same fields for the User object.
Indexed
Needed for code selections
AD Fields (unindexed, from IDR)
Calculated
Any non-mail entry (X500 or sip) in the ProxyAddresses list will be ignored by processing and will "pass through" from aggregation to provisioning.
The "SMTP/smtp" entries of the proxyAddresses list will be generated by us and will replace all entries in the current AD. This will fix reported problems where the same value is duplicated in more than one proxyAddresses list belonging to different objects (which cannot be allowed if this information is used to do Mail Routing).
Because it is Yale Policy that an O365 account is Primary, and because the Primary Email Alias sets the UPN which is a key field of Azure AD, we do not allow an Eliapps account to be Primary if an O365 mailbox is assigned to the AD User object. Generally speaking, for an O365 user we will ignore the Mailbox (should it point to bulldogs) and associate the Primary Alias with O365.
Therefore, there are four AD configurations:
There are some variables that need to be defined, and in a few cases the exact definition is technical:
The "Mail" attribute in AD is not a technical field. It is the Email Address to be "published" in the directory. At Yale it is the PrimaryAliasName@yale.edu and that happens to be the O365 account of a two-mailbox user because we require O365 to be Primary if it exists.
User Object
field | values |
---|---|
MailNickName | PrimaryAliasName |
TargetAddress | SMTP:PrimaryAliasName@yaledu.mail.onmicrosoft.com (ignored by Exchange Online) |
ProxyAddresses | SMTP:PrimaryAliasName@yale.edu |
User Object
field | values |
---|---|
MailNickName | PrimaryAliasName |
TargetAddress | SMTP:PrimaryMailbox |
ProxyAddresses | SMTP:PrimaryAliasName@yale.edu |
Contact Object
field | values |
---|---|
MailNickName | GoogleAlias |
TargetAddress | SMTP:GoogleAccount@bulldogs.yale.edu |
ProxyAddresses | SMTP:GoogleAlias@yale.edu |
Contact Object
field | values |
---|---|
MailNickName | AliasName |
TargetAddress | SMTP:Mailbox |
ProxyAddresses | SMTP:AliasName@yale.edu |