
Shibboleth is configured with three primary files (attribute-resolver.xml, attribute-filter.xml, and relying-party.xml) and a number of "metadata" files.

The three primary files can be stored on local disk, or a local copy can be refreshed from a Web URL source, or a local copy can be updated whenever a new version is checked into a configured Subversion server.

The Subversion server option is usually recommended because it provides control, history, and an easy fallback. However, Shibboleth and CAS are "Tier 0" applications that have to be up first in a disaster recovery situation, and Shibboleth does not initialize if a primary configuration file is associated with a Subversion server that has not yet been restored to operation. Our current system therefore stores the authoritative version of the three primary configuration files in Subversion, but instead of Shibboleth checking the files out itself, fresh copies are written to the Shibboleth local disk when a system administrator runs the Jenkins Shibboleth Install job with the "config" option.

We have configured Shibboleth to check every 30 seconds for a new copy of any configuration file. When it sees a changed file it reads the contents into memory. If the file has an obvious syntax problem and is unusable, it is ignored; otherwise, once it has been processed, the configuration from the new file replaces the old configuration. The "config" version of the Jenkins Install job checks out the current version of all the configuration files from Subversion, and an Ant script then copies these new copies of the files into the active Shibboleth directories. Within 30 seconds the running Shibboleth sees that the files have changed and reads them into memory.
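As a sketch of what that copy step might look like (the target name and the `${shib.home}` and `${svn.checkout.dir}` property names are assumptions for illustration, not the actual Jenkins/Ant configuration), the Ant piece is essentially a copy of the checked-out files into the live configuration directory:

```xml
<!-- Illustrative only: property and target names are assumed, not the real job -->
<target name="install-config" description="Push checked-out config into the live Shibboleth directory">
    <copy todir="${shib.home}/conf" overwrite="true">
        <fileset dir="${svn.checkout.dir}/conf">
            <include name="attribute-resolver.xml"/>
            <include name="attribute-filter.xml"/>
            <include name="relying-party.xml"/>
        </fileset>
    </copy>
</target>
```

Once the copy lands, the 30 second polling cycle picks the files up; no restart is involved.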

The only problem is that replacing all the configuration files at once means a single mistake can cause Shibboleth to stop working correctly. The proposed strategy is to design smaller subsets of the "config" Jenkins install that replace only specific configuration files, and to rank the level of risk for each type of change so that the level of approval and processing is appropriate for the change being made. We also need to expand the testing options so that every change can actually be tested in its real environment before it goes into production.

There are a number of ways this can be accomplished. However, to understand the level of risk (and in some cases, the complete absence of any risk) you need to know a bit more about the configuration files.

Attribute-Resolver (Queries and Attributes)

The attribute-resolver.xml file begins with a series of database or LDAP queries. Currently Shibboleth pulls data from Oracle instances (ACS, IST, IDM, HOP), the IDR SQL Server database, the Public LDAP Directory, and the Windows AD LDAP directory.

If any query fails, Shibboleth stops working unless the query is configured with a "Failover", which can be either another query against another database or a static default value (possibly null) for the returned information. The datacenter power failures and other incidents have taught us the importance of having an ultimate static default value for everything, so that Shibboleth keeps behaving properly under all conditions. A query fails if the instance is down, but failures can also be caused by network problems or by changes to the security configuration of the userid Shibboleth uses to access a database. In one incident, Shibboleth worked in DEV and TEST but failed in production because the userid it logged into the database with had been granted access to a table in DEV and TEST but did not have that permission in the PROD database instance. This is why it is important to add a testing stage that EXACTLY duplicates production, with the same userids and the same production data; TEST does not guarantee that fidelity.
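To make the Failover relationship concrete, here is a minimal sketch in Shibboleth 2.x attribute-resolver syntax. The connector IDs, JDBC details, and query are invented for illustration (they are not our real definitions), and the usual `resolver:` and `dc:` namespace prefixes are assumed to be declared at the top of the file. A database connector falls back to a Static connector that supplies a default (here empty) value:

```xml
<!-- Sketch only: IDs, connection details, and the query are illustrative -->
<resolver:DataConnector id="exampleMailQuery" xsi:type="dc:RelationalDatabase">
    <!-- If this query fails, fall back to the static connector below -->
    <resolver:FailoverDataConnector ref="exampleMailDefault"/>
    <dc:ApplicationManagedConnection jdbcDriver="oracle.jdbc.OracleDriver"
        jdbcURL="jdbc:oracle:thin:@dbhost.example.yale.edu:1521:SOMEDB"
        jdbcUserName="shibquery" jdbcPassword="changeit"/>
    <dc:QueryTemplate>
        <![CDATA[ SELECT mail FROM directory_person WHERE netid = '$requestContext.principalName' ]]>
    </dc:QueryTemplate>
</resolver:DataConnector>

<!-- Ultimate static default: an empty mail value keeps Shibboleth running -->
<resolver:DataConnector id="exampleMailDefault" xsi:type="dc:Static">
    <dc:Attribute id="mail">
        <dc:Value></dc:Value>
    </dc:Attribute>
</resolver:DataConnector>
```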

In the new release, the attribute-resolver file has been reorganized to emphasize the Failover relationship, and as part of the testing of the new release we will verify that Shibboleth survives the loss of access to each data source. However, it becomes an ongoing process to ensure that every time a new query is defined, a static Failover is also created and Shib is tested for that failure.

Fortunately, defining new queries or attributes is relatively uncommon, and it is typically not an emergency. Given the care that should be used and the testing that should be done, the normal two week release to production cycle seems appropriate.

After the queries are defined, the same file goes on to define SAML attributes. The previous step obtained a value, but different partners want to use different names for the same thing. Take something as simple as "first name". It isn't actually that simple. In China, the name that comes first is the family name, and the individual given name comes second. It is just in the West that the individual given name comes first. Then different partners want to see this value labeled as "FirstName", "first_name", or "givenName" and when they want the long unique formal identifier it can be "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" or "http://schemas.xmlsoap.org/claims/FirstName" or the old LDAP value "urn:oid:2.5.4.42".

There are only a limited number of possible variables that you can extract from the Yale systems about a given user, but there are an unlimited number of names that people can dream up for E-Mail address or phone number. Fortunately, adding a new label for an existing value is simple, and in this part of the file an error adding something new cannot cause Shibboleth to misbehave. Unfortunately, because this is the second section of a single file, and additions to the first section can cause problems if they are not done correctly, there is no quick off-the-shelf improvement available for the Install process. However, with a bit of Ant programming it might be possible to break the file into separate components and define different levels of testing and approval for the two different types of configuration elements.
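Adding a new label for an existing value amounts to attaching another AttributeEncoder to an existing definition. A hedged sketch in Shibboleth 2.x syntax (the `sourceAttributeID` and the Dependency connector name are made up for illustration): one resolved value, encoded under both the old LDAP OID and the Microsoft-style claim name from the text above:

```xml
<!-- Sketch: one value, two wire names; IDs and dependency are illustrative -->
<resolver:AttributeDefinition id="givenName" xsi:type="ad:Simple" sourceAttributeID="first_name">
    <resolver:Dependency ref="exampleDirectoryQuery"/>
    <!-- Old LDAP-style name -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:2.5.4.42" friendlyName="givenName"/>
    <!-- Microsoft/ADFS-style claim name -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname"/>
</resolver:AttributeDefinition>
```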

Attribute-Filter

The attribute-filter.xml file has a long list of rules listing the Attributes (defined in the previous section) that are released to each partner. For example:

    <afp:AttributeFilterPolicy id="releaseToCommunityForceStaging">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://yalestaging.communityforce.com" />
        <afp:AttributeRule attributeID="givenName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="sn"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="mail"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

CommunityForce gets givenName (first name), sn (surname or family name), and the E-Mail address (named just "mail" according to the old LDAP standards). These are all standard old LDAP attributes, which remain very popular in academic applications. In contrast:

    <afp:AttributeFilterPolicy id="releaseToArcher">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://sso2.archer.rsa.com/adfs/services/trust" />
        <afp:AttributeRule attributeID="scopedNetidAsUPN"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="firstnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="lastnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="emailADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

Archer gets "http://schemas.xmlsoap.org/claims/FirstName" and so on for last name and email. These are Microsoft URL-style names, which are more popular these days with everyone except the old guard in universities who still remember the LDAP names from previous failed attempts to use them.

It is hard to imagine any addition (or change) to this file causing a real problem. Still, for good practice it makes sense to arrange the order of the release elements so that the Tier 0, mission critical, or production stuff comes first, and the brand new or testing junk comes at the end. Then there could be a rule that makes the level of approval and testing depend on where in the file you make the change: changes to the stuff at the front are important and require signoff, while adding a new partner at the end is routine and can be done at any time. Again, it would be nice to create an Ant script that breaks the sections up into separate files that are assembled at install time, so that the level of risk is determined by which file, representing which section of the configuration, you are working with.
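One way to sketch that assembly (the fragment file names here are hypothetical) is an Ant concat that stitches the sections back together in risk order at install time:

```xml
<!-- Illustrative fragment names; header/footer fragments carry the XML wrapper elements -->
<target name="assemble-filter" description="Rebuild attribute-filter.xml from per-risk fragments">
    <concat destfile="${shib.home}/conf/attribute-filter.xml" fixlastline="yes">
        <filelist dir="${svn.checkout.dir}/filter">
            <file name="filter-header.xml"/>
            <file name="filter-production-partners.xml"/>
            <file name="filter-new-partners.xml"/>
            <file name="filter-footer.xml"/>
        </filelist>
    </concat>
</target>
```

The fragment a change touches then tells you which approval rule applies.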

Relying-Party and Metadata

The relying-party.xml file is only important now because it defines where Shibboleth finds Metadata. It is unlikely that the file itself will be modified, but if the Ant script triggered by a new form of Jenkins Install job simply "touches" the file (an Ant operation that updates its modification timestamp), Shibboleth notices the new date and reloads all the Metadata files.
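A metadata-only install target could then be as simple as copying the new metadata files into place and touching relying-party.xml to force the reload (directory layout and property names are assumptions, not the real job):

```xml
<!-- Illustrative: paths and property names are assumed -->
<target name="install-metadata">
    <copy todir="${shib.home}/metadata" overwrite="true">
        <fileset dir="${svn.checkout.dir}/metadata" includes="*.xml"/>
    </copy>
    <!-- Updating the timestamp makes Shibboleth reload all the Metadata files -->
    <touch file="${shib.home}/conf/relying-party.xml"/>
</target>
```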

So now it is important to explain Metadata. SAML defines a standard file format a partner should give us to convey the two things we need to know: the formal name the partner uses to identify itself, and where we should send the SAML message after we create it. Metadata is about the most complicated format imaginable for carrying so little information, but SAML defines a lot of extra fluff in the standard.

A partner can expose metadata at a URL, and we can configure Shibboleth to use the URL to fetch new metadata from the partner periodically, but what happens if the partner is down when Shibboleth restarts? Fortunately, Shibboleth can be configured (although it is not the default) not to treat a failure fetching a metadata file as a fatal error that prevents initialization. Even so, it is safer to make a copy of the metadata and check it into our own system, especially since it almost never changes.

Shibboleth is actually much smarter and more flexible with Metadata than it is with any of its other configuration elements. In the relying-party.xml file you define a sequence of possible metadata sources. Each source is treated as independent and dynamic. Independent means a failure of any source does not affect the validity of the other sources. Dynamic means that any source can be configured to poll a local file or a remote URL for updates and to load new data when it appears and the loading of new data for one source does not affect the other sources.

When Shibboleth needs metadata for a partner, it runs down the list of configured sources in the order in which they were configured, checking each one for configuration data matching that partner's unique identity string. When it finds a match, it uses that metadata.

This creates two obvious special sources. One source we can call "the junk at the end of the list", or just the additions. The additions metadata can be used to add newly configured partners, but because it comes at the end and will not be searched if the name is found in an earlier source, anything put in the additions cannot change an already configured metadata source. This file is totally safe: it cannot change any existing service, only add brand new configurations for new partners. Since mistakes in the file don't affect other configuration, you can change it at any time.

The other extreme is a typically empty file at the start of the list, the "emergency-override.xml" source. Add anything to this file and it replaces the matching metadata from every other source. You use it to respond to an emergency, when you just need to fix one piece of metadata and you don't care where it came from (InCommon, a local configuration file, whatever). The override is found first and fixes the reported problem quickly; the long-term fix can then be handled in the normal repair cycle.
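Putting the two special sources in their places, the chain in relying-party.xml might look like this sketch (provider IDs and file paths are illustrative, and the `metadata:` prefix is assumed to be bound to `urn:mace:shibboleth:2.0:metadata` at the top of the file):

```xml
<!-- Sketch only: IDs and paths are illustrative, not our actual configuration -->
<metadata:MetadataProvider id="ShibbolethMetadata" xsi:type="metadata:ChainingMetadataProvider">
    <!-- Searched first: normally empty; anything added here overrides every later source -->
    <metadata:MetadataProvider id="EmergencyOverride" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/apps/shibboleth/metadata/emergency-override.xml"/>
    <!-- Established production partners -->
    <metadata:MetadataProvider id="ProductionPartners" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/apps/shibboleth/metadata/production-partners.xml"/>
    <!-- Searched last: new partners only; cannot shadow anything above -->
    <metadata:MetadataProvider id="Additions" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/apps/shibboleth/metadata/additions.xml"/>
</metadata:MetadataProvider>
```

Each provider in the chain is independent and reloads on its own schedule, so a bad file in one slot does not take down the others.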

This then leaves us with a small number of special cases. Two of our partners (Salesforce and Cvent) use a technique that we might call the Expanding Metadata File. Every time you define a new application with these systems, instead of getting a new Metadata file you get a one line change to add to the existing Metadata file. In Salesforce, the file looks like:

      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-finance.my.salesforce.com?so=00Di0000000gP9D" index="12"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-fbo.my.salesforce.com?so=00Di0000000gP9D" index="13"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-adm.my.salesforce.com?so=00DA0000000ABT0" index="14"/>

The next time someone comes up with a new Salesforce application, it will be index="15" and will have its own unique Location value.

This means that a new type of targeted Jenkins Install job should treat the Salesforce and Cvent metadata files differently from all the other metadata we are managing. Changes to those two files are routine and require less approval than changes to Archer or Hewitt.

 

 
