
Shibboleth is configured with three primary files (attribute-resolver.xml, attribute-filter.xml, and relying-party.xml) and a number of "metadata" files.

The three primary files can be stored on local disk, or a copy on local disk can be updated from a Web URL source, or a copy on local disk can be updated when a new version is checked into a configured Subversion server.

The Subversion server option is typically recommended because it provides change control, history, and an easy fallback. However, Shibboleth and CAS are "Tier 0" applications that have to come up first in a disaster recovery situation, and Shibboleth does not initialize if a primary configuration file is associated with a Subversion server that has not yet been restored to operation. Our current system therefore stores the authoritative version of the three primary configuration files in Subversion, but instead of Shibboleth checking the files out itself, new copies of the files are copied to the Shibboleth local disk when a system administrator runs the Jenkins Shibboleth Install job with the "config" option.

We have configured Shibboleth to check every 30 seconds for a new copy of each configuration file. When it sees a changed file, it reads the contents into memory. If there is an obvious syntax problem that makes the file unusable, the new file is ignored. Otherwise, after it has been processed, the configuration from the new file replaces the old configuration. The "config" version of the Jenkins Install job checks out the current version of all the configuration files from Subversion, and an Ant script then copies these new copies of the files into the active Shibboleth directories. Within 30 seconds the running Shibboleth sees that all the files have changed and reads them into memory.
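The 30-second polling behavior comes from the service definitions rather than the configuration files themselves. A minimal sketch of the relevant entry in conf/service.xml, assuming IdP 2.x syntax (the interval attribute is in milliseconds; the exact values and retry count here are illustrative assumptions):

```xml
<!-- Sketch only: tells the IdP to re-check attribute-resolver.xml for
     changes every 30 seconds (30000 ms) and reload it when it changes. -->
<Service id="shibboleth.AttributeResolver"
         xsi:type="attribute-resolver:ShibbolethAttributeResolver"
         configurationResourcePollingFrequency="30000"
         configurationResourcePollingRetryAttempts="5">
    <ConfigurationResource file="/usr/local/shibboleth-idp/conf/attribute-resolver.xml"
                           xsi:type="resource:FilesystemResource"/>
</Service>
```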

The only problem is that if you replace all the configuration files at once, a mistake can cause Shibboleth to stop working correctly. The proposed strategy is to design smaller subsets of the "config" Jenkins install that only replace specific configuration files, and to rank the level of risk for each type of change so that the level of approval and processing is appropriate for the change being made. We also need to expand the testing options so that every change can actually be tested in its real environment before it is put into production.

There are a number of ways this can be accomplished. However, to understand the level of risk (and in some cases, the complete absence of any risk) you need to know a bit more about the configuration files.

Attribute-Resolver (Queries and Attributes)

The attribute-resolver.xml file begins with a series of database or LDAP queries. Currently Shibboleth pulls data from Oracle instances (ACS, IST, IDM, HOP), the IDR SQL Server database, the Public LDAP Directory, and the Windows AD LDAP directory.

If any query fails, then Shibboleth stops working unless each query is configured with a "Failover", which can be either another query against another database or a static default value (which can be null) for the returned information. We have learned from the datacenter power failures and other incidents the importance of having an ultimate static default value for everything to keep Shibboleth behaving properly under all conditions. Failures will occur if the instance is down, but they can also be caused by network problems or by changes to the security configuration of the userid that Shibboleth uses to access a database. In one incident, Shibboleth worked in DEV and TEST but then failed in production because the userid it logged into the database with had been granted access to a table in DEV and TEST but did not have that permission in the PROD database instance. This is why it is important to add a testing stage that EXACTLY duplicates production, with the same userids and the same production data; TEST does not guarantee that fidelity.
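The Failover pattern can be sketched as follows, assuming IdP 2.x attribute-resolver.xml syntax; the connector ids, table, and column names below are illustrative assumptions, not our actual queries:

```xml
<!-- A database query with a static Failover behind it. -->
<resolver:DataConnector id="personQuery" xsi:type="dc:RelationalDatabase">
    <!-- If this query fails, fall back to the static connector below. -->
    <resolver:FailoverDataConnector ref="staticDefaults"/>
    <!-- (database connection elements omitted for brevity) -->
    <dc:QueryTemplate>
        <![CDATA[
            SELECT first_name, mail FROM person
            WHERE netid = '$requestContext.principalName'
        ]]>
    </dc:QueryTemplate>
</resolver:DataConnector>

<!-- The ultimate static default: empty values, so Shibboleth keeps
     running even when the real data source is unreachable. -->
<resolver:DataConnector id="staticDefaults" xsi:type="dc:Static">
    <dc:Attribute id="first_name"><dc:Value></dc:Value></dc:Attribute>
    <dc:Attribute id="mail"><dc:Value></dc:Value></dc:Attribute>
</resolver:DataConnector>
```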

In the new release, the attribute-resolver file has been reorganized to emphasize the Failover relationship, and as part of the testing of the new release we will verify that Shibboleth survives the loss of access to each data source. This is, however, an ongoing obligation: every time a new query is defined, a static Failover must also be created and Shibboleth must be tested against that failure.

Defining new queries or attributes is, however, less common, and typically it is not an emergency. Given the care that should be used and the testing that should be done, the normal two-week release-to-production cycle seems appropriate.

After the queries are defined, the same file goes on to define SAML attributes. The previous step obtained a value, but different partners want to use different names for the same thing. Take something as simple as "first name". It isn't actually that simple. In China, the name that comes first is the family name, and the individual given name comes second. It is just in the West that the individual given name comes first. Then different partners want to see this value labeled as "FirstName", "first_name", or "givenName" and when they want the long unique formal identifier it can be "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" or "http://schemas.xmlsoap.org/claims/FirstName" or the old LDAP value "urn:oid:2.5.4.42".

There are only a limited number of possible variables that you can extract from the Yale systems about a given user, but there are an unlimited number of names that people can dream up for E-Mail address or phone number. Fortunately, adding a new label for an existing value is simple, and in this part of the file an error adding something new cannot cause Shibboleth to misbehave. Unfortunately, because this is the second section of a single file, and additions to the first section can cause problems if they are not done correctly, there is no quick off-the-shelf improvement available for the Install process. However, with a bit of Ant programming it might be possible to break the file into separate components and define different levels of testing and approval for changes to the two different types of configuration elements.
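Adding a new label for an existing value amounts to adding another encoder to an existing attribute definition. A minimal sketch, assuming IdP 2.x resolver syntax (the sourceAttributeID and dependency ref are illustrative assumptions):

```xml
<!-- One resolved value, published under two different partner-facing names. -->
<resolver:AttributeDefinition id="givenName" xsi:type="ad:Simple"
                              sourceAttributeID="first_name">
    <resolver:Dependency ref="personQuery"/>
    <!-- Old LDAP-style name -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:2.5.4.42" friendlyName="givenName"/>
    <!-- Microsoft claim-style name used by ADFS partners -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname"/>
</resolver:AttributeDefinition>
```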

Attribute-Filter

The attribute-filter.xml file has a long list of rules listing the Attributes (defined in the previous section) that are released to each partner. For example

    <afp:AttributeFilterPolicy id="releaseToCommunityForceStaging">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://yalestaging.communityforce.com" />
        <afp:AttributeRule attributeID="givenName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="sn"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="mail"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

CommunityForce gets the givenName (firstname), sn (surname or family name), and E-Mail address (named just "mail" according to the old LDAP standards).  In fact, these are all standard old LDAP attributes which are very popular in academic applications. In contrast

    <afp:AttributeFilterPolicy id="releaseToArcher">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://sso2.archer.rsa.com/adfs/services/trust" />
        <afp:AttributeRule attributeID="scopedNetidAsUPN"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="firstnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="lastnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="emailADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

Archer gets "http://schemas.xmlsoap.org/claims/FirstName" and so on for lastname and email. These are Microsoft URL style names that are more popular these days with everyone except for the old guard in universities who still remember the LDAP names from previous failed attempts to use them.

It is hard to imagine any addition (or change) to this file causing a real problem. Even so, for good practice it makes sense to arrange the order of the release elements so that the Tier 0, mission critical, or production stuff comes first, and the brand new or testing junk comes at the end. Then there could be a rule that makes the level of approval and testing depend upon where in the file you make the change: changes to the stuff at the front are important and require signoff, while adding a new partner at the end is routine and can be done at any time. Again it would be nice to create an Ant script that breaks the sections up into separate files that are assembled at install time; the level of risk would then be determined by which file, representing which section of the configuration, you are working with.

Relying-Party and Metadata

The relying-party.xml file is only important now because it defines where Shibboleth finds Metadata. It is unlikely that the file itself will be modified, but if the Ant script triggered by a new form of Jenkins Install job simply "touches" the file (an Ant operation that resets the change date) then Shibboleth notices the new date and it reloads all the Metadata files.
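In Ant, the "touch" is a one-line built-in task. A sketch (the target name is an assumption; the path assumes the standard install location):

```xml
<!-- Hypothetical Ant target: resetting the timestamp on relying-party.xml
     makes the running Shibboleth reload all of its metadata sources. -->
<target name="reload-metadata">
    <touch file="/usr/local/shibboleth-idp/conf/relying-party.xml"/>
</target>
```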

So now it is important to explain Metadata. SAML defines a standard format file that a partner should give us to define the two things we need to know: the formal name the partner uses to identify itself, and where we should send the SAML message after we create it. Metadata is the most complicated format imaginable for carrying so little information, but SAML defines a lot of extra fluff in the standard.

A partner can expose metadata with a URL, and we can configure Shibboleth to use the URL to fetch new metadata from the partner periodically, but what happens if the partner is down when Shibboleth restarts? Fortunately, Shibboleth can be configured (although it is not the default) not to regard a failure fetching a metadata file as a fatal error that prevents initialization. However, it is safer if we make a copy of the metadata and check it into our own system, especially since it almost never changes.

Shibboleth is actually much smarter and more flexible with Metadata than it is with any of its other configuration elements. In the relying-party.xml file you define a sequence of possible metadata sources. Each source is treated as independent and dynamic. Independent means a failure of any source does not affect the validity of the other sources. Dynamic means that any source can be configured to poll a local file or a remote URL for updates and to load new data when it appears and the loading of new data for one source does not affect the other sources.

When Shibboleth needs metadata for a partner, it runs down the list of configured sources in the order in which they were configured checking each source for configuration data for the unique identity string for that partner. When it finds a match, it uses that metadata.

This creates two obvious special sources. One source we can call "the junk at the end of the list" or just the additions. The additions metadata can be used to add new configured partners, but because it comes at the end and will not be searched if the name is found in an earlier source, anything put in the additions cannot change an already configured metadata source. This file is totally safe. It cannot change any existing service. It can only add brand new configurations for new partners. Since mistakes in the file do not affect the rest of the configuration, you can change it at any time.

The other extreme is a typically empty file at the start of the list that is the "emergency-override.xml" source.  Add anything to this file and it replaces any metadata in any other source. You use it to respond to an emergency when you just need to fix one piece of metadata and you don't care where it came from (InCommon, a local configuration file, whatever). It will be found first and it will fix a reported problem quickly, and then the long term fix can be handled in the normal repair cycle.
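Putting the two special sources together, the chain in relying-party.xml can be sketched like this, assuming IdP 2.x syntax (the provider ids and file names are illustrative assumptions). Sources are searched in order, so the override wins and the additions file can never shadow an existing entry:

```xml
<metadata:MetadataProvider id="ShibbolethMetadata"
                           xsi:type="metadata:ChainingMetadataProvider">
    <!-- Normally empty; anything placed here overrides every other source. -->
    <metadata:MetadataProvider id="EmergencyOverride"
        xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/usr/local/shibboleth-idp/metadata/emergency-override.xml"
        failFastInitialization="false"/>

    <!-- ... the regular per-partner and federation sources go here ... -->

    <!-- Searched last: can only introduce brand-new partners. -->
    <metadata:MetadataProvider id="Additions"
        xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/usr/local/shibboleth-idp/metadata/additions.xml"
        failFastInitialization="false"/>
</metadata:MetadataProvider>
```

Note failFastInitialization="false", which is how Shibboleth is told that a failure loading one source at startup should not prevent initialization.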

This then leaves us with a small number of special cases. Two of our partners (Salesforce and Cvent) use a technique that we might call the Expanding Metadata File. Every time you define a new application with these systems, instead of getting a new Metadata file you get a one-line change to add to the existing Metadata file. In Salesforce, the file looks like:

      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-finance.my.salesforce.com?so=00Di0000000gP9D" index="12"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-fbo.my.salesforce.com?so=00Di0000000gP9D" index="13"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-adm.my.salesforce.com?so=00DA0000000ABT0" index="14"/>

The next time someone comes up with a new Salesforce application, it will be index="15" and will have its own unique Location value.

This means that a new type of targeted Jenkins Install job should treat the Salesforce and Cvent metadata files differently from all the other metadata we are managing. Changes to those two files are routine and require less approval than changes to, say, Archer or Hewitt.

Elements of a Proposed Strategy

Currently a "config" run of the Jenkins Install job replaces all the Shibboleth configuration files with new copies checked out from Subversion.

The proposal is to add one or more new soft-config options (to be named later) that perform subsets of the "config" install. Rather than having a large number of new Jenkins options, the soft-config will be driven by the Subversion tag. That is, instead of expecting to copy everything it will expect that only a small subset of the possible files will be updated and tagged and it will only change those files.

  1. It will be easy and completely safe to create the metadata "additions" file that is initially empty and to which new metadata can be added between full Shibboleth release cycles.
  2. It would be useful if some special processing of the Salesforce and Cvent metadata files was provided so these standard changes could be handled routinely even though they modify existing files.
  3. Adding new release policies at the end of the existing attribute-filter.xml file should also be safe and routine.
  4. Adding new Attribute names (for existing unmodified queries) is the last obvious and fairly safe operation.

The second element of the strategy is to provide a more accurate and complete testing environment. Currently TEST Shibboleth is connected to the TEST database instances (ACS2, IST2, IDM2, HOP4) and potentially to the TEST AD (yu.yale.net). This provides a service for those who need to use test netids, but it does not actually test what is going to go into production.

It is also true that most partners do not support TEST environments. In fact, the entire InCommon Federation has no concept of TEST and no provision for us to define our TEST Shibboleth.

However, while CAS is bound to a particular well-known URL (secure.its.yale.edu/cas), Shibboleth is not bound to a URL or server; it is known by the public/private key pair stored in its /usr/local/shibboleth-idp/credentials folder. Create a second instance of Shibboleth on any server anywhere in the organization, give it a copy of the same credentials files, and it will generate SAML messages that any of our partners will accept as legitimate. While applications talk to CAS directly, all communication between Shibboleth and an application goes through the Browser. So if there is a PRE-PROD test environment with a copy of the code we propose to put into production and a copy of the production credentials, a browser on any machine can use it with all the standard production apps via the obvious brute-force solution: point the hosts file on the browser's client machine at the PRE-PROD VIP whenever the browser is redirected to "auth.yale.edu". The first time, it may be necessary to approve the SSL certificate name mismatch, but after that you have a platform to comprehensively test the exact configuration we intend to put into production.

 
