Shibboleth is configured with three primary files (attribute-resolver.xml, attribute-filter.xml, and relying-party.xml) and a number of "metadata" files.
The three primary files can be stored:
- On the local disk of the Shibboleth server VM
- On a Web server of some sort (referenced by a URL)
- On a Subversion server
If they are local disk files, Shibboleth can check their modify date and load a new configuration when a file is updated. If they are on a Web server, HTTP can be used to check whether a file has changed. Subversion access is driven by an increasing version number instead of a date.
The Shibboleth documentation recommends SVN, but because Shibboleth is a Tier 0 application it has to be able to come up with as few prerequisites as possible, and the SVN support does not appear to work if the Subversion server has not yet been restored.
Yale Shibboleth configuration is driven by Subversion, but Shibboleth itself is not the SVN client and does not download files. Instead, we control when the files become active through the Jenkins Install job. Jenkins checks out configuration files from SVN and then copies them to local disk on the Shibboleth servers. Shibboleth is configured to use local disk files, and we have an administratively controlled and logged mechanism to update them. Once installed, the files remain available should Shibboleth be restarted in any Disaster Recovery scenario.
We have configured Shibboleth to check every 30 seconds for a new change timestamp on any local configuration file. When it sees a new version of a file it reads the contents into memory and runs a minimal XML parse. If there is an XML syntax error, the new file is discarded and the old configuration remains active. Otherwise the new configuration replaces the previous one.
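The reload check is configured per service in the IdP's service registry. The sketch below shows roughly what such an entry looks like in an IdP 2.x service.xml; the attribute names follow the 2.x schema, the polling value is in milliseconds, and the path and retry count are assumptions to verify against the deployed install:

```xml
<!-- Sketch only: IdP 2.x service.xml syntax; the 30000ms (30 second)
     polling value, retry count, and file path are assumptions. -->
<srv:Service id="shibboleth.AttributeResolver"
             xsi:type="attribute-resolver:ShibbolethAttributeResolver"
             configurationResourcePollingFrequency="30000"
             configurationResourcePollingRetryAttempts="3">
    <srv:ConfigurationResource xsi:type="resource:FilesystemResource"
             file="/usr/local/shibboleth-idp/conf/attribute-resolver.xml"/>
</srv:Service>
```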
Metadata is a bit more complicated. Metadata sources are configured in the relying-party.xml file. Each Metadata source is an independent configuration with its own refresh rules. At Yale, we have decided to use three Metadata source models:
Static - Production Metadata for partners that have supplied us with Metadata that we check into SVN and manage directly are handled as individual static files. They are copied from SVN to the local hard drive of the Shibboleth server, but they have no refresh policy. You cannot change a Static Metadata file by itself. You have to change the timestamp on the relying-party.xml file, and when it gets read into memory then Shibboleth automatically reloads all the Metadata files that the relying-party.xml file designates.
Remote - The InCommon Metadata is provided from a remote URL on the InCommon Web server. Once every 8 hours Shibboleth checks for a new version and downloads it from the server. Shibboleth maintains a local disk copy of the last file downloaded, so if Shibboleth is restarted while the remote server is unavailable it can come up with the previous InCommon Metadata.
Dynamic - Specific Metadata files are stored as local files on disk, but they are configured to be examined once every 5 minutes for a changed timestamp and to be reloaded when they change. Because Shibboleth examines Metadata sources in the order in which they are configured, and it stops when it finds Metadata for the entity for which it is searching, these dynamic Metadata files are distinguished by their position in the search order.
The "emergency-override" dynamic file comes first in the search, so any Metadata placed in this file overrides an older version configured statically. This file is initially empty; Metadata is placed in it when we have an incident because an existing partner's metadata has failed (typically because it has expired or because the certificate and key used by the partner changed unexpectedly). This provides a safer form of "emergency" fix because only the one Metadata element is replaced.
The "additions" dynamic file comes last in the search, so it cannot be used to change any existing Metadata for any entity. It can only define new Metadata for new entities. This becomes a relatively safe Standard Change because anything put into this file cannot adversely affect existing configured services.
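Putting the three models and the two dynamic files together, the provider chain in relying-party.xml looks roughly like the sketch below. The element and attribute names follow the IdP 2.x metadata schema, and the ids, paths, URL, and refresh values are illustrative assumptions, not our actual configuration:

```xml
<metadata:MetadataProvider id="ShibbolethMetadata" xsi:type="metadata:ChainingMetadataProvider">
    <!-- Searched first: dynamic file, initially empty, overrides anything below. -->
    <metadata:MetadataProvider id="EmergencyOverride" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/usr/local/shibboleth-idp/metadata/emergency-override.xml"
        maxRefreshDelay="PT5M"/>
    <!-- Static: reloaded only when relying-party.xml itself is touched. -->
    <metadata:MetadataProvider id="SomePartner" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/usr/local/shibboleth-idp/metadata/some-partner.xml"/>
    <!-- Remote: checked every 8 hours, with a local backing copy for restarts. -->
    <metadata:MetadataProvider id="InCommonMetadata" xsi:type="metadata:FileBackedHTTPMetadataProvider"
        metadataURL="http://md.incommon.org/InCommon/InCommon-metadata.xml"
        backingFile="/usr/local/shibboleth-idp/metadata/InCommon-metadata.xml"
        maxRefreshDelay="PT8H"/>
    <!-- Searched last: dynamic file that can only add brand new entities. -->
    <metadata:MetadataProvider id="Additions" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/usr/local/shibboleth-idp/metadata/additions.xml"
        maxRefreshDelay="PT5M"/>
</metadata:MetadataProvider>
```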
A new partner may need more than just Metadata. They may need attributes released to them. Fortunately, Shibboleth allows the function of the attribute-filter.xml file to be broken up into multiple files. Existing partners are configured in attribute-filter.xml, and an empty file named additional-attribute-filter.xml is initially deployed with every Shibboleth release.
Therefore, if a new partner has to be defined to production and cannot wait for the every-other-Thursday Release cycle, the Metadata for that partner can be placed in the metadata/additions.xml file and the attributes to be released can be put in the additional-attribute-filter.xml file. The two files are updated together. At a normal Release point, information is moved out of the "additions" files and becomes part of the standard configuration files, and the empty additions files are deployed to start the next cycle.
If a partner requires a new attribute, however, there is no way to define it outside the every-other-Thursday Release cycle (unless the ECAB authorizes an unscheduled Release).
Attribute-Resolver (Queries and Attributes)
Attributes are defined and their values are obtained from the configuration in the attribute-resolver.xml file.
The file starts with DataConnectors. A typical connector identifies a database or LDAP directory as a source, and a query (in SQL or LDAP query language) to present to the source. Currently Shibboleth pulls data from Oracle instances (ACS, IST, IDM, HOP), the IDR SQL Server database, the Public LDAP Directory, and the Windows AD LDAP directory.
There are generally three types of queries that make sense:
- A database query can return exactly one row. Then you can think of the row as a user object, and the column names become the properties of the object. All these properties are Strings (Shibboleth has no concept of other data types). NOTE: Oracle always returns column names in UPPERCASE, and it is a really, really good idea to capitalize these names wherever they appear in the configuration file. If you fail to use all caps in one of the later configuration elements, it will fail to match and you will get a null instead of a value.
- A database query can return more than one row but only one column. Then you have a "multivalued" property.
- An LDAP query returns the User object from the directory. LDAP User objects have properties, some of which are single-valued and some of which are multivalued.
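The single-row style of query looks something like the sketch below, in IdP 2.x attribute-resolver.xml syntax; the connector id, view name, and connection details are invented for illustration. Note the UPPERCASE column names, per the Oracle warning above:

```xml
<resolver:DataConnector id="directoryQuery" xsi:type="dc:RelationalDatabase">
    <dc:ApplicationManagedConnection jdbcDriver="oracle.jdbc.OracleDriver"
        jdbcURL="jdbc:oracle:thin:@dbhost.example.yale.edu:1521:SOMEDB"
        jdbcUserName="shibb" jdbcPassword="secret"/>
    <!-- One row back = one "user object"; the UPPERCASE column names
         become the property names used later in the file. -->
    <dc:QueryTemplate><![CDATA[
        SELECT NETID, FIRSTNAME, LASTNAME, EMAIL
          FROM DIRECTORY_VIEW
         WHERE NETID = '$requestContext.principalName'
    ]]></dc:QueryTemplate>
</resolver:DataConnector>
```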
One other point. The database query can return no rows, or it can return a row where some of the columns have the value NULL. An LDAP query can return no object, or it can return an object that does not have the property name you are interested in, or the property can exist but have no value. For the most part this doesn't matter unless you have to reference the value of the property in JavaScript. If you write JavaScript, remember to test for the property being "undefined", or null, or empty, which are distinct conditions with distinct tests.
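In an IdP 2.x scripted attribute, each dependency shows up as a script variable, and all three conditions need their own test. A sketch (the attribute and connector names are hypothetical, and the exact variable semantics depend on the connector, so treat this as illustrative):

```xml
<resolver:AttributeDefinition id="safeFirstName" xsi:type="ad:Script">
    <resolver:Dependency ref="directoryQuery"/>
    <ad:Script><![CDATA[
        importPackage(Packages.edu.internet2.middleware.shibboleth.common.attribute.provider);
        safeFirstName = new BasicAttribute("safeFirstName");
        // Three distinct conditions, three distinct tests:
        if (typeof FIRSTNAME == "undefined") {
            // the query returned no row (or no such property) at all
        } else if (FIRSTNAME == null) {
            // the property exists but is null
        } else if (FIRSTNAME.getValues().size() == 0) {
            // the property exists but has no values
        } else {
            safeFirstName.getValues().add(FIRSTNAME.getValues().get(0));
        }
    ]]></ad:Script>
</resolver:AttributeDefinition>
```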
Returning no rows or objects is a normal response to a query. A query fails if it generates an ORAxxxx SQLException or a NamingException. Typically this happens if the database server or directory is down, but it can also happen if the userid and password used to log in to the server are no longer valid, or if permissions were revoked or never granted to that user.
Shibboleth regards any query failure as catastrophic, unless there is a "failover" DataConnector. The Failover can be a different query to a different database, or it can be a Static default value.
A Static DataConnector defines one or more properties and values. It is not necessary to define a default value for every property that the real query could have returned, provided that a null or undefined value is acceptable for the other properties. Typical Static default values are -1, 0, "", or "undefined", but that is up to you. Because query failure is catastrophic without a Failover, every query must have a Static Failover DataConnector. It is not important what default values, if any, it provides.
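In IdP 2.x syntax, the Failover relationship and its Static backstop look roughly like this sketch (the ids and default values are illustrative, and the real connector's connection and query details are omitted here):

```xml
<resolver:DataConnector id="directoryQuery" xsi:type="dc:RelationalDatabase">
    <!-- If the SQL query throws an exception, Shibboleth falls back here. -->
    <resolver:FailoverDataConnector ref="directoryQueryFailover"/>
    <!-- ... connection and QueryTemplate as configured for the real query ... -->
</resolver:DataConnector>

<resolver:DataConnector id="directoryQueryFailover" xsi:type="dc:Static">
    <dc:Attribute id="FIRSTNAME"><dc:Value>undefined</dc:Value></dc:Attribute>
    <dc:Attribute id="EMAIL"><dc:Value></dc:Value></dc:Attribute>
</resolver:DataConnector>
```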
In the new release, the attribute-resolver file has been reorganized to emphasize the Failover relationship, and as part of the testing of the new release we will verify that Shibboleth survives the loss of access to each data source. However, it becomes an ongoing process to ensure that every time a new query is defined, a Static Failover is also created and Shibboleth is tested for that failure.
However, defining new queries or attributes is less common, and typically it is not an emergency. With the care that should be used and the testing that should be done, the normal two week release to production cycle seems appropriate.
After the queries are defined, the same file goes on to define SAML attributes. The previous step obtained a value, but different partners want to use different names for the same thing. Take something as simple as "first name". It isn't actually that simple. In China, the name that comes first is the family name, and the individual given name comes second. It is just in the West that the individual given name comes first. Then different partners want to see this value labeled as "FirstName", "first_name", or "givenName" and when they want the long unique formal identifier it can be "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" or "http://schemas.xmlsoap.org/claims/FirstName" or the old LDAP value "urn:oid:2.5.4.42".
There are only a limited number of possible variables that you can extract from the Yale systems about a given user, but there is an unlimited number of names that people can dream up for E-Mail address or phone number. Fortunately, adding a new label for an existing value is simple, and in this part of the file an error adding something new cannot cause Shibboleth to misbehave. Unfortunately, because this is the second section of a single file, and additions to the first section can cause problems if they are not done correctly, there is no quick off-the-shelf improvement available for the Install process. However, with a bit of Ant programming it might be possible to break the file into separate components and define different levels of testing and approval for changes to the two different types of configuration elements.
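Adding a new label is just one more AttributeEncoder on an existing AttributeDefinition. A sketch in IdP 2.x syntax, reusing the givenName identifiers quoted above (the definition and connector ids are hypothetical):

```xml
<resolver:AttributeDefinition id="givenName" xsi:type="ad:Simple" sourceAttributeID="FIRSTNAME">
    <resolver:Dependency ref="directoryQuery"/>
    <!-- Same value, released under whichever name a partner expects. -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:2.5.4.42" friendlyName="givenName"/>
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname"/>
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="http://schemas.xmlsoap.org/claims/FirstName"/>
</resolver:AttributeDefinition>
```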
Attribute-Filter
The attribute-filter.xml file has a long list of rules listing the Attributes (defined in the previous section) that are released to each partner. For example:

<afp:AttributeFilterPolicy id="releaseToCommunityForceStaging">
    <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString"
        value="https://yalestaging.communityforce.com" />
    <afp:AttributeRule attributeID="givenName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    <afp:AttributeRule attributeID="sn"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    <afp:AttributeRule attributeID="mail"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
</afp:AttributeFilterPolicy>
CommunityForce gets the givenName (firstname), sn (surname or family name), and E-Mail address (named just "mail" according to the old LDAP standards). In fact, these are all standard old LDAP attributes, which are very popular in academic applications. In contrast:
<afp:AttributeFilterPolicy id="releaseToArcher">
    <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString"
        value="https://sso2.archer.rsa.com/adfs/services/trust" />
    <afp:AttributeRule attributeID="scopedNetidAsUPN"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    <afp:AttributeRule attributeID="firstnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    <afp:AttributeRule attributeID="lastnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    <afp:AttributeRule attributeID="emailADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
</afp:AttributeFilterPolicy>
Archer gets "http://schemas.xmlsoap.org/claims/FirstName" and so on for lastname and email. These are Microsoft URL style names that are more popular these days with everyone except for the old guard in universities who still remember the LDAP names from previous failed attempts to use them.
Additions (or changes) to this file are very unlikely to cause a problem. However, for good practice it makes sense to arrange the release elements so that the Tier 0, mission critical, or production entries come first, and the brand new or testing entries come at the end. Then there could be a rule that makes the level of approval and testing depend on where in the file the change is made. Changes to the entries at the front are important and require signoff, while adding a new partner at the end is routine and can be done at any time. Again, it would be nice to create an Ant script that breaks the sections up into separate files assembled at install time; the level of risk would then be determined by which file, representing which section of the configuration, you are working with.
Relying-Party and Metadata
The relying-party.xml file is important now mainly because it defines where Shibboleth finds Metadata. It is unlikely that the file itself will be modified, but if the Ant script triggered by a new form of Jenkins Install job simply "touches" the file (an Ant operation that resets the change date), Shibboleth notices the new date and reloads all the Metadata files.
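The "touch" itself is a one-line Ant task. A minimal sketch of a target the Install job could call (the target name is an assumption):

```xml
<target name="reload-metadata"
        description="Force Shibboleth to reload relying-party.xml and all Metadata it names">
    <touch file="/usr/local/shibboleth-idp/conf/relying-party.xml"/>
</target>
```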
So now it is important to explain Metadata. SAML defines a standard format file that a partner should give us to define the two things we need to know: what is the formal name the partner uses to identify itself and where should we send the SAML message after we create it. Metadata is the most complicated possible format imaginable to carry such little information, but SAML defines a lot of extra fluff in the standard.
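Stripped of the fluff, those two pieces of information reduce to something like the sketch below (the entityID and Location values are invented for illustration):

```xml
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
                     entityID="https://partner.example.com/sp">
    <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
        <!-- Where we send the SAML message after we create it. -->
        <md:AssertionConsumerService
            Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
            Location="https://partner.example.com/Shibboleth.sso/SAML2/POST"
            index="0"/>
    </md:SPSSODescriptor>
</md:EntityDescriptor>
```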
A partner can expose metadata at a URL, and we can configure Shibboleth to fetch new metadata from the partner periodically, but what happens if the partner is down when Shibboleth restarts? Fortunately, Shibboleth can be configured (although it is not the default) not to regard a failure fetching a metadata file as a fatal error that prevents initialization. However, it is safer if we make a copy of the metadata and check it into our own system, especially since it almost never changes.
Shibboleth is actually much smarter and more flexible with Metadata than it is with any of its other configuration elements. In the relying-party.xml file you define a sequence of possible metadata sources. Each source is treated as independent and dynamic. Independent means a failure of any source does not affect the validity of the other sources. Dynamic means that any source can be configured to poll a local file or a remote URL for updates and to load new data when it appears and the loading of new data for one source does not affect the other sources.
When Shibboleth needs metadata for a partner, it runs down the list of configured sources in the order in which they were configured checking each source for configuration data for the unique identity string for that partner. When it finds a match, it uses that metadata.
This creates two obvious special sources. One source we can call "the junk at the end of the list" or just the additions. The additions metadata can be used to add new configured partners, but because it comes at the end and will not be searched if the name is found earlier, anything put in the additions cannot change an already configured metadata source. This file is totally safe. It cannot change any existing service. It can only add brand new configurations for new partners. Since mistakes in the file don't affect other configuration, you can change it at any time.
The other extreme is a typically empty file at the start of the list that is the "emergency-override.xml" source. Add anything to this file and it replaces any metadata in any other source. You use it to respond to an emergency when you just need to fix one piece of metadata and you don't care where it came from (InCommon, a local configuration file, whatever). It will be found first and it will fix a reported problem quickly, and then the long term fix can be handled in the normal repair cycle.
This then leaves us with a small number of special cases. Two of our partners (salesforce and cvent) use a technique that we might call the Expanding Metadata File. Every time you define a new application with these systems, instead of getting a new Metadata file you get a one line change to add to the existing Metadata file. In Salesforce, the file looks like:
<md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
    Location="https://yale-finance.my.salesforce.com?so=00Di0000000gP9D" index="12"/>
<md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
    Location="https://yale-fbo.my.salesforce.com?so=00Di0000000gP9D" index="13"/>
<md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
    Location="https://yale-adm.my.salesforce.com?so=00DA0000000ABT0" index="14"/>
The next time someone comes up with a new Salesforce application, it will be index="15" and will have its own unique Location value.
This means that a new type of targeted Jenkins Install job should treat the Salesforce and Cvent metadata files differently from all the other metadata we are managing. Changes to those two files are routine and require less approval than changes to Archer or Hewitt.
Elements of a Proposed Strategy
Currently a "config" run of the Jenkins Install job replaces all the Shibboleth configuration files with new copies checked out from Subversion.
The proposal is to add one or more new soft-config options (to be named later) that perform subsets of the "config" install. Rather than having a large number of new Jenkins options, the soft-config will be driven by the Subversion tag. That is, instead of expecting to copy everything, it will expect that only a small subset of the possible files has been updated and tagged, and it will change only those files.
- It will be easy and completely safe to create the metadata "additions" file that is initially empty and to which new metadata can be added between full Shibboleth release cycles.
- It would be useful if some special processing of the Salesforce and Cvent metadata files was provided so these standard changes could be handled routinely even though they modify existing files.
- Adding new release policies at the end of the existing attribute-filter.xml file should also be safe and routine.
- Adding new Attribute names (for existing unmodified queries) is the last obvious and fairly safe operation.
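Mechanically, the soft-config could be little more than a selective copy out of the tagged checkout. An Ant sketch (the target name, property name, and include paths are invented for illustration):

```xml
<target name="soft-config">
    <!-- Only the files actually present in the tagged checkout are copied;
         everything already on the server is left untouched. -->
    <copy todir="/usr/local/shibboleth-idp" overwrite="true">
        <fileset dir="${svn.checkout.dir}">
            <include name="metadata/additions.xml"/>
            <include name="conf/additional-attribute-filter.xml"/>
        </fileset>
    </copy>
</target>
```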
Then the second element of the strategy is to provide a more accurate and complete testing strategy. Currently TEST Shibboleth is connected to the TEST database instances (ACS2, IST2, IDM2, HOP4) and potentially to the TEST AD (yu.yale.net). This provides a service for those who need to use test netids, but it does not actually test what is going to go into production.
It is also true that most partners do not support TEST environments. In fact, the entire InCommon Federation has no concept of TEST and no provision for us to define our TEST Shibboleth.
However, while CAS is bound to a particular well-known URL (secure.its.yale.edu/cas), Shibboleth is not bound to a URL or server; it is known by the Public/Private key pair stored in its /usr/local/shibboleth-idp/credentials folder. Create a second instance of Shibboleth on any server anywhere in the organization, give it a copy of the same credentials files, and it will generate SAML messages that any of our partners will accept as legitimate.

While applications talk to CAS directly, all communication between Shibboleth and any application goes through the Browser. So if there is a PRE-PROD test environment with a copy of the code we propose to put into production and a copy of the Production credentials, a Browser can use it with all the standard production apps by the obvious brute-force solution of pointing the hosts file on the Browser client machine at the PRE-PROD VIP whenever the browser is redirected to "auth.yale.edu". The first time it may be necessary to approve the SSL Certificate name mismatch, but after that you have a platform to comprehensively test the exact configuration we intend to put into production.
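The hosts-file override is a single line on the tester's machine (the address below is a placeholder for the real PRE-PROD VIP):

```
# Added to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts) for testing only
192.0.2.10   auth.yale.edu
```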