General Shibboleth Configuration

Shibboleth is configured with three primary files (attribute-resolver.xml, attribute-filter.xml, and relying-party.xml) and a number of "metadata" files.

The three primary files can be stored:

1. On the local disk of the Shibboleth server VM
2. On a Web server of some sort (referenced by a URL)
3. On a Subversion server

Shibboleth reads the files and creates in-memory objects at startup time. After that, it can be configured to check each file periodically and to reload the file and replace that part of the configuration if the file has changed. For local disk files, the last modified date is checked. For HTTP files, an "if-modified-since" check is made. For Subversion, modification is detected when a new version number is checked in.

All three methods of storage work well under normal circumstances. However, Shibboleth cannot come up in a disaster recovery situation if the configuration data is not available, and options 2 and 3 depend on external servers that may not be up. Although Shibboleth can maintain a local copy of the remote file, it turns out that Shibboleth startup is delayed for an unacceptable amount of time if Subversion is configured but the Subversion server is down.

So Yale controls Shibboleth configuration using Subversion, but not by making Shibboleth a direct SVN client. Instead, the Jenkins Shibboleth Install jobs check configuration files out of Subversion and copy them to local disk files on the Shibboleth server. Shibboleth is configured to use option 1 (local disk files and periodically check the modified date), but the file contents are moved from SVN to the local disk through an explicit activity (running the Jenkins Install job) which in turn is driven by Yale Release Policy and Service Now.

Metadata files defining remote Service Provider partners must be either local files or Web URLs. InCommon is a federation with an administrative process for controlling and validating Metadata updates, so that is the only Metadata source that we allow to come from a Web Server, and then we use the Shibboleth option of keeping a local disk copy of the last downloaded InCommon file so Shibboleth can come up when a network connection to InCommon is unavailable. All other Metadata files are managed by us, stored in SVN, and copied to the Shibboleth server local disk by the Jenkins Install job, just like the three primary configuration files.

Metadata files can define a single partner entity, or a list of entities. The list of Metadata sources is configured in the relying-party.xml primary configuration file, and every time that file changes Shibboleth reads in a new copy of all its Metadata sources. Otherwise, individual sources of Metadata can be configured to be periodically checked for an update and new Metadata is read in to replace the old source when the file timestamp changes.

This produces three strategies that Yale uses for Metadata sources:

Static - Most Yale Metadata files are stored in disk files that are only changed by a full Jenkins configuration install. They are not individually checked for timestamp changes. Instead, when Jenkins Install updates the relying-party.xml file and changes its timestamp, the new copy of that file is read into memory (even if its content did not really change) and then all the Metadata files it defines are also refreshed in memory. All these Metadata files refresh together.
Remote - This option is only used for InCommon. Every 8 hours Shibboleth checks a configured URL to see if the Metadata for InCommon has changed. If so, it downloads a new copy of this information from the remote Web Server, stores a local copy of this file on disk where it can be used as a backup if a network connection to InCommon is not available the next time Shibboleth starts up, and then replaces the in-memory old copy of the InCommon Metadata information.
Dynamic - Prior to 3/15/2015 Yale did not have any individual Metadata files that could be updated by themselves. In the new configuration, there are three types of individually replaceable Metadata. Because Shibboleth examines Metadata sources in the order in which they are configured, and it stops when it finds Metadata for the entity for which it is searching, these dynamic Metadata files are distinguished by their position in the search order.

The three types of Dynamic Metadata sources are:

The "emergency-override" dynamic file comes first in the search. Metadata placed in this file can replace the previous version of production Metadata for any partner. After each new Shibboleth Release, this file is empty. Metadata is placed in it when we have an incident because an existing partner metadata has failed (typically because it has expired or the Certificate and key used by the partner has changed unexpectedly). This provides a safer form of "emergency" fix because only the one Metadata element is logically replaced instead of the entire Shibboleth configuration.

The "additions" dynamic file comes last in the search, so it cannot be used to change any existing Metadata for any entity. It can only define new Metadata for new entities. This becomes a relatively safe Standard Change because anything put into this file cannot adversely affect existing configured services. A new partner may need more than just Metadata. They may need attributes released to them. Fortunately, Shibboleth allows the function of the attribute-filter.xml file to be broken up into multiple files. Existing partners are configured in attribute-filter.xml, and an empty file named "additional-attribute-filter.xml" is deployed with every Shibboleth Release. Therefore, if a new partner has to be defined to production and cannot wait for the every-other-Thursday Release cycle, the Metadata for that partner can be placed in the metadata/additions.xml file and the attributes to be released can be put in the additional-attribute-filter.xml file. A Jenkins install of runtype=additions replaces both of these originally empty files with the data for the newly defined partner while guaranteeing by their search order that they cannot interfere with existing services. When the next regularly scheduled Shibboleth Release is ready, the changes move from the additions files to the normal Shibboleth configuration and the additions files are empty again.

Two of our partners (Salesforce and Cvent) regularly add new AssertionConsumerService URL elements to their existing Metadata file. This happens so frequently that we have the option of replacing these specific production Metadata files with updated copies. There has not yet been any urgency to make such changes outside a normal Release cycle, but we have the ability to respond to the special needs of these two cloud partners if "every other week" becomes an unacceptable delay.

Jenkins Runtype

The runtype parameter in the Jenkins Install job determines the specific processing that this run of the Install job will perform.

Runtype "install" stops the JBoss server and loads a complete Shibboleth system, including potentially new code and new configuration files.

Runtype "config" does not stop JBoss or the running Shibboleth server. Instead, it replaces the full set of configuration files. The running Shibboleth process checks the timestamps on these files, and when it sees they have changed it loads a complete new configuration (although in practice only one or two configuration files will actually have new contents).

The 3/15/2015 update to Shibboleth added three new categories of runtype:

Runtype "additions" modifies only the "additions.xml" Metadata file and the "additional-attribute-filter" file. This can be used to add new Service Providers to production between the every-other-Thursday full Release cycles. Shibboleth isolates these files and appears to guarantee that this type of configuration cannot possibly interfere with existing production services.

Runtype "emergency" will change the "emergency-override.xml" file and allows us to define new Metadata for an existing production partner without affecting anything else. It may require permission from the ECAB, but it is a less dangerous change than a full configuration update. Note that the old Metadata for the partner remains in place, but is not used because the override Metadata is found first in the search order. Before the next Release cycle (the next runtype=install or runtype=config), the old production Metadata should be replaced with the new override data and the emergency-override.xml should be emptied.

Runtype "salesforce" and "cvent" are proposed runtypes that change a single Metadata file for the two partners that require frequent updates.

Contents of the Primary Configuration Files

Attribute-Resolver (Queries and Attributes)

Attributes are defined and their values are obtained from the configuration in the attribute-resolver.xml file.

The file starts with DataConnectors. A typical connector identifies a database or LDAP directory as a source, and a query (in SQL or LDAP query language) to present to the source. Currently Shibboleth pulls data from Oracle instances, the IDR SQL Server database, and the Windows AD LDAP directory.

There are generally three types of queries that make sense:

A database query can return exactly one row. Then you can think of the row as a user object, and the column names become the properties of the object.
A database query can return more than one row but only one column. Then you have a "multivalued" property for one user.
An LDAP query returns the User object from the directory. LDAP User objects have properties some of which are single valued and some of which are multivalued.

A query creates an attribute object for each database result set column or each property of the LDAP user object returned. These attribute objects are not directly useful because they are attached to the object created to represent the query result. You use them to create derived attribute objects that can stand on their own. To do this you configure an AttributeDefinition element with a sourceAttributeID that names the column/property and a Dependency that references the id of the query.

<resolver:AttributeDefinition id="idrFirstName" xsi:type="ad:Simple" sourceAttributeID="FirstName">
    <resolver:Dependency ref="IDRQuery" />
</resolver:AttributeDefinition>

This element creates a new attribute with an id of "idrFirstName" whose value is the FirstName column of the single row returned by the IDRQuery database query (defined elsewhere in the file). This converts the essentially unnamed attribute generated for that column in the query into an independent attribute with its own name.

In most cases an AttributeDefinition will also have one or more AttributeEncoder elements that tell Shibboleth how to produce SAML representing this attribute when it is sent to a Service Provider. Typically the Encoder element specifies a friendlyName= like "FirstName" or "GivenName" and an (unfriendly) name= like "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" or "urn:oid:2.5.4.42". An attribute that has an Encoder can be released by the attribute-filter.xml file and will then appear in the SAML Response.
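
As a hedged illustration of the Encoder just described, the sketch below attaches a SAML 2 string encoder to the idrFirstName definition from the box above. The enc:SAML2String type comes from the stock Shibboleth 2 resolver schema; pairing it with this definition and with these two names is an assumption made for the example, not a copy of the Yale file.

```xml
<!-- Sketch only: the definition from the box above, now with an Encoder so it can be released. -->
<resolver:AttributeDefinition id="idrFirstName" xsi:type="ad:Simple" sourceAttributeID="FirstName">
    <resolver:Dependency ref="IDRQuery" />
    <!-- name= is the formal SAML attribute name; friendlyName= is the readable label. -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:2.5.4.42" friendlyName="givenName" />
</resolver:AttributeDefinition>
```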

However, in a few cases the AttributeDefinition will not have an Encoder. This attribute cannot be released and cannot generate SAML on its own, but it can be used to create a second attribute. This is most useful when several attributes have to be combined to produce a single new attribute. Attributes can supply variables to a JavaScript-generated attribute, they can define variables inserted into a SQL statement in a secondary query, or they can be combined to form a composite text string in a Template attribute. JavaScript seems to be the easiest case to explain.

Such an AttributeDefinition can appear just as in the box above. Without an Encoder it cannot be used to generate SAML, but it serves a different purpose: it works around a bug in the Shibboleth configuration schema.

Previously defined attributes can be used to define secondary attributes. This can occur when you define an Attribute with a block of JavaScript code, or if you need to plug the attribute into a subsequent query or insert it into a string.

So here is the problem. An AttributeDefinition can have any number of Dependency statements, but only one can meaningfully reference a DataConnector (that is, a Query). For that one Dependency, the sourceAttributeID on the AttributeDefinition selects one column or property and makes that value available as a pre-defined JavaScript variable with the sourceAttributeID= name. Every other Dependency has to reference an AttributeDefinition instead of a DataConnector. Each of those other Dependencies creates a pre-defined JavaScript variable with the ref= name on the Dependency statement (which is the id= of the referenced AttributeDefinition). If an AttributeDefinition had two different DataConnectors referenced by two Dependency elements, then Shibboleth would not know which Query contained the sourceAttributeID column, and if both queries had the sourceAttributeID as a column name it could not create two JavaScript variables with the same name. So using a separate AttributeDefinition like the one in the box above creates an unambiguous statement. A Dependency with a ref="idrFirstName" creates a JavaScript variable named "idrFirstName" that has the value of the "FirstName" column returned by the result set of the "IDRQuery" DataConnector. Any other first names from any other queries can similarly be given unique identifiers.
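
The rule can be sketched as follows. Two Simple definitions give unambiguous ids to values from two different connectors, and a third definition depends only on those ids. The ADQuery connector id, the sn property, and the adLastName and fullNameExample ids are hypothetical names chosen for illustration. The ad:Template type with its Template and SourceAttribute children follows the stock Shibboleth 2 resolver schema as best recalled, so check it against the deployed version.

```xml
<!-- Intermediate definition, repeated from the box above (no Encoder, so never released). -->
<resolver:AttributeDefinition id="idrFirstName" xsi:type="ad:Simple" sourceAttributeID="FirstName">
    <resolver:Dependency ref="IDRQuery" />
</resolver:AttributeDefinition>

<!-- Hypothetical: ADQuery is an assumed LDAP connector id, sn an assumed property. -->
<resolver:AttributeDefinition id="adLastName" xsi:type="ad:Simple" sourceAttributeID="sn">
    <resolver:Dependency ref="ADQuery" />
</resolver:AttributeDefinition>

<!-- Depends only on AttributeDefinitions, so each variable is named by its ref= value. -->
<resolver:AttributeDefinition id="fullNameExample" xsi:type="ad:Template">
    <resolver:Dependency ref="idrFirstName" />
    <resolver:Dependency ref="adLastName" />
    <ad:Template><![CDATA[${idrFirstName} ${adLastName}]]></ad:Template>
    <ad:SourceAttribute>idrFirstName</ad:SourceAttribute>
    <ad:SourceAttribute>adLastName</ad:SourceAttribute>
</resolver:AttributeDefinition>
```

Because neither variable name comes from a column name, a FirstName column in a second query could be added later under its own id without any collision.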

Note: Oracle queries return column names that are all UPPERCASE. It is a best practice to use uppercase names in the query and in all subsequent references to the column. If you specify an Oracle column name in lower or mixed case in subsequent XML, then the Java code will fail to match the UPPERCASE name in the Oracle result set and a null or missing value will be returned.

When designing this file, you need to consider certain possibilities:

The database query can return no rows.
The database query can return a SQL NULL value for a column (unless you use NVL in Oracle or ISNULL in SQL Server to replace the NULL with a default value).
An LDAP query can return no User object
An LDAP query can return a User object, but in that object the property you are looking for may not be present.
An LDAP query can return a User object, and the property may be present, but it may have no values in the list of values.

First, any value that might be used as a SAML Subject for any Yale person in any login cannot be missing or null for any person at Yale. That makes sense for SAML, but Yale has a few partners who request a particular value be presented as Subject. If the partner only provides data for Employees and it makes no sense for a Student to ever try to log in to that partner, you might imagine that this is OK. However, Shibboleth will generate a null pointer exception if any query returns NULL for any attribute marked as a possible Subject for any login, even if it isn't going to be the Subject in this particular login. So use NVL or ISNULL to replace possibly NULL values with 0 or -1 or "undefined" in such cases.

Then in JavaScript, variables generated from queries with no rows, or no objects, or no property, or NULL can end up being an "undefined" JavaScript variable (a variable that has never been declared with any value), or a defined variable with a value of JavaScript null, or a variable whose value is an object that has no values in its collection of values. Generally speaking, you should always write your JavaScript code to check for all possible bad results, because if you test whether a variable has the value null when it is really "undefined" you are going to throw an exception. See the Yale code for examples of appropriately paranoid variable testing.
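
A minimal sketch of that defensive style follows. It assumes the Shibboleth 2 scripting conventions: a BasicAttribute class imported from the edu.internet2.middleware.shibboleth.common.attribute.provider package, and each Dependency exposed as a variable whose getValues() method returns a Java collection. The safeFirstName id and the "undefined" default are hypothetical, and this is not the Yale code itself.

```xml
<resolver:AttributeDefinition id="safeFirstName" xsi:type="ad:Script">
    <resolver:Dependency ref="idrFirstName" />
    <ad:Script><![CDATA[
        importPackage(Packages.edu.internet2.middleware.shibboleth.common.attribute.provider);

        // The script leaves its result in a variable named after the definition id.
        safeFirstName = new BasicAttribute("safeFirstName");

        // Default used when the query produced nothing usable (hypothetical choice).
        var value = "undefined";

        // typeof is safe on a never-declared variable, so it is tested first;
        // the null check and the empty-collection check follow in that order.
        if (typeof idrFirstName != "undefined"
                && idrFirstName != null
                && idrFirstName.getValues() != null
                && !idrFirstName.getValues().isEmpty()) {
            var first = idrFirstName.getValues().iterator().next();
            if (first != null) {
                value = first;
            }
        }

        safeFirstName.getValues().add(value);
    ]]></ad:Script>
</resolver:AttributeDefinition>
```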

Returning no rows or objects is a normal response to a query. A query fails if it generates an ORAxxxx SQLException or a NamingException. Typically this happens if the database server or directory is down, but it can also happen if the userid and password you are using to login to the server is no longer valid or if permissions have been revoked or were never granted to that user.

The Shibboleth code that executes a query runs in a Java "try-catch" clause. If any exception is thrown during the query, then the Shibboleth code will attempt to execute a secondary connector named by the failover setting of the DataConnector (in the stock Shibboleth 2 resolver schema this is a FailoverDataConnector child element rather than an XML attribute). The failover can point to a different query to a different database that might return the same value. Or it can be a Static element.

A Static DataConnector defines one or more property names and values. It is not necessary to define a default value for every property that you could have obtained from the correct execution of the real query, provided that a null or undefined value is acceptable for the other properties. Given the previous warning about NULL and undefined and empty, you should think twice before leaving column/property names without an explicit default value (0, 1, -1, "", "undefined", etc.). However, it is not an error to omit them if you choose. The Static DataConnector cannot throw an exception.

Every Query must have a Failover DataConnector, which may itself have a Failover, and the chain must end with a Static Connector.
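
A hedged sketch of such a chain is shown below: one database query whose failover is a Static connector. The element and type names follow the stock Shibboleth 2 resolver schema as best recalled. The IDRQueryStatic id, the JDBC URL, the credentials, the People table, and the NetId column are all hypothetical placeholders, not Yale values.

```xml
<!-- Real query: any exception thrown here hands control to the failover connector. -->
<resolver:DataConnector id="IDRQuery" xsi:type="dc:RelationalDatabase">
    <resolver:FailoverDataConnector ref="IDRQueryStatic" />
    <!-- Hypothetical connection details. -->
    <dc:ApplicationManagedConnection
        jdbcDriver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbcURL="jdbc:sqlserver://db.example.edu:1433;databaseName=idr"
        jdbcUserName="shibboleth" jdbcPassword="CHANGEME" />
    <dc:QueryTemplate><![CDATA[
        SELECT FirstName FROM People WHERE NetId = '$requestContext.principalName'
    ]]></dc:QueryTemplate>
</resolver:DataConnector>

<!-- End of the chain: a Static connector cannot throw, so resolution always completes. -->
<resolver:DataConnector id="IDRQueryStatic" xsi:type="dc:Static">
    <dc:Attribute id="FirstName">
        <dc:Value>undefined</dc:Value>
    </dc:Attribute>
</resolver:DataConnector>
```

The Static connector supplies an explicit default for the same FirstName name the query would have produced, so definitions that depend on it see a value either way.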

Yale reorganized its attribute-resolver.xml file to emphasize the importance of a Static Failover for every query.

In vanilla Shibboleth, the only errors that are caught during Attribute evaluation are query errors. Any other error (during JavaScript evaluation, or because of Shibboleth bugs) is fatal and prevents the user from logging on to any partner. Yale has added code to wrap other evaluations with a try-catch that discards attributes in error but preserves all the other attributes. Because attribute errors tend to occur only in new attributes not widely in use or old abandoned attributes no longer in use, this makes Shibboleth more robust against real world errors without impacting security. We will try to interest the Shibboleth maintainers in making this fix standard.

Once you have the data from queries, the second step is to format them as Attributes. An attribute contains the value, but it is accompanied by some names and types.

Different partners have decided to demand that the same piece of information be given different names when sent to them. Take something as simple as "first name". In the West, the last name is the family name, but in China the first name is the family name. So international standards prefer not to base the attribute name on its position. Of course, many of our partners only service the US. So for "Howard Gilbert", the "Howard" value will be assigned to many Attributes with names such as "FirstName", "firstName", "first_name", "givenName", "Given Name", and slightly more sophisticated partners will ask for one of three globally unique technical identifiers ("http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname", "http://schemas.xmlsoap.org/claims/FirstName", and "urn:oid:2.5.4.42").

Multiple Attributes with different names can all refer back to the same query and the same database column or LDAP property name. Each Attribute is assigned a unique ID value, and the appropriately named Attribute is then released to each partner in the attribute-filter file.

It is also possible to define several Attributes with the same name but different values. For example, the attribute with name "mail" (the preferred E-Mail address) can be an alias or a direct mailbox address (that is, it can be howard.gilbert@yale.edu or howard.gilbert.eliapps@bulldogs.yale.edu). Which mail address to send depends on whether you are logging into Box or Coursera. So alternative Attributes with name "mail" are defined and given values, and then one of them can be released to each partner using the attribute-filter file.
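
The idea can be sketched with two definitions that share one SAML name but carry different values. The mailAlias and mailMailbox ids, the MailQuery connector, and the MAILALIAS and MAILBOX column names are hypothetical; urn:oid:0.9.2342.19200300.100.1.3 is the standard OID for the LDAP mail attribute.

```xml
<!-- Alias form of the address; hypothetical connector and column names. -->
<resolver:AttributeDefinition id="mailAlias" xsi:type="ad:Simple" sourceAttributeID="MAILALIAS">
    <resolver:Dependency ref="MailQuery" />
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:0.9.2342.19200300.100.1.3" friendlyName="mail" />
</resolver:AttributeDefinition>

<!-- Direct mailbox form; same SAML name, different id and different value. -->
<resolver:AttributeDefinition id="mailMailbox" xsi:type="ad:Simple" sourceAttributeID="MAILBOX">
    <resolver:Dependency ref="MailQuery" />
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:0.9.2342.19200300.100.1.3" friendlyName="mail" />
</resolver:AttributeDefinition>
```

Each partner's filter policy then names exactly one of the two ids, so the partner receives a single attribute called "mail".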

Attribute-Filter

The attribute-filter.xml file has a long list of rules listing the Attributes (defined in the previous section) that are released to each partner. For example:

    <afp:AttributeFilterPolicy id="releaseToCommunityForceStaging">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://yalestaging.communityforce.com" />
        <afp:AttributeRule attributeID="givenName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="sn"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="mail"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

CommunityForce gets the givenName (firstname), sn (surname or family name), and E-Mail address (named just "mail" according to the old LDAP standards). In fact, these are all standard old LDAP attributes which are very popular in academic applications. In contrast

    <afp:AttributeFilterPolicy id="releaseToArcher">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://sso2.archer.rsa.com/adfs/services/trust" />
        <afp:AttributeRule attributeID="scopedNetidAsUPN"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="firstnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="lastnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="emailADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

Archer gets "http://schemas.xmlsoap.org/claims/FirstName" and so on for lastname and email. These are Microsoft URL style names that are more popular these days with everyone except for the old guard in universities who still remember the LDAP names from previous failed attempts to use them.

The Attribute-Filter entries are cumulative. Shibboleth runs through the rules, and whenever a rule applies to an entity, any released attributes are added to the list of values we will send. Defining all of the attributes for one entity in one place is a good and sane practice, and it is what happens most of the time, but it is not a requirement.

Therefore, Shibboleth allows the Attribute-Filter function to be broken up into more than one file. We take advantage of this by creating an attribute-filter.xml file that contains the attributes released to each partner as of an official Shibboleth Release, plus an additional-attribute-filter.xml file that is initially empty and can be changed between formal releases. The additional file can either create a new filter policy for a new partner, or it can add an additional attribute to an existing partner.

However, you can only release attributes defined by attribute-resolver.xml, and that does not change between releases.
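
As an illustration only, a policy placed in additional-attribute-filter.xml for a new partner might look like the sketch below. It reuses the policy structure shown in the examples above; the policy id and the entity ID https://sp.example.org/shibboleth are hypothetical, and the three attribute ids are assumed to already exist in the released attribute-resolver.xml.

```xml
    <!-- Hypothetical new partner added between releases. -->
    <afp:AttributeFilterPolicy id="releaseToExamplePartner">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://sp.example.org/shibboleth" />
        <!-- Only attributes already defined by the resolver can be named here. -->
        <afp:AttributeRule attributeID="givenName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="sn"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="mail"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>
```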

Relying-Party and Metadata

Metadata is a SAML standard format for describing the Identity Provider (Shibboleth at Yale) and the Service Provider (example: coursera.org). Shibboleth needs Service Provider Metadata for its partners. Although the Metadata file can be quite large and complex, the important information is the EntityID, a unique identifier for the partner, which is typically either a DNS name (coursera.org) or a URL (https://coursera.org). There is also an "AssertionConsumerService URL" that defines the URL to which Shibboleth sends the SAML message that describes the user.

In Shibboleth 2, the relying-party.xml file defines the Metadata sources. In Shibboleth 3, there are separate configuration files for Relying Parties and Metadata Providers.

Each Metadata Provider obtains an XML text file either from disk or from the network. Each XML file contains Metadata for a single entity or for a list of entities.

When Shibboleth needs Metadata information for a specific EntityID, it goes to the first defined Metadata source and looks for that ID. If it finds a match, Shibboleth stops looking. Otherwise it checks the second source and on until it runs out of configured Metadata sources. This "stop at the first match" means that when more than one Metadata Provider has information about an Entity, Shibboleth will use the data from the first configured Provider.

Some partners are configured through a Federation. InCommon, for example, distributes Metadata for a large number of Universities and companies that do business with universities. Periodically Shibboleth obtains updated Metadata from the URL "http://md.incommon.org/InCommon/InCommon-metadata.xml".

Our most important partners exchange Metadata with us directly. We store their Metadata files in a directory in Subversion, and we add a reference to the file name to the relying-party.xml file so Shibboleth will read it. Because we control these local static Metadata files, we put them first before the InCommon dynamic file we do not control.

Shibboleth has a failFastInitialization="false" parameter for each configured Metadata source. The default is "true" and causes Shibboleth to fail to start up if the Metadata is invalid. If we put Metadata directly into production, "true" would be a really, really bad idea. However, at Yale Metadata goes through DEV and TEST before it goes to PROD, and the way the Jenkins jobs interact with the Subversion tags should prevent problems only showing up in production. If we have an issue, it is better that it show up as an initialization problem for DEV and get fixed immediately rather than being something that could just slip through the cracks. Perhaps this parameter should be "true" in DEV and TEST and "false" in PROD, and that will be a change to be made in some later release.

Yale defines four types of Metadata Providers in the following order:

1. The dynamic "emergency-override.xml" that is initially empty but can be used to replace production Metadata that becomes bad between releases.
2. The static production partner Metadata XML files provided for archer, hewitt, communityforce, salesforce, and so on.
3. The InCommon remote source which changes without our knowledge or control.
4. The dynamic "additions.xml" file where new partners can be defined between releases (also associated with the additional-attribute-filter.xml file).
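
That ordering could be expressed in relying-party.xml roughly as in the sketch below. The provider types (ChainingMetadataProvider, FilesystemMetadataProvider, FileBackedHTTPMetadataProvider) and the maxRefreshDelay attribute come from the stock Shibboleth 2 metadata schema as best recalled. All ids, file paths, and the five-minute refresh value are hypothetical, and signature filters and the refresh settings for the static files are omitted.

```xml
<!-- Sketch only: providers are searched top to bottom and the first match wins. -->
<metadata:MetadataProvider id="ShibbolethMetadata" xsi:type="metadata:ChainingMetadataProvider">

    <!-- 1. Emergency override: normally empty, searched first. -->
    <metadata:MetadataProvider id="EmergencyOverride" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/path/to/idp/metadata/emergency-override.xml"
        maxRefreshDelay="PT5M" />

    <!-- 2. Static partner files (one provider per file; only one shown). -->
    <metadata:MetadataProvider id="Archer" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/path/to/idp/metadata/archer.xml" />

    <!-- 3. InCommon: remote source with a local backing copy for restarts. -->
    <metadata:MetadataProvider id="InCommon" xsi:type="metadata:FileBackedHTTPMetadataProvider"
        metadataURL="http://md.incommon.org/InCommon/InCommon-metadata.xml"
        backingFile="/path/to/idp/metadata/InCommon-metadata.xml"
        maxRefreshDelay="PT8H" />

    <!-- 4. Additions: searched last, so it cannot shadow any entry above it. -->
    <metadata:MetadataProvider id="Additions" xsi:type="metadata:FilesystemMetadataProvider"
        metadataFile="/path/to/idp/metadata/additions.xml"
        maxRefreshDelay="PT5M" />

</metadata:MetadataProvider>
```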

This then leaves us with a small number of special cases. Two of our partners (salesforce and cvent) use a technique that we might call the Expanding Metadata File. Every time you define a new application with these systems, instead of getting a new Metadata file you get a one line change to add to the existing Metadata file. In Salesforce, the file looks like:

      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-finance.my.salesforce.com?so=00Di0000000gP9D" index="12"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-fbo.my.salesforce.com?so=00Di0000000gP9D" index="13"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-adm.my.salesforce.com?so=00DA0000000ABT0" index="14"/>

The next time someone comes up with a new Salesforce application, it will be index="15" and will have its own unique Location value.

We may add special types of Jenkins Installs (runtype=salesforce and runtype=cvent) that replace just this one file. The bad news is that if the new Metadata is bad it will break existing Salesforce or Cvent applications, but the type of edit here is fairly simple and any mistakes should show up in DEV and TEST. Furthermore, the Shibboleth isolation of Metadata sources and the decision to configure files separately in relying-party.xml ensure that changes to Salesforce affect only Salesforce applications and nothing else.
