General Shibboleth Configuration

Shibboleth 3 is a Spring Framework application. Spring provides a generic configuration syntax that can be used by almost any application. It is not reasonable to try to describe Spring configuration syntax in general here. Go to the Spring website for that information.

Every real Spring file contains a <beans> element. In that file there is a sequence of either <bean> elements or else aliases for a particular kind of standard bean. For example, a List bean based on a Java list of objects is a <util:list> element, which is translated into a particular type of <bean>. Every bean has an id= name by which it is referenced.

Each <bean> is an instance of a Java class, and it is configured by providing values for named properties and occasionally a set of constructor arguments. When many beans will all be created with many common arguments, then an incomplete (abstract) <bean> definition is provided with the common arguments, and then other <bean> elements complete the configuration by referencing the partially configured bean with parent= and then add the missing properties to complete the configuration.
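The parent= mechanism can be sketched as follows. This is only an illustration of the Spring syntax: the class name edu.example.DirectoryConnector, the bean ids, and the property names are hypothetical and do not come from the Shibboleth distribution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Abstract template: never instantiated, it only holds the shared settings. -->
    <bean id="example.AbstractConnector" abstract="true"
          class="edu.example.DirectoryConnector">
        <property name="host" value="directory.example.edu" />
        <property name="timeoutSeconds" value="5" />
    </bean>

    <!-- Concrete beans inherit host and timeoutSeconds and add what is missing. -->
    <bean id="example.StudentConnector" parent="example.AbstractConnector">
        <property name="baseDN" value="ou=Students,dc=example,dc=edu" />
    </bean>

    <bean id="example.StaffConnector" parent="example.AbstractConnector">
        <property name="baseDN" value="ou=Staff,dc=example,dc=edu" />
    </bean>
</beans>
```

A child bean may also override a property inherited from the parent by declaring it again with a different value.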

Shibboleth has to work with standard SAML configuration files, and they are ordinary XML files with no Spring elements. In general, Spring calls a text, properties, or xml file a "Resource" and the resource can be defined by a path to a disk file, or a URL to a file on a Web server. The Shibboleth configuration files that are standard XML files are a Resource that has to be read and understood by some Bean, which is different from a Spring XML file that contains a <beans> element.

In the Shibboleth conf subdirectory, the attribute-filter.xml and attribute-resolver.xml files are ordinary XML resources while the other files (services.xml, relying-party.xml, metadata-providers.xml) are Spring beans files. Of course, internally the elements in the non-Spring XML files are turned into objects, but that is done by Shibboleth processing and not just standard Spring.

Most Shibboleth configuration files can be sourced from local disk, from a Web server, or from a Subversion repository. However, Shibboleth is a "Tier 0" application that has to come up after any type of power or datacenter failure before anything else comes up, so it cannot depend on external servers. We solve this problem with a Jenkins Install job that checks files out from Subversion but then manually deposits them on the Shibboleth server in the conf subdirectory. We manage the configuration files with Source Control, but they have to be manually transferred to local disk under operations control. Once stored they are available locally when the Shib server boots.

The files in the conf directory are defined by Shibboleth and Yale. There is a separate metadata directory that defines our individual Relying Parties (google.com, yale.service-now.com, salesforce.com, etc.). However, in order for a metadata file to be recognized at all, there has to be an entry pointing to it in the metadata-providers.xml file.

The metadata-providers file defines sources of metadata. Each source is read into memory when Shibboleth starts up, and depending on how its element in metadata-providers is configured, it can be polled at regular intervals to see if a new copy of the file is available.

The InCommon federation curates submitted Metadata for hundreds of potential partners. We define a metadata provider for InCommon that reads the latest version of InCommon metadata from the URL they provide. This is cached to a local copy of the file on disk so that after a datacenter failure Shibboleth can come up using the cached copy of the InCommon metadata in the event that network connectivity to the InCommon server has not been restored. There is special logic in the Install job to not delete the cached InCommon metadata file when all the other metadata files are being deleted and replaced.

However, InCommon is the only dynamic metadata file, and we only trust InCommon as a dynamically updated source because the contents are curated. For the other metadata, particularly for critical business functions, each metadata file is managed as a local disk file updated when necessary by the Jenkins Install process.

Shibboleth processes a metadata-provider as a unit. If it is a file on disk, then the whole file is read, and a single XML syntax error prevents Shibboleth from using anything in the file. A metadata file can contain a single EntityDescriptor or a list of them. For safety, we maintain separate files and therefore separate metadata-provider definitions for each cloud partner or application. That way we know that a change given to us by Community Force cannot interfere with our configuration of Comcast IPTV, because they are in separate files and a syntax error in either is localized to that one file.

Shibboleth starts with the EntityID of the partner. For example, one of the Comcast application entities is entityID="https://university-dev.ccp.xcal.tv:443/openam". Shib then goes through the list of metadata providers in the metadata-providers.xml file, in order, and searches the in-memory list of elements constructed from each metadata provider for a matching
<EntityDescriptor entityID="https://university-dev.ccp.xcal.tv:443/openam" ...>
It stops when it finds the first matching EntityDescriptor in any metadata provider. Therefore, if the same entityID is in more than one metadata provider source, Shibboleth will use the source that comes first in the list of metadata providers. We use this to provide a maintenance strategy based on a few local-file metadata provider elements that are configured to regularly check for file changes:

The "emergency-override" dynamic file comes first in the search. Metadata placed in this file with an EntityID that matches an existing Metadata entry in a later file will logically replace the previous version of production Metadata for any partner. After each new Shibboleth Release, this file is empty. Metadata can be placed in it when we have an incident because an existing partner metadata has failed (typically because it has expired or the Certificate and key used by the partner has changed unexpectedly). This provides a safer form of "emergency" fix because only the one Metadata element is logically replaced instead of the entire Shibboleth configuration.

The "additions" dynamic file comes last in the search. Every existing Metadata file will have already been searched, and all existing EntityID values will have matched, so you do not get to this file unless you have a new EntityID that doesn't match any existing one (including all the InCommon entities). This file can only define new Metadata for new entities. This becomes a relatively safe Standard Change because anything put into this file cannot adversely affect existing configured services. A new partner may need more than just Metadata. They may need attributes released to them. Fortunately, Shibboleth allows the function of the attribute-filter.xml file to be broken up into multiple files. Existing partners are configured in attribute-filter, and an empty file named "additional-attribute-filter.xml" is deployed with every Shibboleth Release. Therefore, if a new partner has to be defined to production and cannot wait for the every-other-Thursday Release cycle, the Metadata for that partner can be placed in the metadata/additions.xml file and the attributes to be released can be put in the additional-attribute-filter.xml file. A Jenkins install of runtype=additions replaces both of these originally empty files with the data for the newly defined partner while guaranteeing by their search order that they cannot interfere with existing services. When the next regularly scheduled Shibboleth Release is ready, the changes move from the additions files to the normal Shibboleth configuration and the additions files are empty again.

Two of our partners (Salesforce and Cvent) regularly add new AssertionConsumerService URL elements to their existing Metadata file. This happens so frequently that we have the option of replacing these specific production Metadata files with updated copies. There has not yet been any urgency to make such changes outside a normal Release cycle, but we have the ability to respond to the special needs of these two cloud partners if "every other week" becomes an unacceptable delay.

Jenkins Runtype

The runtype parameter in the Jenkins Install job determines the specific processing that this run of the Install job will perform.

Runtype "install" stops the JBoss server, loads a complete Shibboleth system including potentially new code, and new configuration files.

Runtype "config" does not stop JBoss or the running Shibboleth server. Instead, it replaces the full set of configuration files. The running Shibboleth process checks the timestamps on these files, and when it sees they have changed it loads a complete new configuration (although in practice only one or two configuration files will actually have new contents).

The 3/15/2015 update to Shibboleth added three new categories of runtype:

Runtype "additions" modifies only the "additions.xml" Metadata file and the "additional-attribute-filter" file. This can be used to add new Service Providers to production between the every-other-Thursday full Release cycles. Shibboleth isolates these files and appears to guarantee that this type of configuration cannot possibly interfere with existing production services.

Runtype "emergency" will change the "emergency-override.xml" file and allows us to define new Metadata for an existing production partner without affecting anything else. It may require permission from the ECAB, but it is a less dangerous change than a full configuration update. Note that the old Metadata for the partner remains in place, but is not used because the override Metadata is found first in the search order. Before the next Release cycle (the next runtype=install or runtype=config), the old production Metadata should be replaced with the new override data and the emergency-override.xml should be emptied.

Runtype "salesforce" and "cvent" are proposed runtypes that change a single Metadata file for the two partners that require frequent updates.

Contents of the Primary Configuration Files

Attribute-Resolver

Shibboleth normally has a single attribute-resolver.xml file, but Yale changed this.

The attribute-resolver file has two types of elements. DataConnectors define database or LDAP queries that produce result sets with columns or LDAP User objects with properties. AttributeDefinitions then embed a value of one column or property returned by one of these queries in a SAML attribute statement with a name, friendlyName, and format.

Normally there is one file with the two different types of elements, but that produces more complicated XML because everything has to be decorated with XML namespace prefixes. It just seems cleaner to use the capability provided in Shibboleth 3 to create two files, where attribute-resolver-connectors has the DataConnectors and attribute-resolver-definitions has all the AttributeDefinitions. If you don't like this, you can combine the two back together again and then add back all the XML namespace prefixes that can be defaulted if the files are separate.
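As a rough sketch, the split is expressed in services.xml by listing both files as resolver resources. The bean id below is the one used by the stock Shibboleth 3 services.xml. The .xml file names and the %{idp.home}/conf location are assumptions based on the names given above, and the util: prefix is assumed to be declared on the enclosing <beans> element.

```xml
<!-- In conf/services.xml: the attribute resolver service loads every file in this list. -->
<util:list id="shibboleth.AttributeResolverResources">
    <value>%{idp.home}/conf/attribute-resolver-connectors.xml</value>
    <value>%{idp.home}/conf/attribute-resolver-definitions.xml</value>
</util:list>
```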

DataConnectors

There are generally three types of queries that make sense:

A database query can return exactly one row. Then you can think of the row as a user object, and the column names become the properties of the object.
A database query can return more than one row but only one column. Then you have a "multivalued" property for one user.
An LDAP query returns the User object from the directory. LDAP User objects have properties some of which are single valued and some of which are multivalued.

Each column in the result set or property in the LDAP object becomes an Attribute of sorts, but this attribute has no SAML formatting information and no single identifier. To identify it you must both reference the id= of the DataConnector and the column or property name. To change that, you must use an AttributeDefinition.

Building a working pool of database connections is a complicated process. Shibboleth 2 generally relied on the application server (Tomcat or JBoss) capability to create and manage DataSource objects. Shibboleth 3 can also use datasources provided by the server, but it can also use datasources created by Spring support for databases. At this time we are sticking with Tomcat managed datasources, but we may move to Spring when we have a chance.

AttributeDefinitions

A query creates an attribute object for each database result set column or each property of the LDAP user object returned. These attribute objects are not directly useful because they are attached to the object created to represent the query result. You use them to create derived attribute objects that can stand on their own. To do this you configure an AttributeDefinition element with a sourceAttributeID that names the column/property and a Dependency that references the id of the query.

<resolver:AttributeDefinition id="idrFirstName" xsi:type="ad:Simple" sourceAttributeID="FirstName">
    <resolver:Dependency ref="IDRQuery" />
</resolver:AttributeDefinition>

This element creates a new attribute with an id of "idrFirstName" whose value is the FirstName column of the single row returned by the IDRQuery database query (defined elsewhere in the file). This converts the essentially unnamed attribute generated for that column in the query into an independent attribute with its own name.

In most cases an AttributeDefinition will also have one or more AttributeEncoder elements that tell Shibboleth how to produce SAML representing this attribute when it is sent to a Service Provider. Typically the Encoder element specifies a friendlyName= like "FirstName" or "GivenName" and an (unfriendly) name= like "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" or "urn:oid:2.5.4.42". An attribute that has an Encoder can be released by the attribute-filter.xml file and will then appear in the SAML Response.
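A definition with an Encoder might look like the following sketch. It reuses the IDRQuery connector and FirstName column from the example above. The id "givenName" and the choice of the enc:SAML2String encoder are illustrative assumptions, not a copy of the production file.

```xml
<resolver:AttributeDefinition id="givenName" xsi:type="ad:Simple" sourceAttributeID="FirstName">
    <resolver:Dependency ref="IDRQuery" />
    <!-- name= is the formal identifier sent in the SAML, friendlyName= is the readable label. -->
    <resolver:AttributeEncoder xsi:type="enc:SAML2String"
        name="urn:oid:2.5.4.42" friendlyName="givenName" />
</resolver:AttributeDefinition>
```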

However, in a few cases the AttributeDefinition will not have an Encoder. This attribute cannot be released and cannot generate SAML on its own, but it can be used to create a second attribute. This is most useful when several attributes have to be combined to produce a single new attribute. Attributes can be used to generate variables in a JavaScript-generated attribute, or they can define variables inserted into a SQL statement in a secondary query, or they can be combined to form a composite text string in a Template attribute. JavaScript seems to be the easiest case to explain.

An AttributeDefinition can have any number of Dependency statements, but only one can reference a DataConnector (that is, a Query) because there is only the one sourceAttributeID. That sourceAttributeID selects a column or property in the single query. In a Script definition, the sourceAttributeID also names the JavaScript pre-defined variable passed to the JavaScript that holds the value of this column or property. However, the script may require the value of other columns from the same or other queries, and the only way to get them is to create named (id=name) attributes with another AttributeDefinition. All the other Dependency statements (those that don't reference a DataConnector) reference the id of another AttributeDefinition. The JavaScript block gets a variable with that id name and the value of that attribute.

An AttributeDefinition referenced by the Dependency statement in a JavaScript block can also have AttributeEncoder statements and be released as SAML in a Response. It can do both jobs simultaneously. An AttributeDefinition that doesn't have an Encoder can only be referenced by a Dependency statement in the definition of another attribute.

XML and Shibboleth are case sensitive, so it is important to realize that Oracle always converts its columns to UPPERCASE. To avoid errors you should always use UPPERCASE names for the sourceAttributeID field if the query is to an Oracle database, and you should define an UPPERCASE id for a default static value in the fallback connector if the Oracle query fails. Otherwise you may spend hours trying to debug the failure of the value to show up where you expect it to be.

When designing this file, you need to consider certain possibilities:

The database query can return no rows.
The database query can return a SQL NULL value for a column (unless you use NVL in Oracle or ISNULL in SQL Server to replace the NULL with a default value).
An LDAP query can return no User object.
An LDAP query can return a User object, but in that object the property you are looking for may not be present.
An LDAP query can return a User object, and the property may be present, but it may have no values in the list of values.

First, a Subject cannot be NULL. Therefore, you cannot define a Subject (a NameID) on any column or property that can be NULL or missing for anyone (student, faculty, staff, alumnus, contractor, ...).

Then you have to write JavaScript code very carefully.

A JavaScript variable is "undefined" if the variable name has never been used and is not in the name table. This can occur indirectly if an LDAP query returns a user object that does not have the property whose name you specified in the sourceAttributeID (although this may vary from version to version of Shibboleth and Java). The safe way to check whether a JavaScript variable is undefined is the typeof operator, as in "if (typeof name == 'undefined')". It is not "if (name==null)", because that throws an exception if "name" is undefined.

A JavaScript variable can be null. However, when the JavaScript variable is associated with an attribute then it points to an Attribute object. The object contains an array of values (because any Attribute could be multi-valued). So it is also possible that the array of values is empty. Each of these possibilities is a different test with different syntax, and you have to do all of them before you can just use a JavaScript attribute and assume it has a value.

Because it is easy to screw up, run any block of JavaScript in a try-catch, and if you catch an error then it is a reasonable practice to assume that some variable you are using is missing or has no value and generate the appropriate result based on that assumption.
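The checks described above can be combined in one defensive script. The following is a sketch, not a tested production definition. The ids fullName and idrLastName are hypothetical, and it assumes the Shibboleth 3 scripted-attribute behavior in which each dependency is exposed as a variable named by its id, values are read with getValues(), and the attribute being defined is a pre-created variable that accepts addValue(). Verify these against the IdP version in use.

```xml
<resolver:AttributeDefinition id="fullName" xsi:type="ad:Script">
    <resolver:Dependency ref="idrFirstName" />
    <resolver:Dependency ref="idrLastName" />
    <ad:Script><![CDATA[
        // Returns the first value of an attribute variable, or null when the
        // attribute object is missing or carries an empty list of values.
        function firstValue(attr) {
            if (attr == null) { return null; }
            var values = attr.getValues();
            if (values == null || values.isEmpty()) { return null; }
            return values.iterator().next();
        }

        try {
            // typeof is safe even when the variable was never defined.
            var first = (typeof idrFirstName == "undefined") ? null : firstValue(idrFirstName);
            var last  = (typeof idrLastName  == "undefined") ? null : firstValue(idrLastName);
            if (first != null && last != null) {
                fullName.addValue(first + " " + last);
            }
        } catch (e) {
            // Treat any failure as "value missing" and add no value to fullName.
        }
    ]]></ad:Script>
</resolver:AttributeDefinition>
```

Inside firstValue the parameter is always declared, so a plain comparison with null is safe there. The typeof test is only needed for the top-level variables that Shibboleth may or may not have created.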

If any exception is thrown during the query, then the Shibboleth code will attempt to execute a secondary query specified in the "failover" attribute of the DataConnector. The failover can point to a different query to a different database that might return the same value. Or it can be a Static element.

A Static DataConnector defines one or more property names and values. It is not necessary to define a default value for every property that you could have obtained from the correct execution of the real query, provided that a null or undefined value is acceptable for the other properties. Given the previous warning about NULL and undefined and empty, you should think twice before leaving column/property names without an explicit default value (0, 1, -1, "", "undefined", etc.). However, it is not an error to omit them if you choose. The Static DataConnector cannot throw an exception.

Every Query must have a Failover DataConnector, which may itself have a Failover, and the chain must end with a Static Connector.
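A minimal failover chain might be sketched like this. The connector id IDRStatic and the default value "unknown" are hypothetical, and the connection and query details of IDRQuery are deliberately left out. The failover reference is written first inside the connector on the assumption that the resolver schema expects the base connector elements before the type-specific ones.

```xml
<!-- Primary query. If it throws, Shibboleth falls back to IDRStatic. -->
<resolver:DataConnector id="IDRQuery" xsi:type="dc:RelationalDatabase">
    <resolver:FailoverDataConnector ref="IDRStatic" />
    <!-- connection and query elements omitted from this sketch -->
</resolver:DataConnector>

<!-- End of the chain: a Static connector cannot throw, so resolution always completes. -->
<resolver:DataConnector id="IDRStatic" xsi:type="dc:Static">
    <!-- The id must match the column name, including case, that definitions reference. -->
    <dc:Attribute id="FirstName">
        <dc:Value>unknown</dc:Value>
    </dc:Attribute>
</resolver:DataConnector>
```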

In vanilla Shibboleth, the only errors that are caught during Attribute evaluation are query errors. Any other error (during JavaScript evaluation, or because of Shibboleth bugs) is fatal and prevents the user from logging on to any partner. Yale has added code to wrap other evaluations with a try-catch that discards attributes in error but preserves all the other attributes. Because attribute errors tend to occur only in new attributes not widely in use or old abandoned attributes no longer in use, this makes Shibboleth more robust against real world errors without impacting security. We will try to interest the Shibboleth maintainers in making this fix standard.

One column/property returned by a query may be used with many different AttributeDefinition statements, each with a different id name and each with a different Encoder configuration. This generates different SAML statements in which the same value is given different names to different partners who expect those names. Some partners like the old LDAP names that are actually a string of numbers. Some like newer Oasis standard names that look like a URL. Some partners simply make up a name and expect us to provide it. The value of firstname in the database column could be used to generate SAML that labels it as "FirstName", "firstName", "first_name", "givenName", "Given Name", and slightly more sophisticated partners will ask for one of three globally unique technical identifiers ("http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname", "http://schemas.xmlsoap.org/claims/FirstName", and "urn:oid:2.5.4.42"). In addition to different technical standards, there is also confusion because in the Far East the family name comes first and the individual name comes last.

It is also possible to define several Attributes with the same SAML Encoder name string, but with different values. For example, the simple attribute "mail" can refer to the Yale primary Email alias or it can refer to the native mailbox name of the primary account, and I suppose for some partners in some cases it might refer to a non-Yale preferred mailbox. Each AttributeDefinition will have a unique id string, and you should be careful not to release to any party two attributes with the same SAML name but different values because that will just confuse them.

Attribute-Filter

The attribute-filter.xml file has a long list of rules listing the Attributes (defined in the previous section) that are released to each partner. For example:

    <afp:AttributeFilterPolicy id="releaseToCommunityForceStaging">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://yalestaging.communityforce.com" />
        <afp:AttributeRule attributeID="givenName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="sn"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="mail"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

CommunityForce gets the givenName (firstname), sn (surname or family name), and E-Mail address (named just "mail" according to the old LDAP standards). In fact, these are all standard old LDAP attributes which are very popular in academic applications. In contrast

    <afp:AttributeFilterPolicy id="releaseToArcher">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://sso2.archer.rsa.com/adfs/services/trust" />
        <afp:AttributeRule attributeID="scopedNetidAsUPN"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="firstnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="lastnameADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
        <afp:AttributeRule attributeID="emailADFS"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>

Archer gets "http://schemas.xmlsoap.org/claims/FirstName" and so on for lastname and email. These are Microsoft URL style names that are more popular these days with everyone except for the old guard in universities who still remember the LDAP names from previous failed attempts to use them.

The Attribute-Filter entries are cumulative. Shibboleth runs through the rules, and whenever a rule applies to an entity, any released attributes are added to the list of values we will send. Defining all of the attributes for one entity in one place is a good and sane practice, and it is what we do most of the time, but it is not a requirement.

Therefore, Shibboleth allows the Attribute-Filter function to be broken up into more than one file. We take advantage of this by creating an attribute-filter.xml file that contains the attributes released to each partner as of an official Shibboleth Release, while an additional-attribute-filter.xml file, initially empty, can be changed between formal releases. The additional file can either create a new filter policy for a new partner, or it can add an additional attribute to an existing partner.
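For illustration, an additional-attribute-filter.xml that adds one more attribute to an existing partner could look like the sketch below. The policy group id, the policy id, and the attribute "displayName" are hypothetical. The requester value is the CommunityForce staging EntityID used earlier, and the file is assumed to already be listed among the filter resources that Shibboleth loads.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<afp:AttributeFilterPolicyGroup id="AdditionalPolicies"
    xmlns:afp="urn:mace:shibboleth:2.0:afp"
    xmlns:basic="urn:mace:shibboleth:2.0:afp:mf:basic"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <!-- Cumulative: this is added to whatever attribute-filter.xml already releases. -->
    <afp:AttributeFilterPolicy id="additionalReleaseToCommunityForceStaging">
        <afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString" value="https://yalestaging.communityforce.com" />
        <afp:AttributeRule attributeID="displayName"><afp:PermitValueRule xsi:type="basic:ANY" /></afp:AttributeRule>
    </afp:AttributeFilterPolicy>
</afp:AttributeFilterPolicyGroup>
```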

However, you can only release attributes defined by attribute-resolver.xml, and that does not change between releases.

Relying-Party and Metadata

Metadata is a SAML standard format for describing the Identity Provider (Shibboleth at Yale) and the Service Provider (example: coursera.org). Shibboleth needs Service Provider Metadata for its partners. Although the Metadata file can be quite large and complex, the important information is the EntityID, a unique identifier for the partner, which is typically either a DNS name (coursera.org) or a URL (https://coursera.org). There is also an "AssertionConsumerService URL" that defines the URL to which Shibboleth sends the SAML message that describes the user.

In Shibboleth 2, the relying-party.xml file defines the Metadata sources. In Shibboleth 3, there are separate configuration files for Relying Parties and Metadata Providers.

Each Metadata Provider obtains an XML text file either from disk or from the network. Each XML file contains Metadata for a single entity or for a list of entities.

When Shibboleth needs Metadata information for a specific EntityID, it goes to the first defined Metadata source and looks for that ID. If it finds a match, Shibboleth stops looking. Otherwise it checks the second source, and so on, until it runs out of configured Metadata sources. This "stop at the first match" means that when more than one Metadata Provider has information about an Entity, Shibboleth will use the data from the first configured Provider.

Some partners are configured through a Federation. InCommon, for example, distributes Metadata for a large number of Universities and companies that do business with universities. Periodically Shibboleth obtains updated Metadata from the URL "http://md.incommon.org/InCommon/InCommon-metadata.xml".

Our most important partners exchange Metadata with us directly. We store their Metadata files in a directory in Subversion, and we add a reference to the file name to the metadata-providers.xml file so Shibboleth will read it. Because we control these local static Metadata files, we put them first before the InCommon dynamic file we do not control.

Shibboleth has a failFastInitialization parameter for each configured Metadata source. The default is "true" and causes Shibboleth to fail to start up if the Metadata is invalid. If we put Metadata directly into production, "true" would be a really, really bad idea. However, at Yale Metadata goes through DEV and TEST before it goes to PROD, and the way the Jenkins jobs interact with the Subversion tags should prevent problems from showing up only in production. If we have an issue, it is better that it show up as an initialization problem for DEV and get fixed immediately rather than being something that could just slip through the cracks. Perhaps this parameter should be "true" in DEV and TEST and "false" in PROD, and that will be a change to be made in some later release.

Yale defines four types of Metadata Providers in the following order:

The dynamic "emergency-override.xml" that is initially empty but can be used to replace production Metadata that becomes bad between releases.
The static production partner Metadata XML files provided for archer, hewitt, communityforce, salesforce, and so on.
The InCommon remote source which changes without our knowledge or control.
The dynamic "additions.xml" file where new partners can be defined between releases (also associated with the additional-attribute-filter.xml file).
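The ordering above could be expressed in metadata-providers.xml roughly as follows. This is a sketch that assumes the stock Shibboleth 3 provider types. The provider ids, the partner file names, and the backing file name are hypothetical, and the signature and validity filters that a real InCommon provider needs are omitted.

```xml
<MetadataProvider id="ShibbolethMetadata" xsi:type="ChainingMetadataProvider"
    xmlns="urn:mace:shibboleth:2.0:metadata"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <!-- 1. Searched first: normally empty, overrides a broken production entry. -->
    <MetadataProvider id="EmergencyOverride" xsi:type="FilesystemMetadataProvider"
        metadataFile="%{idp.home}/metadata/emergency-override.xml" />

    <!-- 2. One provider per partner, so a syntax error stays confined to one file. -->
    <MetadataProvider id="Archer" xsi:type="FilesystemMetadataProvider"
        metadataFile="%{idp.home}/metadata/archer.xml" />
    <MetadataProvider id="Salesforce" xsi:type="FilesystemMetadataProvider"
        metadataFile="%{idp.home}/metadata/salesforce.xml" />

    <!-- 3. InCommon: fetched over HTTP, with a disk copy for use when the network is down. -->
    <MetadataProvider id="InCommon" xsi:type="FileBackedHTTPMetadataProvider"
        metadataURL="http://md.incommon.org/InCommon/InCommon-metadata.xml"
        backingFile="%{idp.home}/metadata/incommon-cache.xml" />

    <!-- 4. Searched last: only an EntityID found nowhere above can match here. -->
    <MetadataProvider id="Additions" xsi:type="FilesystemMetadataProvider"
        metadataFile="%{idp.home}/metadata/additions.xml" />
</MetadataProvider>
```

Because the chain is searched top to bottom, moving a provider in this file changes which copy of a duplicated EntityID wins.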

This then leaves us with a small number of special cases. Two of our partners (salesforce and cvent) use a technique that we might call the Expanding Metadata File. Every time you define a new application with these systems, instead of getting a new Metadata file you get a one line change to add to the existing Metadata file. In Salesforce, the file looks like:

      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-finance.my.salesforce.com?so=00Di0000000gP9D" index="12"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-fbo.my.salesforce.com?so=00Di0000000gP9D" index="13"/>
      <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" 
		Location="https://yale-adm.my.salesforce.com?so=00DA0000000ABT0" index="14"/>

The next time someone comes up with a new Salesforce application, it will be index="15" and will have its own unique Location value.

We may add special types of Jenkins Installs (runtype=salesforce and runtype=cvent) that replace just this one file. The bad news is that if the new Metadata is bad it will break existing Salesforce or Cvent applications, but the type of edit here is fairly simple and any mistakes should show up in DEV and TEST. Furthermore, the Shibboleth isolation of Metadata sources and the decision to configure files separately in metadata-providers.xml ensure that changes to Salesforce affect only Salesforce applications and nothing else.

Configuration Strategies