Spring Framework

Every application has to be configured with information about the Yale environment (server names, how to access databases, Active Directory, how users login) and to select options. You may also plug in external components either by using the features of the application server (filters, listeners, EJBs) or some interface provided by the specific software package. If there is no explicit configuration language, then you may have to get the source and modify it.

...

You know you have a Spring configuration file if the first element is <beans>. Then the file contains mostly <bean> elements, although Spring has a few aliases for <bean> when you are dealing with standard classes. If you are creating a Java List, for example, then instead of a <bean> element that references the "java.util.List" class, the file can use the defined nickname of <util:list>.

Local Disk File Configuration

Many applications are configured in a database. Spring has the built in capability to define a file (which it calls a Resource) based on a path to local disk or a URL to a network file. Even before it used Spring, Shibboleth had written its own custom code to read configuration files from disk, from URLs, or even to check a file out from Subversion source control at system startup. Then periodically Shibboleth can "poll" the source to see if a new version of the same file has become available (based on the last changed date or in Subversion the most recent committed version number) and reload it if there was a change.

...

So Yale accomplishes effectively the same thing with a bit more work. All Shibboleth configuration files are checked into Source Control. However, Shibboleth does not know this and does not go to Source Control itself. Shibboleth is configured to use files on disk, and when appropriate to check periodically to see if the file change dates have been modified and reload the changed files. The files are deposited or updated on the Shibboleth local disk by a Jenkins Install job under the control of Operations. So Shibboleth does not see the files change just because a new version of a configuration file has been committed to Subversion. After the commit there has to be an explicit Jenkins run to move the file to the Shibboleth server, and while Jenkins jobs can be configured to run automatically after a commit, this particular job is started by a person when we make a positive decision to change the running Shibboleth.
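
The "check the file change date and reload" behavior can be sketched in a few lines. This is only an illustration of the idea, not Shibboleth code: makeReloader and its load callback are hypothetical names, and a real poller would run on a timer rather than being called by hand.

```javascript
// Sketch only: reload a configuration file when its modification time changes.
const fs = require("fs");

// Hypothetical helper, not a Shibboleth API.
// `load` turns the file text into whatever in-memory object is needed.
function makeReloader(path, load) {
  let lastSeen = null; // modification time of the version currently loaded
  let current = null;  // the object built from that version
  return function poll() {
    const mtime = fs.statSync(path).mtimeMs;
    if (mtime !== lastSeen) {
      // The file on disk is newer (or this is the first poll): reload it.
      current = load(fs.readFileSync(path, "utf8"));
      lastSeen = mtime;
    }
    return current;
  };
}
```

Nothing here knows about Subversion or Jenkins. The poller only sees that a file on local disk changed, which is exactly the separation described above.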

Terms

Shibboleth is an Identity Provider (an IdP).

...

So a given partner like google.com is sometimes called an SP and sometimes called an RP. It is an Entity and it has Metadata, and in different contexts all these terms are used to describe the same partner. Technically Shibboleth is also an Entity, but normally our own EntityID is understood, so most of the time when we discuss Entities and EntityIDs we are talking about an RP/SP. Similarly we have our own Metadata, but it is understood and so most discussions of Metadata refer to an RP/SP/Entity.

Metadata and "Providers"

The conf/metadata-providers.xml file (Shib 2 format, not Spring) contains a list of <MetadataProvider> elements. Each Provider defines a local disk file or URL that contains or returns Metadata. Each file can define a single Entity or it can contain thousands of EntityDescriptors.

There is no requirement, but it is a Yale convention that each <MetadataProvider> element in our configuration points to the location of a file in the "metadata" subdirectory of the Shibboleth directory. Every one of these files is checked out of Source Control and is deposited on the Shibboleth local disk by the Jenkins Install job, except for InCommon.

The InCommon Federation provides a curated collection of thousands of Metadata elements. Shibboleth loads it from the URL supplied by InCommon when it starts up and then checks for updates every 8 hours. Shibboleth keeps the most recent copy of the data from InCommon in a file in the metadata subdirectory, but that one file is downloaded from a URL and managed by Shib itself and does not come from Jenkins or Source Control.

Yale decides case by case to combine some EntityDescriptors for a single application vendor into a single file, but generally we maintain different files in Source Control for unrelated Metadata. Therefore, we will generally create a new <MetadataProvider> element in the metadata-providers.xml file every time we add a new Metadata file to the Source Control directory. Remember that any syntax error, such as a missing " or a missing / in the XML, can kill the entire file. So putting a large amount of unrelated metadata in a single source file seems too dangerous, and the inconvenience of adding an additional Provider element for each new file is worth the safety of isolating the content of each file.

We define a number of metadata providers that initially point to empty files on disk. We take advantage of Shibboleth's ability to reload local disk files when they change, and the convention that metadata is taken from the first file that defines a particular EntityID. We use this to address a problem with the unreasonably change-averse IT administration at Yale.

Metadata describing an RP/SP/Entity is the content of a rather large block of XML contained in an <EntityDescriptor> element. This is the sort of thing that any other application would store in a database. Shibboleth reads Metadata from a file or URL and uses it to build objects in memory.

Metadata is obtained from Metadata Providers defined in the conf/metadata-providers.xml file. At Yale, each Provider is a file on disk, but the InCommon metadata for thousands of Entities comes from a URL to the InCommon server that is checked at startup and then once every 8 hours. The most recent copy of the InCommon data is stored on disk, so if Shibboleth starts up at a time when it cannot reach InCommon on the net, it uses the stored file as a backup.
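
The InCommon backup behavior amounts to "try the network, keep a copy, fall back to the copy". A minimal sketch, assuming three hypothetical callbacks (fetchRemote, readBackup, writeBackup) that stand in for the URL download and the stored file; none of these are Shibboleth names:

```javascript
// Sketch only: prefer the remote copy, fall back to the last stored copy.
// fetchRemote, readBackup and writeBackup are hypothetical callbacks.
function loadRemoteMetadata(fetchRemote, readBackup, writeBackup) {
  try {
    const fresh = fetchRemote(); // may throw if the network is unreachable
    writeBackup(fresh);          // keep the most recent copy on disk
    return fresh;
  } catch (err) {
    return readBackup();         // use the stored file as a backup
  }
}
```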

At Yale, Metadata files are checked into Source Control as part of the Jenkins Install project. They are copied to Shibboleth server local disk during Jenkins Install processing and are replaced only by another Jenkins Install (except for InCommon which is the only file that comes dynamically from the Web).

In theory we could create one big file and put all the local Metadata elements in it, but Shibboleth will refuse to read in any file that contains a single syntax error. So instead we tend to use individual files for each SP/RP/Entity although occasionally we will put the DEV/TEST/PROD entities of an application in the same file. That way a screwup is isolated to just the one file and one Entity.

The user wants to login to an RP with EntityID "https://example.com/provider". Shibboleth goes through the configured Metadata Providers (the files) and looks for a matching EntityID in the first file, then the second, and so on until a match is made. Since Shibboleth stops when an EntityID is found, the order in which the files are defined in the metadata-providers.xml file determines which of two or more metadata elements will be used.
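
The "first match wins" search is easy to model. In this sketch each Provider is reduced to a file name and the list of EntityIDs it defines; the file names and the second EntityID are made up for illustration, and the real search of course works on parsed Metadata rather than string arrays.

```javascript
// Sketch only: each provider is modeled as a name plus the EntityIDs it defines,
// listed in the same order as in metadata-providers.xml.
function findProvider(providers, entityId) {
  for (const provider of providers) {
    if (provider.entityIds.includes(entityId)) {
      return provider.name; // stop at the first file that defines the EntityID
    }
  }
  return null; // no Metadata for this partner
}

// Hypothetical file names; the local file is searched before InCommon.
const providers = [
  { name: "local-example.xml", entityIds: ["https://example.com/provider"] },
  { name: "incommon.xml", entityIds: ["https://example.com/provider", "https://other.example.org/sp"] },
];

console.log(findProvider(providers, "https://example.com/provider")); // local-example.xml
```

Swapping the two entries in the array would make the InCommon copy win instead, which is the whole point of controlling the order.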

In general, Yale puts all of its own locally managed Metadata files first, then it searches the InCommon Metadata we don't control. That way if we need some special processing for an InCommon partner, we can extract their standard Metadata, change it, and then store it in a Yale Source Control file. This "first match" rule also suggests an obvious use for one initially empty file at the beginning of the search order and one at the end of the search.

The "emergency-override" dynamic file is searched first. Metadata placed in this file with an EntityID that matches an existing Metadata entry in a later file will logically replace the previous version of production Metadata for any partner. When we have a regularly scheduled formal Release of new Shibboleth configuration (on alternate Thursdays) this file is empty. During the two-week period, or when it is too late to schedule a regular update through the CAB committee, a runtype=emergency Jenkins Install of Shibboleth modifies just this one file. So if one partner has a problem (typically because a key/certificate changed and we did not know about it in advance) we can go to the Emergency CAB and get approval to put the updated metadata in the emergency-override file, change just that one file on the disk of the running Shib, and fix the problem with that one metadata file. In the next alternate-Thursday full release the changed metadata will be in its normal file and this file will be empty again.

The "additions" dynamic file comes last in the search. Every existing Metadata file will have already been searched, and all existing EntityID values will have matched, so you do not get to this file unless you have a new EntityID that doesn't match any existing one (including all the InCommon entities). This file can only define new Metadata for new entities. This becomes a relatively safe Standard Change that doesn't have to be approved because anything put into this file cannot adversely affect existing configured services. Of course, a new partner may also need attributes released to them. Fortunately, Shibboleth allows the function of the attribute-filter.xml file to be broken up into multiple files. Existing partners are configured in attribute-filter, and an empty file named "additional-attribute-filter.xml" is deployed with every Shibboleth Release. Therefore, if a new partner has to be defined to production and cannot wait for the every-other-Thursday Release cycle, the Metadata for that partner can be placed in the metadata/additions.xml file and the attributes to be released can be put in the additional-attribute-filter.xml file. A Jenkins install of runtype=additions replaces both of these originally empty files with the data for the newly defined partner while guaranteeing by their search order that they cannot interfere with existing services. When the next regularly scheduled Shibboleth Release is ready, the changes move from the additions files to the normal Shibboleth configuration and the additions files are empty again.
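
To see why the position of a file decides what it can and cannot affect, here is a small sketch. The providers are again just names and EntityID lists, the EntityIDs are invented, and ignoredEntries and effectiveEntries are hypothetical helpers written for this illustration.

```javascript
// Sketch only: providers are listed in search order; names and EntityIDs are hypothetical.
const providers = [
  { name: "emergency-override.xml", entityIds: ["https://partner.example.com/sp"] },
  { name: "partner.xml", entityIds: ["https://partner.example.com/sp"] },
  { name: "additions.xml", entityIds: ["https://partner.example.com/sp", "https://new.example.org/sp"] },
];

// EntityIDs in the file at `index` that an earlier file already defines.
// First match wins, so these entries are never reached.
function ignoredEntries(providers, index) {
  const earlier = new Set(providers.slice(0, index).flatMap((p) => p.entityIds));
  return providers[index].entityIds.filter((id) => earlier.has(id));
}

// EntityIDs in the file at `index` that actually take effect.
function effectiveEntries(providers, index) {
  const ignored = new Set(ignoredEntries(providers, index));
  return providers[index].entityIds.filter((id) => !ignored.has(id));
}

console.log(effectiveEntries(providers, 0)); // the override entry takes effect
console.log(ignoredEntries(providers, 1));   // the production entry is shadowed by the override
console.log(effectiveEntries(providers, 2)); // only the brand new EntityID takes effect
```

Even if someone mistakenly puts an existing EntityID into the last file, that entry is never reached, so existing services are untouched.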

Two of our partners (Salesforce and Cvent) regularly add new AssertionConsumerService URL elements to their existing Metadata file. This happens so frequently that we have the option of replacing these specific production Metadata files with updated copies. There has not yet been any urgency to make such changes outside a normal Release cycle, but we have the ability to respond to the special needs of these two cloud partners if "every other week" becomes an unacceptable delay.

Jenkins Runtype

The runtype parameter in the Jenkins Install job determines the specific processing that this run of the Install job will perform.

...

Runtype "salesforce" and "cvent" are proposed runtypes that change a single Metadata file for the two partners that require frequent updates.

Contents of the Primary Configuration Files

Attribute-Resolver

Normally Shibboleth has a single attribute-resolver.xml file that contains two types of elements. DataConnectors define database or LDAP queries that produce result sets with columns or LDAP User objects with properties. AttributeDefinitions then take the columns and properties returned by the queries, assign a unique identifier that can be referenced in the attribute-filter (release policy), and supply SAML syntax. So two DataConnectors could query the Yale IDR database for basic identity information, and also the Active Directory for the subset of identity information it contains. Then AttributeDefinition statements can take the "FirstName" column from IDR or the "givenName" property from AD and create various SAML Attributes all with the same value of "Howard" but with SAML name and friendlyName attributes that refer to it as "FirstName", "First Name", "givenName", "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname", or "urn:oid:2.5.4.42" (informal and formal standards based SAML names for the same thing).
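
As a rough model of the two-step process (queries produce values, definitions give them IDs and SAML names), consider the sketch below. The definition ids are invented for illustration and the real resolver is configured in XML, not JavaScript; only the connector names, column names, and SAML names come from the text above.

```javascript
// Sketch only: what two DataConnectors might have returned for one user.
const connectorResults = {
  IDRQuery: { FirstName: "Howard" },
  ADQuery: { givenName: "Howard" },
};

// Hypothetical definitions: each picks one source value and gives it an id and a SAML name.
const definitions = [
  { id: "firstName", connector: "IDRQuery", source: "FirstName", samlName: "FirstName" },
  { id: "givenNameOid", connector: "IDRQuery", source: "FirstName", samlName: "urn:oid:2.5.4.42" },
  { id: "adGivenName", connector: "ADQuery", source: "givenName", samlName: "givenName" },
];

// Produce one named attribute per definition.
function resolve(definitions, results) {
  return definitions.map((d) => ({
    id: d.id, // what the attribute-filter refers to
    name: d.samlName, // what the partner sees
    value: results[d.connector][d.source],
  }));
}

console.log(resolve(definitions, connectorResults));
```

All three attributes carry the same value; only the id used for release decisions and the name shown to the partner differ.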

...

Shibboleth documentation is not particularly clear on the algorithm, so I will try to fill in something that I believe is important to understand.

DataConnectors

There are generally three types of queries that make sense:

...

The DataConnector needs a Java DataSource to provide a pool of database connections. Java DataSource management is complicated because it has to know when a database connection must be discarded because it has timed out or because the database rebooted since it was last used. Shibboleth would prefer to leave this to the database experts. Shibboleth 2 did this by using "container managed" connections provided by Tomcat or JBoss. Shibboleth 3 can still use connections managed by the application server, but now that it is a full Spring Framework application it can use DataSources provided and managed by Spring. Either way the complex database management doesn't have to be done by Shib provided code.

AttributeDefinitions

The DataConnector provides the value. We know that "Howard" is the value of the "FirstName" column of the result set returned by the "IDRQuery" database connector or of the "givenName" property of the user object returned by the "ADQuery" LDAP connector.

...

XML and Shibboleth are case sensitive, so it is important to realize that Oracle always converts its columns to UPPERCASE. To avoid errors you should always use UPPERCASE names for the sourceAttributeID field if the query is to an Oracle database, and you should define an UPPERCASE id for a default static value in the fallback connector if the Oracle query fails. Otherwise you may spend hours trying to debug the failure of the value to show up where you expect it to be.
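
The failure mode is a silent one, which is why it eats hours. A sketch, modeling a result row as a plain object and the sourceAttributeID match as an exact, case-sensitive key lookup (columnValue is a hypothetical helper, not resolver code):

```javascript
// Sketch only: a row as an Oracle query would return it, with UPPERCASE column names.
const row = { FIRSTNAME: "Howard" };

// Exact, case-sensitive lookup, modeling the match on sourceAttributeID.
function columnValue(row, sourceAttributeID) {
  return Object.prototype.hasOwnProperty.call(row, sourceAttributeID)
    ? row[sourceAttributeID]
    : undefined;
}

console.log(columnValue(row, "FirstName")); // undefined
console.log(columnValue(row, "FIRSTNAME")); // Howard
```

No error is raised for the mixed-case name. The attribute simply has no value.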

Undefined, Null, or Empty

When you query a database or an LDAP directory and then try to define an Attribute based on the value of a column or property, several things can go wrong:

...

Code Block
                // Use the first value only if the attribute is defined, not null, and not empty
                if (typeof googleEmail != "undefined" &&
                    googleEmail != null &&
                    googleEmail.getValues().size() > 0) {
                        googlemailalias = googleEmail.getValues().get(0);
                }
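
If many scripts need the same three checks, they could be collected into one helper. This is a sketch, assuming the attribute object exposes getValues() returning a list with size() and get(i) as in the fragment above; firstValueOr is a hypothetical name.

```javascript
// Sketch only: return the first value of an attribute, or a fallback.
// Assumes getValues() returns a list with size() and get(i), as in the fragment above.
function firstValueOr(attribute, fallback) {
  if (typeof attribute === "undefined" || attribute === null) {
    return fallback; // attribute was never resolved
  }
  const values = attribute.getValues();
  if (values === null || values.size() === 0) {
    return fallback; // resolved, but with no values
  }
  const first = values.get(0);
  return first === null ? fallback : first; // a null value counts as missing
}
```

One caveat: if the variable itself was never declared in the script, passing it to a function throws before the helper runs, so the typeof test on the bare name still has to happen at the call site.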

Suppose the Database is Down

If any exception is thrown during the query, then the Shibboleth code will attempt to execute a secondary query specified in the "failover" attribute of the DataConnector. The failover can point to a different query to a different database that might return the same value. Or it can be a Static element.
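
The control flow is simple enough to sketch. Here the primary and failover connectors are modeled as functions; the UNKNOWN default and the FIRSTNAME column are illustrative assumptions, and none of this is Shibboleth's own code.

```javascript
// Sketch only: run the primary query; if it throws, use the failover instead.
function resolveWithFailover(primary, failover) {
  try {
    return primary();
  } catch (err) {
    return failover();
  }
}

// Hypothetical stand-ins: a query against a database that is down,
// and a Static connector that supplies fixed default values.
const databaseQuery = () => { throw new Error("database is down"); };
const staticDefaults = () => ({ FIRSTNAME: "UNKNOWN" });

console.log(resolveWithFailover(databaseQuery, staticDefaults).FIRSTNAME); // UNKNOWN
```

Note that the failover is only tried when the query throws. A query that succeeds but returns no rows does not trigger it in this model.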

...

See the examples in the attribute-resolver-connectors.xml file.

NameId (Subject)

Every SAML Response has a Subject field. It has a value and selects one of a list of standardized Format strings.

The value can be the Netid, UPI, or Netid@yale.edu, but in most cases it is a reproducible but opaque hash of the Netid or a large random string.

No two users of the same service should get the same Subject value. However, if two individuals lack credentials to actually login to a service, then it is not a problem if two different Responses that the service will reject happen to have the same Subject. Thus if a service is only used by employees, and non-employee students cannot login to it, it is not a problem if all students are given the same dummy Subject value.

Any attribute that might be used to generate the Subject value cannot be NULL. If you have to generate a Subject for some Relying Party that has a value derived from an identity variable that might be null for any person at Yale, then generate a derived attribute with an AttributeDefinition that guarantees it is never NULL even when the input variable is NULL.

In Shibboleth 2 a Subject was represented by a special type of SAML Encoder on particular attributes. In Shib 3 you generally derive special attributes with guaranteed non-NULL values that have no Encoder elements at all, then generate the Subject using an entirely new configuration file named saml-nameid.xml.

The Subject is just "the Subject". It doesn't have a name that indicates what type of value it was generated from. All the documentation suggests that it should be based on a number like Yale UPI, and if we had it to do over again that might be what we use. However, up to this point Subjects are typically generated from Netid. Since you have to have a Netid to login to CAS and Shib, this is guaranteed not to be NULL.

Each subject value generated by the saml-nameid.xml file has an associated format string and is based on an AttributeDefinition.

If the ID of the AttributeDefinition is not released to the Service Provider to which you are trying to login, then all Subject definitions associated with that AttributeDefinition are not calculated and are not eligible for use in this Response.

If the Metadata for the Service Provider to which you are trying to login has a list of NameIDFormat string values, and the Format string associated with a Subject definition is not in the list, then that Subject is not generated and cannot appear in the Response.

When more than one Subject definition can be released to a Service Provider, Shibboleth chooses one. You can control the preference, but now you are missing the point. Either you should not release two Subject-generating AttributeDefinitions to the same EntityID, or you should delete the unwanted NameIDFormat string in the Metadata. If that is not possible, read the Shibboleth Wiki for information on controlling the selection preference.

Other Errors

...

Script bugs

Any JavaScript program can have errors. Usually they only show up when a database is down or some crud gets dumped into new rows or columns, or the AD gets updated badly. Unfortunately, if JavaScript throws an unhandled exception then Shibboleth fails the entire login.

Every Script must be wrapped in a try-catch that catches all errors and does something reasonable. Normally the reasonable thing is to just return, which produces an empty Attribute, which is probably the best you could do anyway.
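
A sketch of the pattern, written as a standalone function so it can be run outside Shibboleth. The alias derivation (the part of an Email address before the "@") and the function name aliasFromEmail are only an illustration of a script body that can fail on bad data.

```javascript
// Sketch only: the whole body of the script sits inside one try-catch.
function aliasFromEmail(email) {
  var alias = null;
  try {
    // Throws a TypeError if email is null, undefined, or not a string.
    alias = email.split("@")[0];
  } catch (err) {
    // Do something reasonable: produce no value instead of failing the login.
    alias = null;
  }
  return alias;
}

console.log(aliasFromEmail("howard@example.edu")); // howard
console.log(aliasFromEmail(null)); // null
```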

Other Errors

Other problems occur inside Shibboleth itself. Unfortunately, if Shibboleth generates an internal exception evaluating any Attribute it aborts login processing and returns no attributes at all. This is not the best solution for Yale, and in Shibboleth 2 we added a try-catch so that exceptions evaluating an Attribute only left that one Attribute undefined. We have not yet decided to migrate that Yale modification to Shibboleth 3.
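
The idea behind that Shibboleth 2 modification can be shown in miniature. This is not the Yale code (which lived inside Shibboleth's Java); it is a sketch with hypothetical names in which each attribute is a function that may throw.

```javascript
// Sketch only: evaluate each attribute separately so one failure does not abort the rest.
function resolveAll(definitions) {
  const resolved = {};
  for (const [id, evaluate] of Object.entries(definitions)) {
    try {
      resolved[id] = evaluate();
    } catch (err) {
      // Leave just this one attribute undefined and carry on.
    }
  }
  return resolved;
}

// Hypothetical attributes: one evaluates cleanly, one blows up.
const attributes = resolveAll({
  netid: () => "abc123",
  broken: () => { throw new Error("internal exception"); },
});
console.log(Object.keys(attributes)); // only "netid" is present
```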

NameId (Subject)

Every SAML Response has a Subject field. It has a value and one of a list of standardized "Format" name strings.

For most partners the Subject field is ignored and they get any information they need from the Attributes. Some important partners, however, use the Subject field as their most important source of information. ServiceNow expects the Subject field to be a Netid, while Google expects it to be the Eliapps ID (the part of the Email alias before the "@").

The Subject is supposed to be unique and is commonly obscured. For example, if you hash the Netid and some other secret stuff you can get a value that is reproducible from login to login but does not expose the identity of the user. However, if some users are not expected to be able to login to a service, then giving all of them the same "do not log this person on" Subject value is not a problem. This means that technically we do not have to worry if the same subject is generated whenever an indispensable identity value is NULL (say when people who do not have Eliapps accounts try to login to Eliapps, or when people who are not Employees try to access the Benefits system).
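
To make "hash the Netid and some other secret stuff" concrete, here is one way such a value could be computed with a keyed hash. This is an illustration only and is not the algorithm Shibboleth itself uses; the function name, the choice of HMAC-SHA256, and the inclusion of the partner EntityID in the input are all assumptions.

```javascript
const crypto = require("crypto");

// Sketch only: a reproducible but opaque Subject value.
// Not Shibboleth's algorithm; HMAC-SHA256 and the input layout are assumptions.
function opaqueSubject(netid, entityId, secret) {
  return crypto
    .createHmac("sha256", secret)       // the secret keeps outsiders from recomputing the value
    .update(entityId + "!" + netid)     // per-partner input, so two partners see different values
    .digest("base64");
}
```

The same Netid, partner, and secret always give the same string, so the partner can recognize a returning user without learning who it is.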

Shibboleth has been known to generate an internal error if any attribute used to generate a Subject value has a NULL value, so generally any query for a value that might be used as a Subject should substitute a dummy value like "unknown" or "-1" for NULL return values.

In Shibboleth 2 a Subject was represented by a special type of Encoder element in AttributeDefinition statements in the attribute-resolver XML file.

In Shibboleth 3 there is a new subsystem and a new configuration file called saml-nameid.xml. To understand the change, you have to remember that the "best practice" is to generate some obscured meaningless reproducible string of characters as the Subject and to use Attributes exclusively as the source of meaningful information. Shibboleth 3 is designed to emphasize the idea that putting real data in the Subject field is a bad idea. We have to do it because certain partners expect it. We do not have to discuss the configuration of a hash-trash Subject because it is automatic and fairly uninteresting.

If the Subject is meaningful, then it has to be based on some attribute (Netid, UPI, email, ...). That means that there is an AttributeDefinition that provides the value.

In Shibboleth 3 the correct way to do this is to create a special AttributeDefinition with a special ID that is only used to generate subject values:

Code Block
    <AttributeDefinition id="subjectMail" xsi:type="ad:Simple" 
        sourceAttributeID="EmailAddress">
        <Dependency ref="IDRQuery" />
    </AttributeDefinition>

There are lots of real Attributes based on Email address, but this one special attribute is named "subjectMail" and it has no SAML Encoder elements that can be used to produce an Attribute in the Response. With this Definition, we have a special ID and can release "subjectMail" to certain Relying Party partners through the attribute-filter statements.

Now in the saml-nameid.xml file there can be a statement:

Code Block
        <bean parent="shibboleth.SAML2AttributeSourcedGenerator"
            p:format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress"
            p:attributeSourceIds="#{ {'subjectMail'} }" />

This statement creates a potential Subject. It has a format string, but we usually refer to Subject formats by just the last piece after the last colon, so this is "emailAddress". The value will be taken from the 'subjectMail' attribute above.

The last piece of the puzzle is provided by the Metadata for a Relying Party, where one or more Format strings can be provided in a NameIDFormat element:

Code Block
        <NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</NameIDFormat>

Now, if you have done this right, there should be for any given Relying Party exactly one Subject generated by saml-nameid.xml that is both based on an attribute released to this RP by attribute-filter and also has an associated format mentioned in a NameIDFormat element of the RP Metadata. Then that one Subject will be used to generate the Response.
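
The selection rule can be written down as a filter. This sketch models only the conditions described in this section (the source attribute must be released, and the format must appear in the RP's NameIDFormat list when the Metadata has one); eligibleSubjects and the 'subjectNetid' entry are hypothetical, and Shibboleth's real logic has more options than this.

```javascript
// Sketch only: which Subject definitions are eligible for a given Relying Party?
function eligibleSubjects(generators, releasedAttributeIds, nameIdFormats) {
  return generators.filter(
    (g) =>
      // The source attribute must be released to this RP by attribute-filter.
      releasedAttributeIds.includes(g.attributeSourceId) &&
      // If the Metadata lists NameIDFormat values, the format must be one of them.
      (nameIdFormats.length === 0 || nameIdFormats.includes(g.format))
  );
}

// Modeled on the saml-nameid.xml statement above, plus one hypothetical extra definition.
const generators = [
  { format: "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress", attributeSourceId: "subjectMail" },
  { format: "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified", attributeSourceId: "subjectNetid" },
];

const chosen = eligibleSubjects(
  generators,
  ["subjectMail"],
  ["urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress"]
);
console.log(chosen.length); // 1
```

If this filter ever returns more than one entry for a partner, that is the situation where Shibboleth has to pick, and it is better to fix the release rules or the Metadata than to rely on the preference order.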

The most important thing to understand about Subjects is that the NameIDFormat does not really mean what it says. In normal use, "emailAddress" seems to suggest that this is a real Email address to which you might send mail. NameIDFormat is a suggestion about what the thing looks like, not a request for a particular real attribute value. In this case it suggests that the Subject "look like an Email address" not that it actually be an email address. In reality, there is not even a requirement that the Subject contain an "@" to match the format.

So in practice you can almost ignore the NameIDFormat except for its use in selecting a specific Subject from the list of available subjects in the saml-nameid.xml file. A value equal to the Netid could, for example, be assigned any format in saml-nameid.xml and could then match any NameIDFormat in a Metadata file.

Attribute-Filter

The attribute-filter.xml file has a long list of rules listing the Attributes (defined in the previous section) that are released to each partner. For example

...

However, you can only release attributes defined by attribute-resolver.xml, and that does not change between releases.

Relying-Party

The relying-party.xml file has three types of definitions:

...

Specific Relying Party configurations could force encryption if we needed to do it, but we have no examples currently at Yale.

More About Metadata

SAML Metadata can have a ton of useless information. There are four things that are actually important:

...