...

This becomes an outline for various cluster node failure tests. Whenever one ticket points to a parent there is a model where the ticket pointed to was created on a node that failed and the new ticket has to be created on the backup server acting on behalf of that node. So you want to test the creation and validation of a Service Ticket on node B when the TGT was created on node A, or the creation of a PGT on node B when the TGT was created on node A, and so on.

Front End Programming

Users at Yale know that CAS is "https://secure.its.yale.edu/cas". In reality, DNS resolves secure.its.yale.edu to 130.132.35.49 and that is a Virtual IP address (a VIP) on the BIG-IP F5. There has to be special configuration for that particular IP address. For example, the F5 has to hold the SSL Certificate for "secure.its.yale.edu" and it has to manage the SSL protocol.

CAS is not the only application that appears to be on secure.its.yale.edu. So when a GET or POST arrives at that IP address, the F5 has to look at the URL that follows it. If the URL begins with /cas then it will be handled by one of the configured IP addresses for one of the VMs that run the CAS code. So the F5 is already making decisions based on part of the URL.

Sometimes a new user should be randomly assigned to any available server, but then subsequent requests should go to the same server. This can be done by JSESSIONID if it is a J2EE server. Making decisions based on this is more complicated because the F5 has to maintain tables. It is so common, however, that this is built in logic in the F5 that you can turn on by clicking a checkbox.

The Cushy requirements are simpler and cleaner. Routing is done based on data in the request. The data is either in the URL or in an HTTP Header. The data explicitly identifies the server to which the request should be routed. There must be a bit of special programming because locating the routing information depends on knowing the CAS protocol.

Any cluster of Web Servers requires some sort of Front End device to screen and forward network traffic. Ten years ago this was a simple computer that normally assigned traffic to servers on a round robin basis. Today the primary function of many Front Ends is to protect the servers from Denial of Service attacks, attempts to brute force passwords, and other security problems. To do this, the device understands many common network protocols so it can do "deep packet" inspection. HTTP is probably the simplest of the protocols. A Front End will examine the URL, remove certain headers regarded as dangerous, and add headers of its own. It can select a specific server from the pool based on data in the request, although this is most commonly used to maintain "sessions" between a particular client and server.


Yale decided to make other security applications appear to run on the secure.its.yale.edu machine, even though each application has its own pool of VMs. So the F5 has to examine the URL to determine whether it begins with "/cas" and therefore goes to the pool of CAS VMs, or whether it references a different application and pool. The F5 also has to inspect and generate HTTP Headers if the real client IP address is to be passed on to a Web Server for processing.

This means that if CAS is going to use X.509 User Certificates as a non-interactive form of authentication, then all the configuration that would in a standalone server be managed by the X509 optional component of CAS has to be configured in the F5. This is required by the SSL protocol; it is not CAS specific. There has to be a special list of "Trusted" Certificate Authorities from which User Certificates will be accepted. The browser has to be told that certificates are required, permitted, or not allowed. The signature in the submitted Certificate has to be validated against the Trusted CA list. The Certificate has to be ASN.1 decoded, and then the DN and/or one or more subjectAltNames have to be extracted and turned into HTTP headers that can be forwarded to the application. The F5 has most of this programming built in, although the last step of creating headers has to be manually coded. By comparison, routing requests based on CAS ticketids is simple.

Routing requests to particular servers based on the content of the request line and the headers is part of what generic Front End devices (not just the F5) call "Layer 5-7 routing". The internet routes messages between computers using Layer 3 routing (IP), but Front End devices select the last hop to the specific VM based on data and an understanding of the higher level protocols. For example, if a large university divided its CAS servers up by physically separated campuses, then people who normally go to one campus could be given an OU= in the DN of their X.509 User Certificate that would preferentially route CAS requests to the server or pool of servers for the home campus. Servers at other campus locations then provide offsite backup.

After the first request is randomly assigned to a Java J2EE server, subsequent requests can be sent back to the same server if the Front End understands JSESSIONID protocol. The Java server places a parameter called JSESSIONID in the first response to the browser, and the browser sends it back as a Cookie or as part of the URL. The F5 has built in programming to handle JSESSIONID, but that requires tables and is a lot more complex than CAS.

First, however, we need to understand the format of CAS ticketids because that is where the routing information comes from.

CAS Ticket IDs have four sections:

type - num - random - suffix

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end is configured in the uniqueIdGenerators.xml file. There are separate XML configurations for different types of tickets, but they all look alike and they all live in that file. With Cushy the suffix is tied to the TicketSuffix property generated by the CushyClusterConfiguration.
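The four-part layout can be illustrated with a small Python sketch. This is not CAS or Cushy code: the names TicketId and parse_ticket_id are hypothetical, and the sketch assumes that the type, num, and random sections never contain a "-" themselves, so that everything after the third "-" is the suffix (which may itself contain dashes, as a host name might).

```python
from typing import NamedTuple


class TicketId(NamedTuple):
    """Hypothetical holder for the four sections of a CAS ticket id."""
    kind: str     # "TGT" or "ST"
    num: str      # ticket sequence number
    random: str   # large random string
    suffix: str   # identifies the node that created the ticket


def parse_ticket_id(ticket_id: str) -> TicketId:
    """Split 'type-num-random-suffix' at the first three '-' characters."""
    # maxsplit=3 keeps any further '-' characters inside the suffix.
    parts = ticket_id.split("-", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"not a four-part ticket id: {ticket_id!r}")
    return TicketId(*parts)
```

For example, parse_ticket_id("ST-42-dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT-casvm-01").suffix would be "casvm-01", where "casvm-01" is an invented node name.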

A typical XML configuration for a particular type of ticket (when you use Cushy) looks like this:

<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator">
<constructor-arg index="0" type="int" value="50" />
<constructor-arg  index="1"  value="#{clusterConfiguration.getTicketSuffix()}" />
</bean>

The suffix value, which is the index="1" argument to the Java object constructor, is obtained through a Spring "EL" expression that reads the TicketSuffix property of the bean named clusterConfiguration. This is the CushyClusterConfiguration object, which scans the configured cluster definitions to determine which cluster the server is running in and what name and IP address it uses. It generates the TicketSuffix value and feeds it to the ticket ID generation logic on each node; in the simplest case, the suffix is just the node name. Feeding the output of clusterConfiguration directly into the input of the Ticket ID Generator keeps configuration simple and ensures that all the machines come up configured properly. There is special logic in Cushy for an F5 which, for some reason, likes to identify hosts by the MD5 hash of the character representation of their IP address.
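As a rough illustration of the F5-style identifier, the following Python sketch hashes the text form of an IP address with MD5. The function name f5_style_suffix is hypothetical, and the exact text form of the address and the encoding of the digest (hexadecimal here) are assumptions; the text above does not specify what Cushy or the F5 actually use.

```python
import hashlib


def f5_style_suffix(ip_address: str) -> str:
    """MD5 of the character representation of an IP address.

    Assumption: the address is hashed as plain ASCII text and the
    digest is written as 32 hexadecimal characters.
    """
    return hashlib.md5(ip_address.encode("ascii")).hexdigest()
```

Whatever the exact format, the point is that the value is computed from the node's own address, so each node produces a stable suffix that the Front End can match without any extra table.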

Every CAS request except the initial login comes with one or more tickets located in different places in the request. There is a sequence of tests and you stop at the first match:

  1. If the Path part of the URL is a validate request (/cas/validate, /cas/serviceValidate, /cas/proxyValidate, or /cas/samlValidate) then look at the ticket= parameter in the query string part of the URL.
  2. Otherwise, if the Path part of the URL is a /cas/proxy request, then look at the pgt= parameter in the query string.
  3. Otherwise, if the request has a CASTGC cookie, then look at the cookie value.
  4. Otherwise, if the request has a JSESSIONID, then the user is in the middle of login; use the built in support to send it back to the same node.
  5. Otherwise, or if the node selected by 1-4 is down, choose any CAS node from the pool.
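The sequence of tests above can be sketched in Python. This is an illustration of the decision order only, not F5 or Cushy code. The function route, the live_nodes list, and the sticky mapping (which stands in for the Front End's built in JSESSIONID support) are all hypothetical, and the sketch only looks for JSESSIONID in a cookie, not in the URL.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional, Sequence
from urllib.parse import parse_qs, urlsplit

VALIDATE_PATHS = {"/cas/validate", "/cas/serviceValidate",
                  "/cas/proxyValidate", "/cas/samlValidate"}


def ticket_suffix(ticket_id: Optional[str]) -> Optional[str]:
    """Return the text after the third '-' of a ticket id, if any."""
    if not ticket_id:
        return None
    parts = ticket_id.split("-", 3)
    return parts[3] if len(parts) == 4 and parts[3] else None


def first_param(query: str, name: str) -> Optional[str]:
    """Return the first value of a query string parameter, if present."""
    values = parse_qs(query).get(name)
    return values[0] if values else None


def route(url: str, cookie_header: str, live_nodes: Sequence[str],
          sticky: Optional[Mapping[str, str]] = None) -> str:
    """Pick a node name; live_nodes must hold at least one node."""
    parts = urlsplit(url)
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    target = None
    if parts.path in VALIDATE_PATHS:            # test 1: ticket= parameter
        target = ticket_suffix(first_param(parts.query, "ticket"))
    elif parts.path == "/cas/proxy":            # test 2: pgt= parameter
        target = ticket_suffix(first_param(parts.query, "pgt"))
    elif "CASTGC" in cookies:                   # test 3: TGT cookie
        target = ticket_suffix(cookies["CASTGC"].value)
    elif "JSESSIONID" in cookies:               # test 4: sticky login session
        target = (sticky or {}).get(cookies["JSESSIONID"].value)
    if target in live_nodes:
        return target
    return live_nodes[0]                        # test 5: any available node
```

The last two lines cover both halves of test 5: a request with no routing data, and a request whose preferred node is not currently in the live list.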

That is the code. Now here is the explanation:

  1. After receiving a Service Ticket ID from the browser, an application opens its own HTTPS session to CAS and presents the ticket id in a "validate" request. If the id is valid CAS passes back the Netid, and in certain requests can pass back additional attributes. The suffix on the ticket= parameter identifies the CAS server that created the ticket and has it in memory without requiring any high speed replication. This request is best handled by the server that issued the Service Ticket.
  2. When a middleware server like a Portal has obtained a CAS Proxy Granting Ticket, it requests CAS to issue a Service Ticket by making a /proxy call. Since the middleware is not a browser, it does not have a Cookie to hold the PGT. So it passes it explicitly in the pgt= parameter.
  3. After a user logs in, CAS creates a Login TGT that points to the Netid and attributes and writes the ticket id of the TGT to the browser as a Cookie. The Cookie is scoped to the URL of the CAS application as seen from the browser point of view. At Yale this is "https://secure.its.yale.edu/cas" and so whenever the browser sees a subsequent URL that begins with this string, it appends the CASTGC Cookie with the TGT ID. CAS uses this to find the TGT object and knows that the user has already logged in. This rule sends a browser back to the CAS node the user is logged into.
  4. If the first three tests fail, this request is not associated with an existing logged in user. CAS has a bug/feature that it depends on Spring Web Flow and stores data during login in Web Flow storage which in turn depends on the HTTPSession object maintained by the Web Server (Tomcat, JBoss, ...). You can cluster JBoss or Tomcat servers to share HTTPSession objects over the network, but it is simpler if you program the Front End so that if the user responds in a reasonable amount of time, the login form with the userid and password is sent back to the Web Server that wrote the form to the browser in response to the browser's original HTTP GET. This is called a "sticky session" and the F5 does it automatically if you just check a box. You don't need to write code.
  5. Otherwise, if this is a brand new request to login to CAS or if the CAS Server selected by one of the previous steps has failed and is not responding to the Front End, then send the request to any available CAS server.

Let's be clear about what is happening here.

Neither CAS nor the F5 is maintaining a "session" as that term is commonly used. Requests are being routed based on data in the HTTP headers. When an application validates Service Tickets that it receives, every Service Ticket validation request can be routed to a different specific CAS VM based on the suffix of different ticket= parameters. In any normal meaning of "session", requests from the same client would go to the same server. The purpose of this logic is to do exactly what sessions normally do, which is to select a specific VM from the pool. The difference is that the selection is based on data in the request, unrelated to either the identity of the client or any previous traffic from that client.

If you look at the rather silly example from the F5 manual, it is clear that the device is designed to be able to do this sort of thing:

when HTTP_REQUEST {
   # Send image requests to different pools based on the file extension
   set uri [HTTP::uri]
   if { $uri ends_with ".gif" } {
      pool my_pool
   } elseif { $uri ends_with ".jpg" } {
      pool your_pool
   }
}

  1. When a middleware server like a Portal has obtained a CAS Proxy Granting Ticket, it requests CAS to issue a Service Ticket by opening its own HTTPS connection to CAS to make a /proxy call. Since the middleware is not a browser, it does not have a Cookie to hold the PGT. So it passes that ticketid explicitly in the pgt= parameter. This request is best handled by the server that created the Proxy Granting Ticket.
  2. After a user logs in, CAS creates a Login TGT that points to the Netid and attributes and writes the ticket id of the TGT to the browser as a Cookie. The Cookie is sent back from the browser in any request to "https://secure.its.yale.edu/cas". After initial login, all requests with cookies are requests to issue a Service Ticket for a new application using the existing CAS login. This is best handled by the server that created the TGT.
  3. If there is no existing ticket, then the user is logging into CAS. This may be the GET that returns the login form, or the POST that submits the Userid and Password. Vanilla CAS code works only if the POST goes back to the same server that handled the GET. This is the only part of CAS that actually has an HttpSession.
  4. Otherwise, if there is no JSESSIONID then this is the initial GET for the login form. Assign it to any server.

Except during login (the JSESSIONID case), none of the browser, JBoss, CAS, or the F5 is maintaining a "session" as that term is commonly used, where requests from the same client always go to the same server and the server maintains an HttpSession object. Since the entire CAS function is based on creating and updating Ticket objects, each CAS request except the initial browser logon references a specific Ticket ID. By storing in the Ticket ID a field that easily identifies to the F5 the CAS server that created and owns the Ticket, CAS protocol now provides a relatively simple algorithm for routing requests to the best server. It is vastly simpler than other protocols that the F5 has built in because of their wide use.

The F5 understands HTTP requests and already has both expressions and logic to locate "the Path part of the URL", "the ticket= parameter in the query string", and "the CASTGC Cookie value". All that has to be coded is the comparison of these predefined items to test values, and an expression to extract the string that follows the third "-" character in a given ticket value.

CAS does not require the F5 to create any new table. The pool of servers associated with /cas is already part of the F5 configuration. The coding is just to select a specific server in the pool based on data in a field of each request. The logic depends on the CAS protocol, which has been updated only three times since CAS was created and does not change between minor releases, rather than on the characteristics of any particular CAS release.

What Cushy Does at Failure

...