
Any cluster of Web Servers requires some sort of Front End device to screen and forward network traffic. Ten years ago this was a computer with some fairly simple programming. Common strategies for distributing requests (a short code sketch follows the list):

  • Round Robin - A list of the servers is maintained. Each request is assigned to the next computer in the list. When you get to the end of the list, you wrap back to the beginning.
  • Master/Spare - All requests go to the first computer until it fails. Then requests go to the backup computer until the Master comes back up.
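
A minimal sketch of those two strategies, written in Java purely for illustration (the class and method names are made up; a real Front End implements this inside the device):

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a real Front End does this in hardware/firmware, not in Java.
class RoundRobinSelector {
    private final List<String> servers;                  // e.g. the cluster's IP addresses
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinSelector(List<String> servers) { this.servers = servers; }

    String pick() {
        // Each request gets the next server in the list, wrapping back to the start.
        return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
    }
}

class MasterSpareSelector {
    private final String master;
    private final String spare;

    MasterSpareSelector(String master, String spare) { this.master = master; this.spare = spare; }

    String pick(boolean masterIsUp) {
        // All traffic goes to the Master until it fails, then to the backup.
        return masterIsUp ? master : spare;
    }
}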

However, if each Web request can go to a different server, then all the data about the current session either has to be saved back to the browser (as a cookie) or saved to a database that all servers in the cluster can access. The database became a performance bottleneck. Companies building network equipment realized that they would have a competitive advantage if the Front End learned a little bit more about servers and protocols, so it could optimize request routing and make things simpler and more efficient for the applications. Then they added features to enhance security. Today, these devices track repetitive activity from very active IP addresses to detect and suppress Denial of Service attacks. With "deep packet inspection" they can identify attempts to hack a system. They can remove HTTP Headers regarded as inappropriate, and add Headers of their own. Vendors may provide fairly complex services for very widely used programs, and local network administrators can add their own code in some high level language for applications that are locally important.

For example, users at Yale know that CAS is "https://secure.its.yale.edu/cas". In reality, DNS resolves secure.its.yale.edu to 130.132.35.49, and that is a Virtual IP address (a VIP) on the BIG-IP F5. The VIP requires configuration, because the F5 has to hold the SSL Certificate for "secure.its.yale.edu" and manage the SSL protocol. Yale decided to make it appear that the other security applications also run on the secure.its.yale.edu machine, even though each application has its own pool of VMs. So the F5 has to examine the URL to determine whether the server name is followed by "/cas", and therefore goes to the pool of CAS VMs, or by "/idp", and therefore goes to the Shibboleth pool of VMs. Because it is the F5 that is really talking to the browser, it has to create a special Header containing the browser's IP address, in case that address matters to the Web server.
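
For instance, a Web application behind the F5 cannot trust the TCP peer address, because that is the F5 itself; it has to read the Header the Front End inserted. A minimal sketch, assuming the conventional X-Forwarded-For header name (the actual name depends on how the F5 is configured):

import javax.servlet.http.HttpServletRequest;

// Sketch only: the header name is an assumption; use whatever the Front End actually inserts.
class ClientAddress {
    static String of(HttpServletRequest request) {
        String forwarded = request.getHeader("X-Forwarded-For");
        if (forwarded != null && !forwarded.isEmpty()) {
            // The header can hold a comma-separated chain of proxies; the first entry is the browser.
            return forwarded.split(",")[0].trim();
        }
        // No Front End header: fall back to the immediate peer (the F5 itself).
        return request.getRemoteAddr();
    }
}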

The amount of logic in the Front End can be substantial. Suppose CAS wants to use X.509 User Certificates installed in the browser as an authentication credential. CAS has an entire X509 support module, but that depends on the browser talking to the CAS server directly. If the Front End is terminating the SSL/TLS session itself, then it has to be configured with all the standard information needed by any Web component that handles user certificates. There has to be a special list of "Trusted" Certificate Authorities from which User Certificates will be accepted. The Front End has to tell the browser whether certificates are required or optional. The signature in the submitted Certificate has to be validated against the Trusted CA list. The Certificate has to be ASN.1 decoded, and then the DN and/or one or more subjectAltNames have to be extracted and turned into new HTTP headers that can be forwarded to the application. All this is fairly standard stuff, and it is typically part of the built-in code of any Front End device.
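
The same extraction can be sketched in Java terms. When a Java servlet container (rather than the F5) terminates TLS, the validated client chain is exposed under the standard javax.servlet.request.X509Certificate request attribute; the outgoing header names below are illustrative, not any fixed standard:

import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;

// Sketch of the certificate-to-header step. Header names are made-up examples.
class CertificateHeaders {
    static Map<String, String> extract(HttpServletRequest request) throws Exception {
        Map<String, String> headers = new HashMap<>();
        X509Certificate[] chain =
                (X509Certificate[]) request.getAttribute("javax.servlet.request.X509Certificate");
        if (chain == null || chain.length == 0) {
            return headers;                               // no client certificate presented
        }
        X509Certificate userCert = chain[0];
        // The Distinguished Name of the user certificate.
        headers.put("SSL-Client-DN", userCert.getSubjectX500Principal().getName());
        // Any subjectAltName entries (each is a [type, value] pair).
        Collection<List<?>> altNames = userCert.getSubjectAlternativeNames();
        if (altNames != null) {
            int n = 0;
            for (List<?> altName : altNames) {
                headers.put("SSL-Client-SAN-" + (n++), String.valueOf(altName.get(1)));
            }
        }
        return headers;
    }
}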

So teaching a Front End about the CAS protocol is not that big a deal. Commercial sites do this sort of thing all the time, but they don't run CAS. Universities probably spend less time optimizing eCommerce applications, so they may not normally think about Front End devices this way.

First, however, we need to understand the format of CAS ticketids because that is where the routing information comes from:

type - num - random - suffix

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end is configured in the uniqueIdGenerators.xml file.
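
A quick sketch of how routing code can pull the suffix out of a ticket id (the class name and the "casvm01" suffix in the example are made up):

// Sketch: extract the node-identifying suffix from "type-num-random-suffix".
// With the default generator, type, num, and random never contain a dash, so split
// at the first three dashes and keep the rest, even if the suffix itself has dashes.
class TicketSuffix {
    static String of(String ticketId) {
        String[] parts = ticketId.split("-", 4);
        return (parts.length == 4) ? parts[3] : "";
    }
}
// Example: TicketSuffix.of("ST-42-dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT-casvm01") returns "casvm01".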

A typical XML configuration for a particular type of ticket (when you use CushyClusterConfiguration) looks like this:

<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator">
    <constructor-arg index="0" type="int" value="50" />
    <constructor-arg index="1" value="#{clusterConfiguration.getTicketSuffix()}" />
</bean>

The suffix value, which is the index="1" argument to the Java object constructor, is obtained with a Spring "EL" expression that calls the getTicketSuffix() method of the bean named clusterConfiguration. By feeding the output of clusterConfiguration directly into the input of the Ticket ID Generator, this approach keeps the configuration simple and ensures that all the machines come up configured consistently. There is special logic in Cushy for an F5, which, for some reason, likes to identify hosts by the MD5 hash of the character representation of their decimal/dotted IPv4 address.
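
As an illustration of that F5 convention, the hash itself is just an MD5 of the address text; exactly how a given F5 renders it (upper or lower case, and so on) is device-specific, so treat this as a sketch of the computation only:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: MD5 hash of the dotted-decimal text of an IPv4 address, rendered as lower-case hex.
class F5StyleSuffix {
    static String of(String dottedDecimalAddress) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(dottedDecimalAddress.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
// Example: F5StyleSuffix.of("130.132.35.49") yields a 32-character hex string.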

Every CAS request except the initial login comes with one or more tickets located in different places in the request. There is a sequence of tests, and you stop at the first match (a code sketch follows the list):

  1. If the Path part of the URL is a validate request (/cas/validate, /cas/serviceValidate, /cas/proxyValidate, or /cas/samlValidate) then look at the ticket= parameter in the query string part of the URL
  2. Otherwise, if the Path part of the URL is a /cas/proxy request, then look at the pgt= parameter in the query string.
  3. Otherwise, if the request has a CASTGC cookie, then look at the cookie value.
  4. Otherwise, the request is probably in the middle of login so use any built in Front End support for JSESSIONID.
  5. Otherwise, or if the node selected by 1-4 is down, choose any CAS node from the pool.
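
Expressed in Java purely as an illustration (a real Front End implements the same tests in its own configuration or scripting language, and the class and method names here are made up), the ticket-extraction part of the sequence looks roughly like this:

import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;

// Illustrative rendering of tests 1-4: find the ticket (if any) whose suffix
// identifies the CAS node that should handle this request.
class CasRequestRouting {
    static String ticketFor(HttpServletRequest request) {
        String path = request.getRequestURI();
        // 1. Validate requests carry the Service Ticket in the ticket= parameter.
        if (path.endsWith("/validate") || path.endsWith("/serviceValidate")
                || path.endsWith("/proxyValidate") || path.endsWith("/samlValidate")) {
            return request.getParameter("ticket");
        }
        // 2. Proxy requests carry the Proxy Granting Ticket in the pgt= parameter.
        if (path.endsWith("/proxy")) {
            return request.getParameter("pgt");
        }
        // 3. Anything else from a logged-in browser carries the TGT id in the CASTGC cookie.
        Cookie[] cookies = request.getCookies();
        if (cookies != null) {
            for (Cookie cookie : cookies) {
                if ("CASTGC".equals(cookie.getName())) {
                    return cookie.getValue();
                }
            }
        }
        // 4./5. No ticket: the user is in the middle of login, so fall back to
        // JSESSIONID affinity or, failing that, any node in the pool.
        return null;
    }
}

The suffix at the end of whatever ticket id this returns is then matched against each node's configured suffix to pick the destination; if nothing is returned, the request falls through to the Front End's normal JSESSIONID or random routing.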

That is the code; now here is the explanation:

  1. After receiving a Service Ticket ID from the browser, an application opens its own HTTPS session to CAS and presents the ticket id in a "validate" request. If the id is valid, CAS passes back the Netid and, for certain request types, additional attributes. This request is best handled by the server that issued the Service Ticket.
  2. When a middleware server like a Portal has obtained a CAS Proxy Granting Ticket, it asks CAS to issue a Service Ticket by opening its own HTTPS connection and making a /proxy call. Since the middleware is not a browser, it does not have a Cookie to hold the PGT. So it passes that ticketid explicitly in the pgt= parameter. This request is best handled by the server that created the Proxy Granting Ticket.
  3. After a user logs in, CAS creates a Login TGT that points to the Netid and attributes and writes the ticket id of the TGT to the browser as a Cookie. The Cookie is sent back from the browser in any request to "https://secure.its.yale.edu/cas". After initial login, all requests with cookies are requests to issue a Service Ticket for a new application using the existing CAS login. This is best handled by the server that created the TGT.
  4. If there is no existing ticket, then the user is logging into CAS. This may be the GET that returns the login form, or the POST that submits the Userid and Password. Vanilla CAS code works only if the POST goes back to the same server that handled the GET. This is the only part of CAS that actually has an HttpSession.
  5. Otherwise, if there is no JSESSIONID then this is the initial GET for the login form. Assign it to any server.

Only cases 1-3 actually involve special CAS protocol logic. Steps 4 and 5 are standard options that will be programmed into any Front End device already, so after test 3 you basically fall through to whatever the Front End did before you added special code.

Cases 1-3 are only meaningful for a GET request. The only CAS POST is the login form and it is case 4.

CushyFrontEndFilter

It is not always possible to convince the network administrators who own the Front End device to add programming for CAS protocol. It turns out, however, that this sort of programming is a good idea even if you have to wait until the request hits the application servers.

CushyFrontEndFilter is a Java Servlet Filter that you can insert using WEB-INF/web.xml in the CAS WAR file. It is typically configured by CushyClusterConfiguration to know the ticket suffix and URL of every other node in the CAS cluster. It builds a small pool of reusable SSL sessions with each member of the cluster and uses them to forward GET requests that the case 1-3 analysis indicates would be better handled by the node that generated the ticket. It sends back to the browser whatever reply the other node generated. In short, it does what we would prefer the Front End to do, but it waits until the request hits the processing stack of one of the CAS servers.
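
The real filter keeps a pool of reusable SSL connections and takes care of headers, cookies, and error handling; the following is only a stripped-down sketch of the general shape, not the CushyFrontEndFilter source. It reuses the CasRequestRouting sketch from earlier on this page, and ownerUrlFor() is an assumed helper that maps a ticket suffix to the owning node's base URL (or null when this node owns the ticket):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Stripped-down sketch of a forwarding filter; NOT the actual CushyFrontEndFilter code.
public class ForwardingFilterSketch implements Filter {

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;

        String ticket = CasRequestRouting.ticketFor(request);   // routing sketch shown earlier
        String ownerUrl = (ticket == null) ? null : ownerUrlFor(ticket);
        if (!"GET".equals(request.getMethod()) || ownerUrl == null) {
            chain.doFilter(req, resp);      // no ticket, or this node owns it: process locally
            return;
        }

        // Replay the GET against the node that issued the ticket and copy its answer back.
        // (A real implementation also copies request headers such as the CASTGC cookie,
        // and the response headers.)
        String query = request.getQueryString();
        URL target = new URL(ownerUrl + request.getRequestURI() + (query == null ? "" : "?" + query));
        HttpURLConnection connection = (HttpURLConnection) target.openConnection();
        connection.setRequestMethod("GET");
        int status = connection.getResponseCode();
        response.setStatus(status);
        InputStream in = (status < 400) ? connection.getInputStream() : connection.getErrorStream();
        if (in != null) {
            try (InputStream body = in; OutputStream out = response.getOutputStream()) {
                byte[] buffer = new byte[8192];
                for (int n; (n = body.read(buffer)) != -1; ) {
                    out.write(buffer, 0, n);
                }
            }
        }
    }

    // Assumed helper: return the base URL of the node whose suffix ends the ticket id,
    // or null if this node owns the ticket. CushyClusterConfiguration supplies this data.
    private String ownerUrlFor(String ticket) {
        return null;    // placeholder for the suffix-to-node lookup
    }
}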

This adds some overhead and some network delay due to the request forwarding. However, it turns out that you can save more overhead and reduce more delay than the Filter consumes.

Start with an existing configuration that uses Ehcache for the TicketRegistry. The standard configuration creates a Service Ticket cache that replicates ticket operations "synchronously" (the node that creates the ticket has to wait until the RMI call to all the other nodes finishes copying the ticket to the other caches before it can return the ticketid to the browser). If you have a two-node cluster, then every Service Ticket that is created has to wait for one RMI call to the other node, and, because all operations are synchronous, every ST validation that deletes the ticket also has to wait for an RMI call to the other node. So the overhead is two network operations per Service Ticket, each synchronous, delaying the return of the response to the client.

Now consider the same configuration with CushyFrontEndFilter inserted in front of the stack, which lets you reconfigure the ticketRegistry.xml file so that Ehcache uses delayed replication for Service Tickets, with the same parameters it uses for Login Tickets.

The Front End randomly chooses a CAS server, which creates the Service Ticket and sends it back to the browser.

If you would like to see how this sort of programming is done, but you don't want to learn about real Front End devices, the same logic is implemented in CushyFrontEndFilter.

In Java, an application can specify one or more Filters. A Filter is a Java class that can scan the Request as it comes in to the application (CAS in this case) and can scan the response generated by the application. CushyFrontEndFilter scans incoming Requests and applies the logic for cases 1-3 to forward requests that are best handled by another node in the cluster.

If the programming is added to the Front End device, then it forwards each request to the CAS server that issued the ticket and holds it in memory. However, without intelligent Front End programming the request arrives at a randomly chosen CAS node. At this point there are only two possible strategies:

  • The traditional idea is to configure the TicketRegistry to replicate ticket information to all the nodes, or to the database server, before returning the ST to the browser. So 100% of the time you have a network transaction from one server to all the other servers, and a delay for that operation to complete is added to the end of Service Ticket generation. But do not forget that when the ticket gets validated the ST is deleted, and since you cannot configure Ehcache (or any of the other technologies) to do addTicket synchronously and deleteTicket asynchronously, 100% of the time you also get a second network operation at the end of ST validation.
  • With CushyFrontEndFilter, Service Ticket generation returns immediately. When the ST validation request comes in, there is a 1/n chance it will be routed to the right server by chance (in a two-node cluster it goes to the right place 50% of the time). Only if it goes to the wrong server is there a network operation, and then it runs from the server that received the request to the server that owns the ticket. There is at most one exchange instead of two operations and two delays.

So with Ehcache or any other synchronous replication strategy there is a 100% probability of two network transactions and delays, while with the Filter there is a 50% chance of no operation or delay and a 50% chance of one operation and delay. Roughly speaking, the Filter has 25% of the overhead of the traditional mechanism in a two-node cluster (an expected 0.5 × 0 + 0.5 × 1 = 0.5 network operations per Service Ticket, versus 2).

But wait, you say: tickets still have to be replicated to other nodes for recovery if a node crashes. Yes, that is right, at least for Login TGTs and PGTs. However, Service Tickets are created, validated, and deleted so fast that, if you are replicating tickets asynchronously (whether with CushyTicketRegistry or Ehcache's default replication every 10 seconds), most of the time the ST doesn't hang around long enough to get replicated at all.

So while programming the Front End is the best solution, using CushyFrontEndFilter should clearly be more efficient than the current system of synchronous ST ticket replication.
