Front End Programming for CAS

It turns out to be simpler and more efficient to ship the request when it arrives to the node that has the ticket than to struggle to ship the ticket in advance to all the nodes that may receive the request.

The modern network Front End device can be easily programmed with a small amount of knowledge of CAS protocol to route each request to the server that can best handle it. If that is not possible with your network administrators, then the CushyFrontEndFilter does the same thing less efficiently than programming the Front End but more efficiently than running Ehcache or similar ticket replication mechanisms without the Filter.

Front End Background

Any cluster of Web Servers requires some sort of Front End device to screen and forward network traffic. Ten years ago this was a computer with some fairly simple programming. Common strategies for distributing requests:

Round Robin - A list of the servers is maintained. Each request is assigned to the next computer in the list. When you get to the end of the list, you wrap back to the beginning.
Master/Spare - All requests go to the first computer until it fails. Then requests go to the backup computer until the Master comes back up.

However, if each Web request can go to a different server, then all the data about the current session either has to be saved back to the browser (as a cookie) or it has to be saved to a database that all servers in the cluster can access. The database became a performance bottleneck. Companies building network equipment realized that they would have a competitive advantage if the Front End learned a little bit more about servers and protocols to optimize request routing and make things simpler and more efficient for the applications. Then they added additional features to enhance security. Today, these devices track repetitive activity from very active IP addresses to detect and suppress Denial of Service attacks. With "deep packet inspection" they can identify attempts to hack a system. They can remove HTTP Headers regarded as inappropriate, and add Headers of their own. Vendors may provide even fairly complex services for very widely used programs, and local network administrators can add their own coding in some high level language for applications that are locally important.

For example, users at Yale know that CAS is "https://secure.its.yale.edu/cas". In reality, DNS resolves secure.its.yale.edu to 130.132.35.49 and that is a Virtual IP address (a VIP) on the BIG-IP F5. The VIP requires configuration, because the F5 has to hold the SSL Certificate for "secure.its.yale.edu" and manage the SSL protocol. Yale decided to make other security applications appear to run on the secure.its.yale.edu machine as well, even though each application has its own pool of VMs. So the F5 has to examine the URL to determine if the server name is followed by "/cas", in which case the request goes to the pool of CAS VMs, or by "/idp", in which case it goes to the Shibboleth pool of VMs. Because it is the F5 that is really talking to the browser, it has to create a special Header with the browser IP address in the event that it is important to the Web server.

The amount of logic in the Front End can be substantial. Suppose CAS wants to use X.509 User Certificates installed in the browser as an authentication credential. CAS has an entire X509 support module, but that depends on the browser talking to the CAS server directly. If the Front End is terminating the SSL/TLS session itself, then it has to be configured with all the standard information needed by any Web component that handles user certificates. There has to be a special list of "Trusted" Certificate Authorities from which User Certificates will be accepted. The Front End has to tell the browser that certificates are required or optional. The signature in the submitted Certificate has to be validated against the Trusted CA list. The Certificate has to be ASN.1 decoded, and then the DN and/or one or more subjectAltNames have to be extracted and turned into new HTTP headers that can be forwarded to the application. All this is fairly standard stuff and it is typically part of the built in code for any Front End device.

So teaching a Front End about CAS protocol is not that big a deal. Commercial sites do this all the time, but they don't run CAS. Universities probably spend less time optimizing eCommerce applications and so they may not normally think about Front End devices this way.

Routing the CAS Protocol

First, however, we need to understand the format of CAS ticketids because that is where the routing information comes from:

type - num - random - suffix

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end is configured in the uniqueIdGenerators.xml file.
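
To make the format concrete, here is a minimal sketch of pulling the suffix out of a ticket id. The class name TicketIdSuffix is hypothetical, and the code assumes that the type, num, and random fields never contain a "-" while the suffix may (for example, if it is a host name).

```java
import java.util.Optional;

/** Hypothetical helper: extracts the routing suffix from a CAS ticket id. */
final class TicketIdSuffix {

    private TicketIdSuffix() {}

    /**
     * Returns the text after the third '-' of an id shaped like
     * type-num-random-suffix, or empty if the id does not have that shape.
     * Assumes the first three fields contain no '-' of their own.
     */
    static Optional<String> of(String ticketId) {
        if (ticketId == null) {
            return Optional.empty();
        }
        // Limit of 4 keeps any '-' inside the suffix itself intact.
        String[] parts = ticketId.split("-", 4);
        if (parts.length < 4 || parts[3].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(parts[3]);
    }
}
```

An id that does not have all four fields yields an empty result, which a caller would treat as "no routing information, pick any node".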

A typical XML configuration for a particular type of ticket (when you use CushyClusterConfiguration) looks like this:

<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator">
<constructor-arg index="0" type="int" value="50" />
<constructor-arg index="1" value="#{clusterConfiguration.getTicketSuffix()}" />
</bean>

The suffix value, which is the index="1" argument to the Java object constructor, is set by a Spring "EL" expression to the TicketSuffix property of the bean named clusterConfiguration. By directly feeding the output of clusterConfiguration into the input of the Ticket ID Generator, this approach makes configuration simple and ensures that all the machines come up configured properly. There is special logic in Cushy for an F5 which, for some reason, likes to identify hosts by the MD5 hash of the character representation of their decimal/dotted IPv4 address.
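
As a rough illustration of the F5 convention just mentioned, the sketch below hashes a dotted IPv4 string with MD5 using only the standard library. The class name F5HostHash and the lowercase hex encoding of the result are assumptions; the text above does not say how Cushy or the F5 encode the hash, so check the real format before relying on this.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: MD5 of the text form of an IPv4 address. */
final class F5HostHash {

    private F5HostHash() {}

    /** Returns the MD5 digest of the given string as 32 lowercase hex digits (assumed encoding). */
    static String md5Hex(String dottedIpv4) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(dottedIpv4.getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A call such as F5HostHash.md5Hex("10.0.0.1") (a made-up address) would produce the value to compare against whatever host identifier the Front End reports.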

Every CAS request except the initial login comes with one or more tickets located in different places in the request. There is a sequence of tests and you stop at the first match:

1. If the Path part of the URL is a validate request (/cas/validate, /cas/serviceValidate, /cas/proxyValidate, or /cas/samlValidate) then look at the ticket= parameter in the query string part of the URL.
2. Otherwise, if the Path part of the URL is a /cas/proxy request, then look at the pgt= parameter in the query string.
3. Otherwise, if the request has a CASTGC cookie, then look at the cookie value.
4. Otherwise, the request is probably in the middle of login so use any built in Front End support for JSESSIONID.
5. Otherwise, or if the node selected by 1-4 is down, choose any CAS node from the pool.

That is the code; now here is the explanation:

1. After receiving a Service Ticket ID from the browser, an application opens its own HTTPS session to CAS and presents the ticket id in a "validate" request. If the id is valid, CAS passes back the Netid, and in certain requests can pass back additional attributes. This request is best handled by the server that issued the Service Ticket.
2. When a middleware server like a Portal has obtained a CAS Proxy Granting Ticket, it requests CAS to issue a Service Ticket by opening its own HTTPS connection to CAS to make a /proxy call. Since the middleware is not a browser, it does not have a Cookie to hold the PGT. So it passes that ticketid explicitly in the pgt= parameter. This request is best handled by the server that created the Proxy Granting Ticket.
3. After a user logs in, CAS creates a Login TGT that points to the Netid and attributes and writes the ticket id of the TGT to the browser as a Cookie. The Cookie is sent back from the browser in any request to "https://secure.its.yale.edu/cas". After initial login, all requests with cookies are requests to issue a Service Ticket for a new application using the existing CAS login. This is best handled by the server that created the TGT.
4. If there is no existing ticket, then the user is logging into CAS. This may be the GET that returns the login form, or the POST that submits the Userid and Password. Vanilla CAS code works only if the POST goes back to the same server that handled the GET. This is the only part of CAS that actually has an HttpSession.
5. Otherwise, if there is no JSESSIONID then this is the initial GET for the login form. Assign it to any server.

Only cases 1-3 actually involve special CAS protocol logic. Steps 4 and 5 are standard options that will be programmed into any Front End device already, so after test 3 you basically fall through to whatever the Front End did before you added special code.

Cases 1-3 are only meaningful for a GET request. The only CAS POST is the login form and it is case 4.
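
The CAS-specific part of that sequence (cases 1-3, GET only) can be sketched in plain Java as follows. This is an illustration of the decision logic, not the code of any particular Front End or of CushyFrontEndFilter. The class and method names are hypothetical, the request is reduced to simple maps, and the suffix is assumed to be everything after the third "-" of the ticket id. An empty result means "fall through to JSESSIONID handling or any node" (cases 4 and 5), which the caller must also do when the selected node is down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the case 1-3 routing tests. */
final class CasRouteSketch {

    private static final Set<String> VALIDATE_PATHS = new HashSet<>(Arrays.asList(
            "/cas/validate", "/cas/serviceValidate",
            "/cas/proxyValidate", "/cas/samlValidate"));

    private CasRouteSketch() {}

    /** Returns the ticket suffix naming the preferred node, or empty for cases 4-5. */
    static Optional<String> preferredNodeSuffix(String method, String path,
                                                Map<String, String> query,
                                                Map<String, String> cookies) {
        if (!"GET".equals(method)) {
            return Optional.empty();           // the only CAS POST is the login form
        }
        String ticketId;
        if (VALIDATE_PATHS.contains(path)) {
            ticketId = query.get("ticket");    // case 1: validate request
        } else if ("/cas/proxy".equals(path)) {
            ticketId = query.get("pgt");       // case 2: proxy request
        } else {
            ticketId = cookies.get("CASTGC");  // case 3: login cookie, if any
        }
        return suffixOf(ticketId);
    }

    /** Suffix is assumed to be the text after the third '-' in type-num-random-suffix. */
    private static Optional<String> suffixOf(String ticketId) {
        if (ticketId == null) {
            return Optional.empty();
        }
        String[] parts = ticketId.split("-", 4);
        return (parts.length == 4 && !parts[3].isEmpty())
                ? Optional.of(parts[3]) : Optional.empty();
    }
}
```

The caller maps the returned suffix to a pool member; how that mapping is stored depends on the Front End and is not shown here.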

CushyFrontEndFilter

Although the best place to do Front End programming is in the Front End, the CAS administrator does not control that device. You can, however, introduce a version of the same logic with the CushyFrontEndFilter. A Servlet Filter processes Web requests before the main part of the application (that is, before Spring and CAS see it).

CushyFrontEndFilter is enabled when you add it to the list of filters in the WEB-INF/web.xml file. Set the filter mapping to "/*" so it scans all requests. It has been designed to work with Ehcache or CushyTicketRegistry. It depends on configuration automatically generated if you use CushyClusterConfiguration, although I suppose it could be manually configured. Clearly it cannot do anything unless the uniqueIdGenerators.xml file has been configured with node specific ticket suffix values.

It uses Apache HttpClient to maintain a pool of reusable SSL sessions with each of the other nodes in the cluster. When a GET request arrives and "case 1-3" analysis indicates that another node generated the ticket associated with the request, then it forwards the request to the CAS node that can best handle it, and it forwards the response from that node back to the requester. Essentially this duplicates the Front End behavior using Java programming.

It turns out that forwarding requests is more efficient than synchronously replicating tickets. To prove this, we need to walk through request processing with and without the Filter. Consider the two node configuration (because when you do the numbers, more than two nodes is even more favorable to the Filter).

Start with an Ehcache TicketRegistry and default configuration. Consider the sequence of creating and then validating a Service Ticket.

An HTTP GET arrives from the browser with a service= value in the query string. CAS issues the ST, and then the addTicket operation is intercepted by Ehcache and, because the Ehcache Service Ticket cache is configured for synchronous replication, an RMI call is sent to the other node, and the request waits until a response is received. Then the ST ID is returned to the browser. There has been one network transaction and one delay.

The application issues an HTTP GET to validate the ticket. It is processed by CAS and a deleteTicket call is made. Although there is no rush to delete a ticket, it turns out that you cannot make a cache partially synchronous. If addTicket is synchronous, then deleteTicket has to be also. So this request also has to wait until an RMI request completes to delete the ticket on the other node. Again there is one network transaction and one delay.

If you add CushyFrontEndFilter processing in front of the CAS logic, then the validate request will always be forwarded to the node that created the Service Ticket. Replication does not have to get there before the validate request, so it does not have to be done synchronously. Reconfigure the Service Ticket cache to do lazy asynchronous replication just like Ehcache processing of Login TGTs.

When a real Front End is programmed, the "case 3" situation routes requests based on the CASTGC cookie suffix to the node that generated the login ticket. In Ehcache the TGTs are replicated and shared by all the nodes. So it is not necessary with Ehcache to do CAS Cookie routing. This can be turned off in the CushyFrontEndFilter. We will consider the overhead with or without this optimization.

When the browser submits a request to generate a Service Ticket there is a 50% chance of selecting the node that generated the Login ticket. If Cookie based routing is turned on, then half the time there is a network transaction and a delay, while if it is turned off there is no transaction and delay. Compared with the one transaction that naked Ehcache always incurs on this request, we save on average half a transaction with Cookie routing, or all of it without.

When the application submits a request to validate a Service Ticket, there is a 50% chance of selecting the node that generated the ST. When that happens, there is no network transaction. When it doesn't happen, then the filter borrows an SSL socket that already exists, sends a packet of HTTP Headers to the other node, and gets back the validation response to send back to the application. This is as good as or better than the RMI request that naked Ehcache sends to synchronously delete the ticket. So on this operation we are no worse and sometimes better.
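
The arithmetic of the last few paragraphs can be written down as a small calculation. It counts only request-blocking node-to-node transactions for one "create ST, then validate ST" cycle. The class and method names are hypothetical, and two things are assumptions rather than statements from the text: the Front End is taken to pick a node uniformly at random, and for more than two nodes synchronous replication is taken to cost one blocking call per peer.

```java
/** Hypothetical back-of-envelope count of blocking transactions per ST cycle. */
final class StCycleCost {

    private StCycleCost() {}

    /** Chance that a uniformly random node choice misses the node that owns the ticket. */
    static double missProbability(int nodes) {
        return (nodes - 1) / (double) nodes;
    }

    /** Naked synchronous Ehcache: addTicket and deleteTicket each block on every peer (assumed). */
    static double nakedSyncEhcache(int nodes) {
        return 2.0 * (nodes - 1);
    }

    /** With the Filter: a request is forwarded only when it lands on the wrong node. */
    static double withFilter(int nodes, boolean cookieRouting) {
        double create = cookieRouting ? missProbability(nodes) : 0.0;
        double validate = missProbability(nodes);
        return create + validate;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 4; n++) {
            System.out.printf("%d nodes: naked=%.2f filter+cookie=%.2f filter=%.2f%n",
                    n, nakedSyncEhcache(n), withFilter(n, true), withFilter(n, false));
        }
    }
}
```

For two nodes this gives 2 transactions for naked Ehcache, 1 on average with the Filter and Cookie routing, and 0.5 with the Filter and Cookie routing turned off, which matches the walk-through above.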

To be honest, a Round Robin scheduling algorithm will tend to maximize cost. Because the validate request comes in very quickly, there is a good chance it will be the next CAS operation. Round Robin guarantees that it will be sent to a different server than the one that handled the previous request, which is the server that created the ticket. Random selection of a server would be better.

However, the big payoff occurs when you realize that in this configuration there is really no need to replicate Service Tickets at all. Each ST is created, processed, and deleted on the host that created it. You still need a place to store them in memory, so you still need the Ehcache Service Ticket cache. You just don't need to configure that cache to be replicated to other nodes.

If you use CushyTicketRegistry then you must either have real Front End programming or the Filter. If you use the Filter, you must not disable "case 3" routing based on the CASTGC Cookie. However, this analysis of cost indicates that the Filter is still better than naked synchronous Ehcache, so the requirement to use the Filter (or real Front End programming) does not provide any advantage to Ehcache over CushyTicketRegistry. Rather, you have to compare the two TicketRegistry implementations on their own and make your own choice.