It turns out to be simpler and more efficient to ship the request, when it arrives, to the node that holds the ticket than to struggle to ship the ticket in advance to every node that might receive the request.
A modern network Front End device can easily be programmed, with a small amount of knowledge of the CAS protocol, to route each request to the server that can best handle it. If your network administrators cannot or will not do that, then the CushyFrontEndFilter does the same thing: less efficiently than programming the Front End, but more efficiently than running Ehcache or a similar ticket replication mechanism without the Filter.
Front End Background
Any cluster of Web Servers requires some sort of Front End device to screen and forward network traffic. Ten years ago this was a computer with some fairly simple programming. Common strategies for distributing requests:
- Round Robin - A list of the servers is maintained. Each request is assigned to the next computer in the list. When you get to the end of the list, you wrap back to the beginning.
- Master/Spare - All requests go to the first computer until it fails. Then requests go to the backup computer until the Master comes back up.
There is a problem when different requests from the same client are randomly distributed to different servers. The servers may be retaining some information about the client (like a "shopping basket" in eCommerce). This information is tied to a Session object, and so some clustering technology is designed to replicate the Session object and all its data to all the other servers that run the same application. JBoss servers do this when you turn on their clustering feature.
However, if each Web request can go to a different server, then all the data about the current session either has to be saved back to the browser (as a cookie) or saved to a database that every server in the cluster can access, and the database limited performance. Companies building network equipment realized that they would have a competitive advantage if the Front End learned a little bit about the common applications and Web servers and made more intelligent decisions that optimized the performance of the servers and the customer applications they support. For example, if the Front End knows about the JSESSIONID parameter used by Java Web servers to identify Session data, then it can send subsequent requests back to the same server that handled the previous requests in the same session. Now the Web server doesn't have to replicate Session information frantically in order to handle the next Web page request, although it may still back up data so it can recover from a server crash.
As chips and memory became less expensive, Front End devices became more powerful. They learned more about servers and protocols to optimize request routing and make things simpler and more efficient for the applications, and then they added features to enhance security. Today these devices track repetitive activity from very active IP addresses to detect and suppress Denial of Service attacks. With "deep packet inspection" they can identify attempts to hack a system. They can remove HTTP Headers regarded as inappropriate and add Headers of their own. Vendors may provide fairly complex services for very widely used programs, and local network administrators can add their own coding, in some high level language, for applications that are locally important.
For example, users at Yale know that CAS is "https://secure.its.yale.edu/cas". In reality, DNS resolves secure.its.yale.edu to 130.132.35.49, and that is a Virtual IP address (a VIP) on the BIG-IP F5. The VIP requires configuration, because the F5 has to hold the SSL Certificate for "secure.its.yale.edu" and manage the SSL protocol. Yale decided to make other security applications appear to run on the secure.its.yale.edu machine, even though each application has its own pool of VMs. So the F5 has to examine the URL to determine whether the server name is followed by "/cas", in which case the request goes to the pool of CAS VMs, or by "/idp", in which case it goes to the Shibboleth pool of VMs. Because it is the F5 that is really talking to the browser, it has to create a special Header carrying the browser's IP address in case that address is important to the Web server.
The amount of logic in the Front End can be substantial. Suppose CAS wants to use X.509 User Certificates installed in the browser as an authentication credential. CAS has an entire X509 support module, but that depends on the browser talking to the CAS server directly. If the Front End is terminating the SSL/TLS session itself, then it has to be configured with all the standard information needed by any Web component that handles user certificates. There has to be a special list of "Trusted" Certificate Authorities from which User Certificates will be accepted. The Front End has to tell the browser that certificates are required or optional. The signature in the submitted Certificate has to be validated against the Trusted CA list. The Certificate has to be ASN.1 decoded, and then the DN and/or one or more subjectAltNames have to be extracted and turned into new HTTP headers that can be forwarded to the application. All this is fairly standard stuff, and it is typically part of the built-in code for any Front End device.
Modern Front End systems can select specific servers from the pool based on data in the URL or Headers or based on the recent history of requests from that client device. Requests from phones could go to a different pool of servers than requests from PCs. If the Front End is going to all the trouble of decoding the X.509 User Certificate, then it could select servers based on organizational unit or geographic information that it contains. The application can write out a Cookie that the Front End can subsequently use to select specific servers for the next request.
So teaching a Front End about CAS protocol is not that big a deal. Commercial sites do this all the time, but they don't run CAS. Universities probably spend less time optimizing eCommerce applications and so they may not normally think about Front End devices this way.
Routing the CAS Protocol
First, however, we need to understand the format of CAS ticket IDs, because that is where the routing information comes from:
...
A typical XML configuration for a particular type of ticket (when you use CushyClusterConfiguration) looks like this:
<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator">
<constructor-arg index="0" type="int" value="50" />
<constructor-arg index="1" value="#{clusterConfiguration.getTicketSuffix()}" />
</bean>
The suffix value, which is the index="1" argument to the Java object constructor, is obtained using a Spring "EL" expression that evaluates to the TicketSuffix property of the bean named clusterConfiguration. This is the CushyClusterConfiguration object that scans the configured cluster definitions to determine which cluster the server is running in and what name and IP address it uses. By feeding the output of clusterConfiguration directly into the input of the Ticket ID Generator, this approach keeps configuration simple and ensures that all the machines come up configured properly. There is special logic in Cushy for an F5, which, for some reason, likes to identify hosts by the MD5 hash of the character representation of their decimal/dotted IPv4 address.
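For illustration only, here is a minimal Java sketch of that F5 convention, assuming the suffix is simply the hex-encoded MD5 digest of the dotted-decimal address string. The exact encoding the F5 expects is an assumption here; CushyClusterConfiguration computes the real value for you.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class F5SuffixSketch {

    // Hypothetical helper: compute a ticket suffix the way the text describes the F5
    // identifying a host, as the MD5 hash of its dotted-decimal IPv4 address string.
    public static String f5Suffix(String dottedDecimalIp) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(dottedDecimalIp.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // lowercase hex is an assumption
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Example: the suffix a node with this address might append to its ticket ids.
        System.out.println(f5Suffix("130.132.35.49"));
    }
}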
...
Cases 1-3 are only meaningful for a GET request. The only CAS POST is the login form and it is case 4.
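As a rough sketch of where the routing information comes from (this is an illustration, not the actual Filter or F5 logic), the suffix might be extracted from a GET request along these lines, assuming the suffix is the final hyphen-delimited piece of the ticket id and that the routable sources are the ticket= validation parameter, the pgt= proxy parameter, and the CASTGC cookie:

import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;

public class TicketRoutingSketch {

    // Hypothetical helper: find the suffix identifying the node that issued the
    // ticket referenced by this GET request, or null if there is no routable ticket.
    public static String routingSuffix(HttpServletRequest request) {
        String ticket = request.getParameter("ticket");   // ST/PT validation requests
        if (ticket == null) {
            ticket = request.getParameter("pgt");          // proxy ticket requests (assumed)
        }
        if (ticket == null) {
            Cookie[] cookies = request.getCookies();
            if (cookies != null) {
                for (Cookie cookie : cookies) {
                    if ("CASTGC".equals(cookie.getName())) { // "case 3": the login cookie
                        ticket = cookie.getValue();
                        break;
                    }
                }
            }
        }
        if (ticket == null) {
            return null;
        }
        int lastDash = ticket.lastIndexOf('-');
        return lastDash >= 0 ? ticket.substring(lastDash + 1) : null;
    }
}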
CushyFrontEndFilter
CAS uses SSL/TLS sessions, and they are rather expensive to set up. So after going to all the trouble to create one of these sessions, the client and the server reuse it over and over again. If the secure session was set up with client identification information, like an X.509 Certificate or SPNEGO (Windows integrated AD login), then the SSL session belongs to just one user. Otherwise the session is anonymous and can carry many unrelated requests from many users.
The Front End will maintain a pool of already established SSL connections to the CAS servers. If it routes a request using the logic of cases 1-3 to a specific server, this has nothing to do with what Web servers normally describe as a "session". Consecutive ST validation requests from the same application will go to different CAS servers based on different suffix values in the different ST ids, and they will reuse entirely different idle SSL sessions from the pool.
If you would like to see how this sort of programming is done, but you don't want to learn about real Front End devices, the same program logic is defined in the CushyFrontEndFilter.
In Java, an application can specify one or more Filters. A Filter is a Java class that can scan the Request as it comes in to the application (CAS in this case) and can scan the response generated by the application. The CushyFrontEndFilter scans incoming Requests and applies the logic for cases 1-3 to forward requests best handled by another node in the cluster.
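Reduced to its essentials, such a Filter might look something like the following sketch, which builds on the suffix-extraction sketch above. This is an illustration, not the real CushyFrontEndFilter source; the "ticketSuffix" init-param and the forwarding helper are hypothetical.

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FrontEndRoutingFilterSketch implements Filter {

    private String myTicketSuffix; // this node's own suffix, really supplied by the cluster configuration

    @Override
    public void init(FilterConfig config) {
        myTicketSuffix = config.getInitParameter("ticketSuffix"); // hypothetical init-param
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;

        // Only GET requests are routable (cases 1-3); the login POST is handled locally.
        if ("GET".equals(request.getMethod())) {
            String suffix = TicketRoutingSketch.routingSuffix(request); // from the sketch above
            if (suffix != null && !suffix.equals(myTicketSuffix)) {
                // Another node issued this ticket; proxy the request to it and return its answer.
                forwardToOwningNode(suffix, request, response);
                return;
            }
        }
        chain.doFilter(req, resp); // otherwise let CAS handle the request here
    }

    @Override
    public void destroy() { }

    private void forwardToOwningNode(String suffix, HttpServletRequest request,
            HttpServletResponse response) throws IOException {
        // Sketched separately below using Apache HttpClient.
    }
}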
If the programming is added to the Front End device, then it forwards each request to the CAS server that issued the ticket and holds it in memory. However, without intelligent Front End programming the request arrives at a randomly chosen CAS node. At this point we have only two possible strategies:
- The traditional idea is to configure the TicketRegistry to replicate ticket information to all the nodes, or to the database server, before returning the ST to the browser. So 100% of the time there is a network transaction from one server to all the other servers, and a delay waiting for that operation to complete is added to the end of Service Ticket generation. Remember also that when the ticket gets validated the ST is deleted, and since you cannot configure Ehcache or any of the other technologies to do addTicket synchronously and deleteTicket asynchronously, 100% of the time you get a second operation at the end of ST validation.
- With CushyFrontEndFilter the Service Ticket generation returns immediately. When the ST validation request comes in, there is a 1/n chance it will be routed to the right server by chance (that is, in a two node cluster it goes to the right place 50% of the time). Only if it goes to the wrong server is there a network operation, and then it is from the server that received the request to the server that owns the ticket. There is only the one exchange instead of two operations and two delays.
So with Ehcache or any other synchronous replication strategy there is a 100% probability of two network transactions and delays, while with the Filter there is a 50% chance of no operation or delay, and a 50% chance of one operation and delay. Roughly speaking, the Filter has 25% of the overhead of the traditional mechanism.
But wait, you say, tickets still have to be replicated to other nodes for recovery if a node crashes. Yes, that is right, at least for Login TGTs and PGTs. However, Service Tickets are created, validated, and deleted so fast that if you are replicating tickets asynchronously, whether with CushyTicketRegistry or with Ehcache's default replication every 10 seconds, most of the time the ST doesn't hang around long enough to get replicated.
So while programming the Front End is the best solution, the CAS administrator does not control that device. You can, however, introduce a version of the same logic with the CushyFrontEndFilter, which should still be clearly more efficient than the current system of synchronous ST replication. A Servlet Filter processes Web requests before the main part of the application (that is, before Spring and CAS see them).
CushyFrontEndFilter is enabled when you add it to the list of filters in the WEB-INF/web.xml file. Set the filter mapping to "/*" so it scans all requests. It has been designed to work with Ehcache or CushyTicketRegistry. It depends on configuration automatically generated if you use CushyClusterConfiguration, although I suppose it could be manually configured. Clearly it cannot do anything unless the uniqueIdGenerator.xml file has been configured with node specific ticket suffix values.
It uses Apache HttpClient to maintain a pool of reusable SSL sessions with each of the other nodes in the cluster. When a GET request arrives and "case 1-3" analysis indicates that another node generated the ticket associated with the request, then it forwards the request to the CAS node that can best handle it, and it forwards the response from that node back to the requester. Essentially this duplicates the Front End behavior using Java programming.
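A minimal sketch of that forwarding step, using the Apache HttpClient 4.x pooling connection manager, might look like the following. This is not the actual CushyFrontEndFilter code; the nodeBaseUrl parameter stands in for whatever the cluster configuration maps the ticket suffix to.

import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.http.Header;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class ForwardingSketch {

    // A shared pool of keep-alive (SSL) connections to the other CAS nodes.
    private static final CloseableHttpClient CLIENT = HttpClients.custom()
            .setConnectionManager(new PoolingHttpClientConnectionManager())
            .build();

    // Replay the incoming GET against the node that owns the ticket and copy
    // its status, headers, and body back to the original requester.
    public static void forward(String nodeBaseUrl, HttpServletRequest request,
            HttpServletResponse response) throws IOException {
        String uri = nodeBaseUrl + request.getRequestURI()
                + (request.getQueryString() != null ? "?" + request.getQueryString() : "");
        HttpGet get = new HttpGet(uri);

        try (CloseableHttpResponse remote = CLIENT.execute(get)) {
            response.setStatus(remote.getStatusLine().getStatusCode());
            for (Header header : remote.getAllHeaders()) {
                // Real code would skip hop-by-hop headers (Content-Length, Transfer-Encoding, etc.).
                response.addHeader(header.getName(), header.getValue());
            }
            if (remote.getEntity() != null) {
                response.getOutputStream().write(EntityUtils.toByteArray(remote.getEntity()));
            }
        }
    }
}

Real code would also handle connection failures and retry against another node; the sketch omits that.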
It turns out that forwarding requests is more efficient than synchronously replicating tickets. To prove this, we need to walk through request processing with and without the Filter. Consider the two node configuration (because when you do the numbers, more than two nodes is even more favorable to the Filter).
Start with an Ehcache TicketRegistry and default configuration. Consider the sequence of creating and then validating a Service Ticket.
An HTTP GET arrives from the browser with a service= value in the query string. CAS issues the ST, and then the addTicket operation is intercepted by Ehcache and, because the Ehcache Service Ticket cache is configured for synchronous replication, an RMI call is sent to the other node, and the request waits until a response is received. Then the ST ID is returned to the browser. There has been one network transaction and one delay.
The application issues an HTTP GET to validate the ticket. It is processed by CAS and a deleteTicket call is made. Although there is no rush to delete a ticket, it turns out that you cannot make a cache partially synchronous. If addTicket is synchronous, then deleteTicket has to be also. So this request also has to wait until an RMI request completes to delete the ticket on the other node. Again there is one network transaction and one delay.
If you add CushyFrontEndFilter processing in front of the CAS logic, then the validate request will always be forwarded to the node that created the Service Ticket. Replication does not have to get there before the validate request, so it does not have to be done synchronously. Reconfigure the Service Ticket cache to do lazy asynchronous replication just like Ehcache processing of Login TGTs.
When a real Front End is programmed, the "case 3" situation routes requests based on the CASTGC cookie suffix to the node that generated the login ticket. In Ehcache the TGTs are replicated and shared by all the nodes, so with Ehcache it is not necessary to do CAS Cookie routing, and it can be turned off in the CushyFrontEndFilter. We will consider the overhead with and without this optimization.
When the browser submits a request to generate a Service Ticket, there is a 50% chance of selecting the node that generated the Login ticket. If Cookie based routing is turned on, then half the time there is a network transaction and a delay, while if it is turned off there is never a transaction or delay. Compared to synchronous replication, which always costs one network operation at this point, we save at least half a transaction on average, or all of it with Cookie routing turned off.
When the application submits a request to validate a Service Ticket, there is a 50% chance of selecting the node that generated the ST. When that happens, there is no network transaction. When it doesn't happen, the filter borrows an SSL socket that already exists, sends a packet of HTTP Headers to the other node, and gets back the validation response to send back to the application. This is as good as or better than the RMI request that naked Ehcache sends to synchronously delete the ticket. So on this operation we are no worse off and sometimes better.
To be honest, a Round Robin scheduling algorithm will tend to maximize cost. Because the validate request comes in very quickly, there is a good chance it will be the next CAS operation. Round Robin guarantees that it will be sent to a different server than the one that handled the previous request, which is the server that created the ticket. Random selection of a server would be better.
However, the big payoff occurs when you realize that in this configuration there is really no need to replicate Service Tickets at all. Each ST is created, processed, and deleted on the host that created it. You still need a place to store them in memory, so you still need the Ehcache Service Ticket cache. You just don't need to configure that cache to be replicated to other nodes.
If you use CushyTicketRegistry then you must either have real Front End programming or the Filter. If you use the Filter, you must not disable "case 3" routing based on the CASTGC Cookie. However, this analysis of cost indicates that the Filter is still better than naked synchronous Ehcache, so the requirement to use the Filter (or real Front End programming) does not provide any advantage to Ehcache over CushyTicketRegistry. Rather, you have to compare the two TicketRegistry implementations on their own and make your own choice.