  • Recover tickets after reboot without JPA, or a separate server, or a cluster (works on a standalone server)
  • Recover tickets after a crash, except for the last few seconds of activity that did not get to disk.
  • No dependency on any external library. Pure Java using only the standard Java SE runtime.
  • All source in one class. A Java programmer can read it and understand it.
  • Can also be used to cluster CAS servers
  • Cannot crash CAS ever, no matter what is wrong with the network or other servers.
  • A completely different and simpler approach to the TicketRegistry. Easier to work with and extend.
  • Probably less efficient than the other TicketRegistry solutions, but it has a constant predictable overhead you can measure and "price out".

CAS is a Single Sign On solution. Internally the function of CAS is to create, update, and delete a set of objects it calls "Tickets" (a word borrowed from Kerberos). A Logon Ticket (TGT) object is created to hold the Netid when a user logs on to CAS. A partially random string is generated to be the login ticket-id and is sent back to the browser as a Cookie and is also used as a "key" to locate the logon ticket object in a table. Similarly, CAS creates Service Tickets (ST) to identify a user to an application that uses CAS authentication.

CAS stores its tickets in a plug-in selectable component called a TicketRegistry. CAS provides one implementation of the TicketRegistry for single-server configurations, and at least four alternatives that can be used to share tickets among several CAS servers operating in a network cluster. This document describes a new implementation called CushyTicketRegistry that is simple, provides added function for the standalone server, and yet also operates in clustered configurations.

Four years ago Yale implemented a "High Availability" CAS cluster using JBoss Cache to replicate tickets. After that, the only CAS crashes were caused by failures of JBoss Cache. Red Hat failed to diagnose or fix the problem. We considered replacing JBoss Cache with Ehcache, but there is a more fundamental problem here. It should not be possible for any failure of the data replication mechanism to crash all of the CAS servers at once. Another choice of cache might be more reliable, but it would suffer from the same fundamental structural problem.

All of the previous CAS cluster solutions create a common pool of tickets shared by all of the cluster members. They are designed and configured so that the Front End can distribute requests in a round-robin approach and any server can handle any request. However, once the Service Ticket is returned by one server, the request to validate the ST comes back in milliseconds. So JPA must write the ST to the database, and Ehcache must synchronously replicate the ST to all the other servers, before the ST ID is passed back to the browser. Synchronous replication was the option that exposed CAS to crashing if the replication system had problems, and it imposed a severe performance constraint that requires all the CAS servers to be connected by very high speed networking.

Disaster recovery and very high availability suggest that at least one CAS server should be kept at a distance, independent of the machine room, its power supply, and its support systems. So there is tension between performance considerations to keep servers close and recovery considerations to keep things distant.

Ten years ago, when CAS was being designed, the Front End that distributed requests to members of the cluster was typically an ordinary computer running simple software. Today networks have become vastly more sophisticated, and Front End devices are specialized machines with powerful software. They are designed to detect and fend off Denial of Service Attacks. They improve application performance by offloading SSL/TLS processing. They can do "deep packet inspection" to understand the traffic passing through and route requests to the most appropriate server (called "Layer 5-7 Routing" because requests are routed based on higher level protocols rather than just IP address or TCP session). Although this new hardware is widely deployed, CAS clustering has not changed and has no explicit option to take advantage of it.

Front End devices know many protocols and a few common server conventions. For everything else they expose a simple programming language. While CAS performs a Single Sign On function, the logic is actually designed to create, read, update, and delete tickets. The ticketid is the center of each CAS operation. In different requests there are only three places to find the ticketid that defines this operation:

  1. In the ticket= parameter at the end of the URL for validation requests.
  2. In the pgt= parameter for a proxy request.
  3. In the CASTGC Cookie for browser requests.

Programming the Front End to know that "/validate", "/serviceValidate", and two other strings in the URL path means that this is case 1, and "/proxy" means it is case 2, and everything else is case 3 is pretty simple.
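The three cases can be sketched in Java as a small lookup. This is only an illustration, not the Front End program or the CushyFrontEndFilter source: the class and method names are hypothetical, the request is reduced to a path plus maps of query parameters and cookies, and only the two validation paths actually named above are listed.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: finds the ticketid that drives a CAS request.
final class TicketIdLocatorSketch {

    // Validation paths named in the text. The text mentions two more
    // strings that it does not list, so they are not guessed here.
    private static final Set<String> VALIDATE_PATHS =
            Set.of("/validate", "/serviceValidate");

    static Optional<String> locate(String path,
                                   Map<String, String> query,
                                   Map<String, String> cookies) {
        // Case 1: validation requests carry the id in the ticket= parameter.
        for (String validatePath : VALIDATE_PATHS) {
            if (path.endsWith(validatePath)) {
                return Optional.ofNullable(query.get("ticket"));
            }
        }
        // Case 2: proxy requests carry the id in the pgt= parameter.
        if (path.endsWith("/proxy")) {
            return Optional.ofNullable(query.get("pgt"));
        }
        // Case 3: everything else is a browser request with the CASTGC cookie.
        return Optional.ofNullable(cookies.get("CASTGC"));
    }
}
```

An empty result means the request carries no ticket yet (for example, a first visit to the login page), so any node can handle it.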

Of course, finding the ticket is not helpful unless you use a feature that has always been part of CAS configuration but previously was not particularly useful. Each server can put a specific identifier on the end of every ticketid it creates. This is the "suffix" of the ticket in the configuration parameters, but typically it has been left as the default string "-CAS". If the suffix is configured meaningfully, and if it is set to a value the Front End can use to identify the node, then combined with the previous three steps the Front End can be configured to route ticket requests preferentially to the node that created the ticket and therefore holds it in memory without depending first on cluster replication.
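A minimal sketch of suffix-based routing follows. It assumes the suffix is whatever follows the last hyphen of the ticketid and that the Front End keeps a table from suffix to pool member. Both are assumptions made for illustration, and the class name, node names, and fallback value are hypothetical.

```java
import java.util.Map;

// Hypothetical router: picks the node that created a ticket from its suffix.
final class SuffixRouterSketch {
    private final Map<String, String> nodeBySuffix;
    private final String defaultNode;

    SuffixRouterSketch(Map<String, String> nodeBySuffix, String defaultNode) {
        this.nodeBySuffix = Map.copyOf(nodeBySuffix);
        this.defaultNode = defaultNode;
    }

    // Assumption: the suffix is the text after the last '-' in the ticketid.
    static String suffixOf(String ticketId) {
        int lastDash = ticketId.lastIndexOf('-');
        return lastDash < 0 ? "" : ticketId.substring(lastDash + 1);
    }

    // Requests without a ticket, or with an unknown suffix such as the
    // default "CAS", fall back to whatever the Front End would normally choose.
    String route(String ticketId) {
        if (ticketId == null) {
            return defaultNode;
        }
        return nodeBySuffix.getOrDefault(suffixOf(ticketId), defaultNode);
    }
}
```

With a table such as {"casvm01" -> "node-a", "casvm02" -> "node-b"}, a ticketid ending in "-casvm02" is routed to "node-b", while one ending in "-CAS" gets the fallback.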

Of course, tickets still have to be replicated for recovery purposes, but that means that tickets can be replicated in seconds instead of milliseconds, and they can be queued and replicated periodically instead of synchronously (while the request waits). This makes the clustering mechanism much easier and more reliable.

Of course, the Front End is owned by the Networking staff, and they are not always responsive to the needs of the CAS administrator. Although it is obviously more efficient to program the Front End, the CushyFrontEndFilter can be added to the Servlet configuration of the CAS server to do in Java the same thing the Front End should be doing, at least until your network administrators adopt a more enlightened point of view.

"Cushy" stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale". This summarizes what it is and how it works.

For objects to be replicated from one node to another, programs use the Java writeObject statement to "Serialize" the object to a stream of bytes that can be transmitted over the network and then restored in the receiving JVM. Ehcache and the other ticket replication systems operate on individual tickets. However, writeObject can operate just as well on the entire contents of the TicketRegistry. This is very simple to code, it is guaranteed to work, but it might not be efficient enough to use. Still, once you have the idea the code starts to write itself.

Start with the DefaultTicketRegistry source that CAS uses to hold tickets in memory on a single CAS standalone server. Then add the writeObject statement (surrounded by the code to open and close the file) to create a checkpoint copy of all the tickets, and a corresponding readObject and surrounding code to restore the tickets to memory. The first thought was to do the writeObject to a network socket, because that was what all the other TicketRegistry implementations were doing. Then it became clear that it was simpler, and more generally useful, and a safer design, if the data was first written to a local disk file. The disk file could then optionally be transmitted over the network in a completely independent operation. Going first to disk created code that was useful for both standalone and clustered CAS servers, and it guaranteed that the network operations were completely separated from the Ticket objects and therefore the basic CAS function.
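The idea can be sketched as follows. This is not the CushyTicketRegistry source: the class and method names are hypothetical, and the tickets are represented as arbitrary Serializable values in a ConcurrentHashMap keyed by ticket id, which is an assumption about the table layout.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical checkpoint helper: one writeObject for the whole table.
final class CheckpointSketch {

    // Serializes every ticket in the table to a local file in one call.
    // Objects referenced by more than one ticket are written only once.
    static void writeCheckpoint(ConcurrentHashMap<String, Serializable> tickets, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(tickets);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the table from the file with the matching readObject call.
    @SuppressWarnings("unchecked")
    static ConcurrentHashMap<String, Serializable> readCheckpoint(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (ConcurrentHashMap<String, Serializable>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("ticket class missing on restore", e);
        }
    }
}
```

Because the file is written locally first, nothing in this code touches the network; shipping the file elsewhere can be a separate step that cannot block the ticket table.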

The first benchmarks turned out to be even better than had been expected, and that justified further work on the system.

CushyTicketRegistry and the Standalone Server

For a single CAS server, the standard choice is the DefaultTicketRegistry class which keeps the tickets in an in-memory Java table keyed by the ticket id string. Suppose you change the name of the Java class in the Spring ticketRegistry.xml file from DefaultTicketRegistry to CushyTicketRegistry (and add a few required parameters described later). Cushy was based on the DefaultTicketRegistry source code, so everything works the same as it did before, until you have to restart CAS for any reason. Since the DefaultTicketRegistry only has an in-memory table, all the ticket objects are lost when CAS restarts and users all have to log in again. Cushy detects the shutdown and, using a single Java writeObject statement, saves all the ticket objects in the Registry to a file on disk (called the "checkpoint" file). When CAS restarts, Cushy reloads all the tickets from that file into memory and restores all the CAS state from before the shutdown. No user even notices that CAS restarted unless they tried to access CAS during the restart.

The number of tickets CAS holds grows during the day and shrinks over night. At Yale there are fewer than 20,000 ticket objects in CAS memory, and Cushy can write all those tickets to disk in less than a second generating a file around 3 megabytes in size. Other numbers of tickets scale proportionately (you can run a JUnit test and generate your own numbers). This is such a small amount of overhead that Cushy can be proactive.

So to take the next logical step, start with the previous ticketRegistry.xml configuration and duplicate the XML elements that currently call a function in the RegistryCleaner every few minutes. In the new copy of the XML elements, call the "timerDriven" function in the (Cushy)ticketRegistry bean every few minutes. Now Cushy will not wait for shutdown but will back up the ticket objects regularly just in case the CAS machine crashes without shutting down normally. When CAS restarts after a crash, it can load a fairly current copy of the ticket objects which will satisfy the 99.9% of the users who did not log in during the last minutes before the crash.

The next step should be obvious. Can we turn "last few minutes" into "last few seconds"? You could create a full checkpoint of all the tickets every few seconds, but now the overhead becomes significant. So go back to ticketRegistry.xml and set the parameters to call the "timerDriven" function every 10 seconds, but set the "checkpointInterval" parameter on the CushyTicketRegistry object to only create a new checkpoint file every 300 seconds. Now Cushy creates the checkpoint file, and then the next 29 times it is called by the timer it generates an "incremental" file containing only the changes since the checkpoint was written. Incremental files are cumulative, so there is only one file, not 29 separate files. If CAS crashes and restarts, Cushy reads the last checkpoint, then applies the changes in the last incremental, and now it has all the tickets up to the last 10 seconds before the crash. That satisfies 99.99% of the users and it is probably a good place to quit.
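The checkpoint-or-incremental decision can be sketched in a few lines. The class below is hypothetical and only models the timing; the real timerDriven function presumably reads the clock itself, while here the current time is passed in so the behavior is easy to follow.

```java
// Hypothetical model of the decision made on each timer call.
final class CheckpointSchedulerSketch {
    enum Action { CHECKPOINT, INCREMENTAL }

    private final long checkpointIntervalSeconds;
    private Long lastCheckpointSeconds; // null until the first checkpoint

    CheckpointSchedulerSketch(long checkpointIntervalSeconds) {
        this.checkpointIntervalSeconds = checkpointIntervalSeconds;
    }

    // Called by the timer (every 10 seconds in the example configuration).
    Action timerDriven(long nowSeconds) {
        boolean checkpointDue = lastCheckpointSeconds == null
                || nowSeconds - lastCheckpointSeconds >= checkpointIntervalSeconds;
        if (checkpointDue) {
            lastCheckpointSeconds = nowSeconds;
            return Action.CHECKPOINT;   // write every ticket
        }
        return Action.INCREMENTAL;      // write only changes since the checkpoint
    }
}
```

With a 300 second interval and a call every 10 seconds, the call at time 0 produces a checkpoint, the 29 calls from 10 through 290 produce incrementals, and the call at 300 produces the next checkpoint.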

What about disaster recovery? The checkpoint and incremental files are ordinary sequential binary files on disk. When Cushy writes a new file it creates a temporary name, fills the file with new data, closes it, and then swaps the new for the old file, so other programs authorized to access the directory can safely open or copy the files while CAS is running. Feel free to write a shell script or Perl or Python program to use SFTP or any other program or protocol to back up the data offsite or to the cloud.
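One way to get this write-then-swap behavior with the standard library is sketched below. It is an assumption about technique, not a copy of what Cushy does: the class name and the ".tmp" naming are hypothetical, and it assumes a single writer per file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: readers see either the old file or the new one,
// never a half-written file.
final class AtomicFileSwapSketch {

    static void replaceFile(Path target, byte[] newContents) {
        // Write under a temporary name in the same directory.
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            Files.write(temp, newContents); // creates or truncates, then closes
            try {
                // Swap the new file for the old one in a single step.
                Files.move(temp, target,
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fallback for file systems without atomic rename.
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A backup script that copies the target file at any moment then gets a complete checkpoint or incremental, which is the property the paragraph above relies on.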

Some people use JPATicketRegistry and store a copy of the tickets in a database to accomplish the same single server restart capability that Cushy provides. If you are happy with that solution, stick with it. Cushy doesn't require the database, it doesn't require JPA, and it may be easier to work with.

Before you configure a cluster, remember that today a server is typically a virtual machine that is not bound to any particular physical hardware. Ten years ago moving a service to a backup machine involved manual work that took time. Today there is VM infrastructure and automated monitoring and control tools. A failed server can be migrated and restarted automatically or with a few commands. If you can get the CAS server restarted fast enough that almost nobody notices, then you have solved the problem that clustering was originally designed to solve without adding a second running node.

You may still want a cluster.

CushyClusterConfiguration

If you use the JPATicketRegistry, then you configure CAS to know about the database in which tickets are stored. None of the nodes knows about the cluster as a whole. The "cluster" is simply one or more CAS servers all configured to backup tickets into the same database.

If you use Ehcache or one of the other object replication "cache" technologies, then there is typically an option to use an automatic node discovery mechanism based on multicast messages. That would be a good solution if you have only the one production CAS cluster, but it becomes harder to configure if you have separate Test and Development clusters that have to have their own multicast configuration.

It seems to be more reliable to configure each node to know the name and URL of all the other machines in the same cluster. However, a node specific configuration file on each machine is difficult to maintain and install. You do not want to change the CAS WAR file when you distribute it to each machine, and Production Services wants to churn out identical server VMs with minimal differences.

CushyClusterConfiguration (CCC) provides an alternative approach to cluster configuration, and while it was originally designed for CushyTicketRegistry it also works for Ehcache. Instead of defining the point of view of each individual machine, the administrator defines all of the CAS servers in all of the clusters in the organization: Production, Functional Test, Load Test, Integration Test, down to the developer's desktop or laptop "Sandbox" machines.

CCC is a Spring Bean that is specified in the CAS Spring XML. It only has a function during initialization. It reads in the complete set of clusters, uses DNS (or the hosts file) to obtain information about each CAS machine referenced in the configuration, uses Java to determine the IP addresses assigned to the current machine, and then tries to match one of the configured machines to the current computer. When it finds a match, that configuration defines this CAS, and the other machines in the same cluster definition can be used to manually configure Ehcache or CushyTicketRegistry.
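The matching step might look something like the sketch below. It is not the CCC source: the class name, the method names, and the shape of the cluster table (cluster name to list of host names) are all assumptions. The resolver is passed in as a function so the matching logic can be exercised without real DNS.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of "which configured cluster am I in?"
final class ClusterMatchSketch {

    // All IP addresses assigned to the current machine.
    static Set<InetAddress> localAddresses() {
        Set<InetAddress> result = new HashSet<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            while (nics != null && nics.hasMoreElements()) {
                result.addAll(Collections.list(nics.nextElement().getInetAddresses()));
            }
        } catch (SocketException e) {
            // No usable interface information: return what we have (possibly nothing).
        }
        return result;
    }

    // DNS (or hosts file) lookup; an unknown host simply matches nothing.
    static Set<InetAddress> resolve(String host) {
        try {
            return new HashSet<>(Arrays.asList(InetAddress.getAllByName(host)));
        } catch (UnknownHostException e) {
            return Collections.emptySet();
        }
    }

    // Returns the first cluster that lists a host owning one of our addresses.
    static Optional<String> findMyCluster(Map<String, List<String>> clusters,
                                          Set<InetAddress> local,
                                          Function<String, Set<InetAddress>> resolver) {
        for (Map.Entry<String, List<String>> cluster : clusters.entrySet()) {
            for (String host : cluster.getValue()) {
                if (!Collections.disjoint(resolver.apply(host), local)) {
                    return Optional.of(cluster.getKey());
                }
            }
        }
        return Optional.empty();
    }
}
```

On a real machine the call would be findMyCluster(clusters, localAddresses(), ClusterMatchSketch::resolve); the same WAR file then finds a different cluster on each machine without any node-specific configuration.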

CCC exports the information it has gathered and the decisions it has made by defining a number of properties that can be referenced using the "Spring EL" language in the configuration of properties and constructor arguments for other Beans. This obviously includes the TicketRegistry, but the ticketSuffix property can also be used to define a node specific value at the end of the unique ticketids generated by beans configured by the uniqueIdGenerators.xml file.

There is a separate page to explain the design and syntax of CCC.

Front End or CushyFrontEndFilter

If the Front End can be programmed to understand CAS protocol, to locate the ticketid, to extract the node identifying suffix from the ticketid, and to route requests to the CAS server that generated the ticket, then CAS does not have to wait for each Service Ticket ID to be replicated around the cluster. This is much simpler and more efficient, and the Cushy design started by assuming that everyone would see that this is an obviously better idea.

Unfortunately, it became clear that people in authority frequently had a narrow view of what the Front End should do, and that was frequently limited to the set of things the vendor pre-programmed into the device. Furthermore, there was some reluctance to depend on the correct functioning of something new no matter how simple it might be.

So with another couple of days of programming (much of it spent understanding the multithreaded SSL session pooling support in the latest Apache HttpClient code), CushyFrontEndFilter was created. The idea here was to code in Java the exact same function that was better performed by an iRule in the BIG-IP F5 device, so that someone would be able to run all the Cushy programs even if he was not allowed to change his own F5.

CushyTicketRegistry and a CAS Cluster

Picking back up where we left off from the Standalone Server discussion, the names of the checkpoint and incremental files are created from the unique node name of each server in the cluster, so they can all coexist in the same disk directory. The simplest Cushy communication option is "SharedDisk". When this is chosen, Cushy expects that the other nodes are writing their full backup and incremental files to the same disk directory it is using. If Cushy receives a request that the Front End should have sent to another node, then Cushy assumes some node or network failure has occurred, loads the other node's tickets into memory from its last checkpoint and incremental file in the shared directory, and then processes the request on behalf of the other node.

Of course you are free to implement SharedDisk with an actual file server or NAS, but technically Cushy doesn't know or care how the files got to the hard drive. So if you don't like real shared disk technology, you can write a shell script somewhere to wake up every 10 seconds and copy the files between machines using SFTP or whatever file transfer mechanism you like to use. You could also put the 3 megabyte files on the Enterprise Service Bus if you prefer architecture to simplicity.

SharedDisk is not the preferred Cushy communication mechanism. Cushy is, after all, part of CAS where the obvious example of communication between computers is the Service Ticket validation request. Issue an HTTPS GET to /cas/serviceValidate with a ServiceTicket and get back a bunch of XML that describes the user. So with Cushy, one node can issue an HTTPS GET to /cas/cluster/getCheckpoint on another node and it gets back the current checkpoint file for that CAS server.

Obviously you need security for this important data. CAS security is based on short term securely generated Login and Service Tickets. So every time CAS generates a new checkpoint file it also generates a new "dummyServiceTicketId" that controls access to that checkpoint file and all the incrementals generated until there is a new checkpoint. So the full request is "/cas/cluster/getCheckpoint?ticket=..."  where the dummyServiceTicketId is appended to the end.

How do the other nodes get the dummyServiceTicketId securely? Here we borrow a trick from the CAS Proxy Callback. Each CAS node is a Web server with an SSL Certificate to prove its identity. So when a node generates a new checkpoint file, and a new dummyServiceTicketId, it issues an HTTPS GET to all the other configured CAS nodes using URL
/cas/cluster/notify?nodename=callernodename&ticket=(dummyServiceTicketId).

Thanks to https: this request will not transmit the parameters unless the server first proves its identity with its SSL Certificate. Then the request is sent encrypted so the dummyServiceTicketId is protected. Although this is a GET, there is no response. It is essentially a "restful" Web Service request that sends data as parameters.
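The two URLs described so far can be put together as in the sketch below. The paths and parameter names come from the text; everything else is an assumption for illustration, including the class name, the base URL format ("https://host/cas"), and the way the random id is generated and prefixed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for the node to node requests.
final class ClusterUrlSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumption: any long, securely random, URL-safe string will do.
    // The "ST-" prefix is only there to make it look like a Service Ticket id.
    static String newDummyServiceTicketId() {
        byte[] bytes = new byte[24];
        RANDOM.nextBytes(bytes);
        return "ST-" + Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Sent to every other node when a new checkpoint has been written.
    static String notifyUrl(String targetCasBaseUrl, String callerNodeName, String ticketId) {
        return targetCasBaseUrl + "/cluster/notify?nodename=" + encode(callerNodeName)
                + "&ticket=" + encode(ticketId);
    }

    // Used by the other nodes to fetch the checkpoint the ticket id unlocks.
    static String getCheckpointUrl(String sourceCasBaseUrl, String ticketId) {
        return sourceCasBaseUrl + "/cluster/getCheckpoint?ticket=" + encode(ticketId);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For example, notifyUrl("https://casvm02.example.org/cas", "casvm01", id) yields a URL ending in /cas/cluster/notify?nodename=casvm01&ticket= followed by the id; the host names here are made up.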

Notify does three things:

  1. It tells the other node there is a new checkpoint ready to pick up immediately
  2. It securely provides the other node with the dummyServiceTicketId needed to read files for the next few minutes.
  3. It is a general declaration that the node is up and healthy. When a node starts up it sends its first /cluster/notify to all nodes with the &reboot=yes parameter to announce that it is live again.

Notify is only done every few minutes when there is a new checkpoint. Incrementals are generated all the time, but they are not announced. Each server is free to poll the other servers periodically to fetch the most recent incremental with the /cas/cluster/getIncremental request (add the dummyServiceTicketId to prove you are authorized to read the data).

Since these node to node communication calls are modeled on existing CAS Service Ticket validation and Proxy Callback requests, they are configured into CAS in the same place (in the Spring MVC configuration, details provided below).

Note: Yes, this sort of thing can be done with GSSAPI, but after looking into configuring Certificates or adding Kerberos, it made sense to keep it simple and stick with the solutions that CAS was already using to solve the same sort of problems in other contexts.

Are You Prepared?

Everything that can go wrong will go wrong. We plan for hardware and software failure, network failure, and disaster recovery. To do this we need to know how things will fail and how they will recover from each type of problem.

JPA is pretty straightforward. CAS depends on a database. To plan for CAS availability, you have to plan for database availability. At this point you have not actually solved any problem, but you have redefined it from a CAS issue to a database issue. Of course there is now an additional box involved, and you now have to look at network failures between the CAS servers and the database. However, now the CAS programmers can dump the entire thing on the DBAs and maybe they will figure it out. Unfortunately, you are probably not their most important customer when it comes to planning recovery.

The other CAS clustering techniques (JBoss Cache, Ehcache, Memcached) are typically regarded as magic off the shelf software products that take care of all your problems automatically and you don't have to worry about them. Again you haven't actually solved the problem, but now you really have transformed it into something you will never understand and so you just have to cross your fingers and hope those guys know what they are doing.

Even if you do not understand Java programming, CushyTicketRegistry performs a sequence of steps described here that you can understand. It writes a file on disk, and from that point on everything is file transfer. You can use the built-in Web support, or replace it with something else. From that point on every type of node failure or network failure produces predictable behavior. Since the file transfer is being retried periodically, every type of hardware recovery also produces predictable results. This is something you can understand and take into consideration when you plan out the scenarios.

Why another Clustering Mechanism?

You can use JPA, but CAS doesn't really have a database problem.

  • CAS tickets all timeout after a number of hours. They have no need for long term persistence.
  • There are no meaningful SQL operations in CAS. Nobody will generate reports based on tickets.
  • CAS has no transactional structure or need for a conventional commit operation.

JPA also weaves its own generated code into the methods exposed by the objects it manages. This causes the application (CAS) to fail in unpredictable and unavoidable ways if the database goes down or if network access to the database is interrupted.

There are a number of non-database central object server technologies available. There are no existing CAS TicketRegistry implementations for any of them, and the central server remains a problem.

JBoss Cache has proven unreliable, and it is terribly complex to configure with multicast addresses and complex network timeout and other parameters.

Ehcache appears to be the most commonly used CAS replication technology. It is fairly simple to configure, and it uses RMI calls to transmit tickets, a built in Java technology that is about as simple as Cushy HTTP. It can store tickets on local disk. It is the obvious alternative to CushyTicketRegistry and deserves special consideration.

Ehcache Compared to CushyTicketRegistry

CushyClusterConfiguration will configure either EhcacheTicketRegistry or CushyTicketRegistry, so it is certainly no easier to configure one or the other.

Although the default configuration of Ehcache uses synchronous replication for Service Tickets, if you program the Front End (or add the CushyFrontEndFilter) to Ehcache in the same way described for CushyTicketRegistry, then you can use the same asynchronous replication for both Login and Service Tickets.

So the main difference between the two is that every 10 seconds or so Ehcache replicates all the tickets that have changed in the last 10 seconds, while Cushy transmits a file with all of the ticket changes since the last full checkpoint. Then every few minutes it generates a full checkpoint. So Ehcache transmits a lot less data. However, the cost of transmitting the extra data is so low that this may not matter if Cushy provides extra function.

Ehcache is a closed system that operates inside the CAS servers and exposes no external features. Cushy generates checkpoint and incremental files that are regular files on disk that can be accessed using any standard commands, scripts, or utilities.

Ehcache is designed to be a "cache". That is, it is designed to be a high speed, in memory or local disk, copy of some data that has a persistent copy off on some server. That is why it has a lot of configuration for "LRU" and object eviction, because it assumes that lost objects are reloaded from persistent storage. You can use it as a replicated in-memory table, but the documentation makes clear that this is not its original design.

Ehcache replicates data transparently inside a large black box library. Cushy is a single source file of pure Java written to be easily understood. It is specifically designed to manage Tickets. Furthermore, there is a specific point in the code when files arrive and when they are being processed. These are places in the code where additional CAS specific logic can be added to handle special or future requirements.

Two examples in the form of a fable -

Suppose the Rapture happens and all your users have been good users and they are all transported to Heaven leaving their laptops and tablets behind. Activity ceases on the network, so all the other TicketRegistry systems have nothing to do. Cushy, however, is driven by the number of tickets in the Registry and not as much by the amount of activity. So it continues to generate and exchange checkpoint files until 8 hours after the Rapture when the logins all timeout.

Suppose (and this one we have all seen) someone doesn't really understand how applications are supposed to work with CAS, and they write their code so they get a new Service Ticket for every Web page the user accesses. CAS now sees a stream of requests to create and validate new Service Tickets. The other TicketRegistry systems replicate the Service Ticket and then immediately send a message to all nodes to delete the ticket they just created. Cushy instead just wakes up after 10 seconds and finds that all this create and delete ticket activity has mostly cancelled out. The incremental file will contain an increasing number of deleted Ticket IDs, until the next checkpoint resets it to empty and it starts growing again. If you turn on the option to ignore Service Tickets altogether (because you don't really need to replicate them if you have programmed your Front End or added the Filter), Cushy can ignore this activity entirely.
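The cancelling-out can be illustrated with a small change tracker. This is a guess at the bookkeeping, not the Cushy source: it assumes the incremental is built from a set of added tickets and a set of deleted ticket ids, both reset at each checkpoint. The class name is hypothetical, tickets are plain strings to keep the sketch short, and thread safety is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for what goes into the next incremental file.
final class IncrementalTrackerSketch {
    private final Map<String, String> added = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();

    void ticketAdded(String id, String ticket) {
        deleted.remove(id);
        added.put(id, ticket);
    }

    // A ticket created and deleted between two timer calls never has to be
    // written as a ticket object; only its id remains in the deleted set.
    void ticketDeleted(String id) {
        added.remove(id);
        deleted.add(id);
    }

    // A new full checkpoint makes the accumulated changes redundant.
    void checkpointTaken() {
        added.clear();
        deleted.clear();
    }

    int addedCount() { return added.size(); }
    int deletedCount() { return deleted.size(); }

    // Restore: load the checkpoint into a table, then apply the incremental.
    void applyTo(Map<String, String> restoredCheckpoint) {
        restoredCheckpoint.keySet().removeAll(deleted);
        restoredCheckpoint.putAll(added);
    }
}
```

After a thousand create-and-validate cycles the tracker holds no ticket objects at all, just a thousand short id strings, and the next checkpoint empties even that.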

Basic Principles

  1. CAS is very important, but it is also small and cheap to run.
  2. Emphasize simplicity over efficiency as long as the cost remains trivial.
  3. The Front End gets the request first and it can be told what to do to keep the rest of the work simple. Let it do its job.
  4. Hardware failure doesn't have to be completely transparent. We can allow one or two users to get a bad message if everything works for the other 99.9% of the users. Trying to do better than this is the source of most 100% system failures.

Ticket Chains (and Test Cases)

A TGT represents a logged on user. It is called a Ticket Granting Ticket because it is used to create Service and Proxy tickets. It has no parent and stands alone.

When a user requests it, CAS uses the TGT to create a Service Ticket. The ST points to the TGT that created it, so when the application validates the ST id string, CAS can follow the chain from the ST to the TGT to get the Netid and attributes to return to the application. Then the ST is discarded.

However, when a middleware application like a Portal supports CAS Proxy protocol, the CAS Business Logic layer trades an ST (pointing to a TGT) in and turns it into a second type of TGT (the Proxy Granting Ticket or PGT). The term "PGT" exists only in documents like this. Internally CAS just creates a second TGT that points to the login TGT.

If the Proxy application accesses a backend application, it calls the /proxy service passing the PGT ID and gets back a Service Ticket ID. That ST points to the PGT that points to the TGT from which CAS can find the Netid.

So when you are thinking about Ticket Registries, or when you are designing JUnit test cases, there are four basic arrangements to consider:

  1. a TGT
  2. a ST pointing to a TGT
  3. a PGT pointing to a TGT
  4. a ST pointing to a PGT pointing to a TGT

This becomes an outline for various cluster node failure tests. Whenever one ticket points to a parent there is a model where the ticket pointed to was created on a node that failed and the new ticket has to be created on the backup server acting on behalf of that node. So you want to test the creation and validation of a Service Ticket on node B when the TGT was created on node A, or the creation of a PGT on node B when the TGT was created on node A, and so on.
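As a starting point for such tests, the four arrangements can be modeled with a parent pointer. The classes below are a deliberately stripped-down stand-in for the real CAS ticket classes; the names and fields are hypothetical.

```java
// Hypothetical miniature of the ticket chain, for thinking about test cases.
final class TicketChainSketch {

    static final class Ticket {
        final String id;
        final Ticket parent; // null for a login TGT
        final String netid;  // only meaningful on the login TGT

        Ticket(String id, Ticket parent, String netid) {
            this.id = id;
            this.parent = parent;
            this.netid = netid;
        }
    }

    // Follows the chain (ST -> PGT -> TGT, or any shorter form) to the login TGT.
    static String netidOf(Ticket ticket) {
        Ticket current = ticket;
        while (current.parent != null) {
            current = current.parent;
        }
        return current.netid;
    }

    // 1 for a TGT, 2 for ST -> TGT or PGT -> TGT, 3 for ST -> PGT -> TGT.
    static int chainLength(Ticket ticket) {
        int length = 1;
        for (Ticket t = ticket; t.parent != null; t = t.parent) {
            length++;
        }
        return length;
    }
}
```

A failure test then builds the parent on "node A", checkpoints and restores it on "node B", creates the child there, and checks that netidOf still returns the original Netid.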

Front End Programming

Any cluster of Web Servers requires some sort of Front End device to screen and forward network traffic. Ten years ago this was a simple computer that normally assigned traffic to servers on a round robin basis. Today the primary function of many Front Ends is to protect the servers from Denial of Service attacks, attempts to brute force passwords, and other security problems. To do this, the device understands many common network protocols so it can do "deep packet" inspection. HTTP is probably the simplest of the protocols. A Front End will examine the URL, remove certain headers regarded as dangerous, and add headers of its own. It can select a specific server from the pool based on data in the request, although this is most commonly used to maintain "sessions" between a particular client and server.

Users at Yale know that CAS is "https://secure.its.yale.edu/cas". In reality, DNS resolves secure.its.yale.edu to 130.132.35.49 and that is a Virtual IP address (a VIP) on the BIG-IP F5. The VIP requires configuration, because the F5 has to hold the SSL Certificate for "secure.its.yale.edu" and manage the SSL protocol.

Yale decided to make other security applications appear to run on the secure.its.yale.edu machine, even though each application has its own pool of VMs. So the F5 has to examine the URL to determine if it begins with "/cas" and therefore goes to the pool of CAS VMs, or if it references a different application and pool. The F5 has to inspect and generate HTTP Headers if the real client IP address is passed on to a Web Server for processing.

This means that if CAS is going to use X.509 User Certificates as a non-interactive form of authentication, then all the configuration that would in a standalone server be managed by the X509 optional component of CAS has to be configured in the F5. This is required by the SSL protocol; it is not CAS specific. There has to be a special list of "Trusted" Certificate Authorities from which User Certificates will be accepted. The browser has to be told that certificates are required, permitted, or not allowed. The signature in the submitted Certificate has to be validated against the Trusted CA list. The Certificate has to be ASN.1 decoded, and then the DN and/or one or more subjectAltNames have to be extracted and turned into HTTP headers that can be forwarded to the application. The F5 has most of this programming built in, although the last step of creating headers has to be manually coded. By comparison, routing requests based on CAS ticketids is simple.

Routing requests to particular servers based on the content of the request line and the headers is part of what generic Front End devices (not just the F5) call "Layer 5-7 routing". The internet routes messages between computers using Layer 3 routing (IP), but Front End devices select the last hop to the specific VM based on data and an understanding of the higher level protocols. For example, if a large university divided its CAS servers up by physically separated campuses, then people who normally go to one campus could be given an OU= in the DN of their X.509 User Certificate that would preferentially route CAS requests to the server or pool of servers for the home campus. Servers at other campus locations then provide offsite backup.

After the first request is randomly assigned to a Java J2EE server, subsequent requests can be sent back to the same server if the Front End understands JSESSIONID protocol. The Java server places a parameter called JSESSIONID in the first response to the browser, and the browser sends it back as a Cookie or as part of the URL. The F5 has built in programming to handle JSESSIONID, but that requires tables and is a lot more complex than CAS.

First, however, we need to understand the format of CAS ticketids because that is where the routing information comes from:

type - num - random - suffix

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end is configured in the uniqueIdGenerators.xml file.
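The four fields can be pulled apart with a single split. This is a minimal sketch, not code from Cushy or CAS: the class name TicketId is hypothetical, and it assumes the type, num, and random fields never contain a "-" themselves, so that everything after the third "-" is the suffix.

```java
import java.util.Objects;

/** Hypothetical helper: splits a ticket id of the form type-num-random-suffix. */
final class TicketId {
    final String type;    // "TGT", "ST", ...
    final String num;     // ticket sequence number
    final String random;  // large random string
    final String suffix;  // value configured in uniqueIdGenerators.xml

    private TicketId(String type, String num, String random, String suffix) {
        this.type = type;
        this.num = num;
        this.random = random;
        this.suffix = suffix;
    }

    static TicketId parse(String ticketId) {
        Objects.requireNonNull(ticketId, "ticketId");
        // A limit of 4 keeps any "-" inside the suffix intact.
        String[] parts = ticketId.split("-", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                "Not a type-num-random-suffix ticket id: " + ticketId);
        }
        return new TicketId(parts[0], parts[1], parts[2], parts[3]);
    }
}
```

For example, TicketId.parse("TGT-17-dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT-vmfoodevapp01").suffix is "vmfoodevapp01".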

A typical XML configuration for a particular type of ticket (when you use Cushy) looks like this:

<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator">
<constructor-arg index="0" type="int" value="50" />
<constructor-arg  index="1"  value="#{clusterConfiguration.getTicketSuffix()}" />
</bean>

The suffix value, which is the index="1" argument to the Java object constructor, is obtained using a Spring "EL" expression to be the TicketSuffix property of the bean named clusterConfiguration. This is the CushyClusterConfiguration object that scans the configured cluster definitions to determine which cluster the server is running in and what name and IP address it uses.  By directly feeding the output of clusterConfiguration into the input of the Ticket ID Generator, this approach makes configuration simple and ensures that all the machines come up configured properly. There is special logic in Cushy for an F5 which, for some reason, likes to identify hosts by the MD5 hash of the character representation of their IP address.

Every CAS request except the initial login comes with one or more tickets located in different places in the request. There is a sequence of tests and you stop at the first match:

  1. If the Path part of the URL is a validate request (/cas/validate, /cas/serviceValidate, /cas/proxyValidate, or /cas/samlValidate) then look at the ticket= parameter in the query string part of the URL
  2. Otherwise, if the Path part of the URL is a /cas/proxy request, then look at the pgt= parameter in the query string.
  3. Otherwise, if the request has a CASTGC cookie, then look at the cookie value.
  4. Otherwise, use the built in support if the request has a JSESSIONID.
  5. Otherwise, or if the node selected by 1-4 is down, choose any CAS node from the pool.

That is the code, now here is the explanation:

  1. After receiving a Service Ticket ID from the browser, an application opens its own HTTPS session to CAS, presents the ticket id in a "validate" request. If the id is valid CAS passes back the Netid, and in certain requests can pass back additional attributes. This request is best handled by the server that issued the Service Ticket.
  2. When a middleware server like a Portal has obtained a CAS Proxy Granting Ticket, it requests CAS to issue a Service Ticket by opening its own HTTPS connection to CAS to make a /proxy call. Since the middleware is not a browser, it does not have a Cookie to hold the PGT. So it passes that ticketid explicitly in the pgt= parameter. This request is best handled by the server that created the Proxy Granting Ticket.
  3. After a user logs in, CAS creates a Login TGT that points to the Netid and attributes and writes the ticket id of the TGT to the browser as a Cookie. The Cookie is sent back from the browser in any request to "https://secure.its.yale.edu/cas". After initial login, all requests with cookies are requests to issue a Service Ticket for a new application using the existing CAS login. This is best handled by the server that created the TGT.
  4. If there is no existing ticket, then the user is logging into CAS. This may be the GET that returns the login form, or the POST that submits the Userid and Password. Vanilla CAS code works only if the POST goes back to the same server that handled the GET. This is the only part of CAS that actually has an HttpSession.
  5. Otherwise, if there is no JSESSIONID then this is the initial GET for the login form. Assign it to any server.

Except for Case 4 during login, neither the browser, JBoss, CAS, nor the F5 is maintaining a "session" as that term is commonly used, where requests from the same client always go to the same server and the server maintains an HttpSession object. Since the entire CAS function is based on creating and updating Ticket objects, each CAS request except the initial browser logon references a specific Ticket ID. By storing in the Ticket ID a field that easily identifies to the F5 the CAS server that created and owns the Ticket, CAS protocol now provides a relatively simple algorithm for routing requests to the best server. It is vastly simpler than other protocols that the F5 has built in because of their wide use.

The F5 understands HTTP requests and already has both expressions and logic to locate "the Path part of the URL", "the ticket= parameter in the query string", and "the CASTGC Cookie value". All that has to be coded is the comparison of these predefined items to test values, and an expression to extract the string that follows the third "-" character in a given ticket value.
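The same decision sequence can be written down in Java, which may help when reviewing the Front End programming. This is only a sketch under stated assumptions: it is not an F5 iRule and not the source of CushyFrontEndFilter, the class and parameter names (CasRouting, suffixToNode, nodesUp, fallbackNode) are hypothetical, and test 4 (JSESSIONID) is left to the Front End's built in support.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of routing tests 1-3 and 5 from the list above. */
final class CasRouting {
    private static final Set<String> VALIDATE_PATHS = new HashSet<>(Arrays.asList(
        "/cas/validate", "/cas/serviceValidate", "/cas/proxyValidate", "/cas/samlValidate"));

    /** Returns the ticket id that should drive routing, or null if none is present. */
    static String routingTicket(String path, Map<String, String> query, String castgcCookie) {
        if (VALIDATE_PATHS.contains(path)) return query.get("ticket"); // test 1
        if ("/cas/proxy".equals(path)) return query.get("pgt");        // test 2
        return castgcCookie;                                           // test 3, may be null
    }

    /** Returns the text after the third '-' of a ticket id, or null if there is none. */
    static String suffixOf(String ticketId) {
        if (ticketId == null) return null;
        int pos = -1;
        for (int i = 0; i < 3; i++) {
            pos = ticketId.indexOf('-', pos + 1);
            if (pos < 0) return null;
        }
        return ticketId.substring(pos + 1);
    }

    /** Test 5: use the owning node if it is known and up, otherwise the fallback. */
    static String chooseNode(String ticketId, Map<String, String> suffixToNode,
                             Set<String> nodesUp, String fallbackNode) {
        String suffix = suffixOf(ticketId);
        String node = (suffix == null) ? null : suffixToNode.get(suffix);
        return (node != null && nodesUp.contains(node)) ? node : fallbackNode;
    }
}
```

Because the decision is made per request from the ticket alone, no table has to be kept between requests.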

CAS does not require the F5 to create any new table. The pool of servers associated with /cas is already part of the F5 configuration. The logic depends on the CAS protocol, which has been updated only three times since CAS was created, rather than the characteristics of any particular CAS release.

Although HTTP is a "stateless" protocol, an SSL connection is frequently optimized to be a longer term thing that keeps a session alive between requests. The SSL connects the browser or application to the Front End, and there is probably a separate SSL connection from the Front End to the CAS VM. A common option for Front Ends is to notice any long running SSL connection and use it to route requests to the same backend VM node. You must be sure that you do not select this option with Cushy and CAS. For Service Ticket validation requests to work, the routing decision has to be made separately for each request because different tickets have to be routed to different CAS VMs even though they came from the same application.

If you cannot convince your network administrators to do the programming in the Front End where it belongs, you can get the same result slightly less efficiently using the CushyFrontEndFilter.  Just add it as a Servlet Filter to the WEB-INF/web.xml file and it will do the same thing that the F5 is supposed to do.

What Cushy Does at Failure

It is not necessary to explain how Cushy runs normally. It is based on DefaultTicketRegistry. It stores the tickets in a table in memory. If you have a cluster, each node in the cluster operates as if it was a standalone server and depends on the Front End to route requests to the node that can handle them.

Separately from the CAS function, Cushy periodically writes some files to a directory on disk. They are ordinary files. They are protected with ordinary operating system security.

In a cluster, the files can be written to a shared disk, or they can be copied to a shared location or from node to node by an independent program that has access to the directories. Or, Cushy will replicate the files itself using HTTPS GET requests.

A failure is detected when a request is routed by the Front End to a node other than the node that created the ticket.

Because CAS is a relatively small application that can easily run on a single machine, a "cluster" can be configured in either of two ways:

  • A Primary server gets all the requests until it fails. Then a Backup "warm spare" server gets requests. If the Primary comes back up relatively quickly, then Cushy will work best if the Front End resumes routing all requests to the Primary as soon as it becomes available again.
  • Users are assigned to login to a CAS Server on a round-robin or load balanced basis. After a user logs in, the suffix on the login, proxy, or service tickets in the URL or headers of an HTTP request route the request to that server. 

Each CAS server in the cluster has a shadow object representing the TicketRegistry of each of the other nodes. In normal operation the CAS nodes exchange checkpoint and incremental files but they do not restore objects from those files to memory. This is called "Tickets On Request". The first time a request arrives for a ticket owned by another node, the getTicket request restores tickets into memory from the files for that node.

However, every new ticket Cushy creates belongs to the node that created it. During a node failure, the new Service Tickets or Proxy Granting Tickets created for users logged into the failed node are created by and belong to the backup node. They each get a ticket ID that has the suffix of the backup node. They live forever in the Ticket Registry of the backup node. They just happen to be associated with and point to a TGT in the shadow registry on the backup node associated with the failed login node.

So while the failed node is down, and even after it comes back up again, requests associated with tickets created by the backup node are routed to the backup node by the Front End. However, after the failed node comes back new requests for new tickets associated with the login TGT will go back to being processed by the original node.

Service Tickets are created and then in a few milliseconds they are deleted when the application validates them or they time out after a few seconds or minutes. They do not exist long enough to raise any issues.

Proxy Granting Tickets, however, can remain around for hours. So the one long term consequence of a failure is that the login TGT can be on one server, but a PGT can be on a different server that created it while the login server was temporarily unavailable. This requires some thought, but you should quickly realize that everything will work correctly today. In future CAS releases there will be an issue if a user adds additional credentials (factors of authentication) to an existing login after a PGT is created. Without the failure, the PGT sees the new credentials immediately. With current Cushy logic, the PGT on the backup server is bound to a point in time snapshot of the original TGT and will not see the additional credentials. Remember, this only occurs after a CAS failure. It only affects the users who got the Proxy ticket during the failure. It can be "corrected" if the end user logs out and then logs back into the middleware server.

Cushy 2.0 will consider addressing this problem automatically.

There is also an issue with Single Sign Out. If a user logs out during a failure of his login server, then a backup server processes the Single Log Out normally. Then when the login server is restored to operation, the Login TGT is restored from the checkpoint file into memory. Of course, no browser now has a Cookie pointing to that ticket, so it sits unused all day and then in the evening it times out and a second Single Sign Out process is triggered. All the applications that previously were told the user logged out are now contacted a second time with the same logout information. It is almost unimaginable that any application would be written so badly it would care about this, but it should be mentioned.

While the login server is down, new Service Tickets can be issued, but they cannot be meaningfully added to the "services" table in the TGT that drives Single Sign Out. After the login server is restored, if the user logs out to CAS the only applications that will be notified of the logout will be applications that received their Service Tickets from the logon server. Cushy regards Single Sign Out as a "best effort" service and cannot at this time guarantee processing for ST's issued during a node or network failure.

Again, Cushy 2.0 may address this problem.

Cushy CAS Cluster

In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session objects (JBoss "clustering"). Nor does it depend on any other type of communication between machines. In fact, a Cushy CAS cluster could be created from a CAS running under Tomcat on Windows and one running under JBoss on Linux.

To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It holds the certificate, terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.

Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8443/cas". 

Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (the F5 prefers the ticket suffix to be an MD5 hash of the IP address of the VM).
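The nodename rule (first element of the DNS name, dashes removed) is easy to express in code. The following is a minimal sketch of that rule only; the class and method names are hypothetical and this is not the CushyClusterConfiguration source.

```java
import java.net.URI;

/** Hypothetical helper: derives a nodename from a node's native URL. */
final class NodeNames {
    static String nodeNameFromUrl(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        int dot = host.indexOf('.');
        // Keep only the first element of the DNS name.
        String first = (dot < 0) ? host : host.substring(0, dot);
        return first.replace("-", "");
    }
}
```

With the Yale pattern above, nodeNameFromUrl("https://vm-foodevapp-01.web.yale.internal:8443/cas") gives "vmfoodevapp01".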

Configuration

In CAS the TicketRegistry is configured using the WEB-INF/spring-configuration/ticketRegistry.xml file.

In the standard file, a bean with id="ticketRegistry" is configured selecting the class name of one of the optional TicketRegistry implementations (JBoss Cache, Ehcache, ...). To use Cushy you configure the CushyTicketRegistry class and its particular parameters.

Then at the end there is a group of bean definitions that set up periodic timer driven operations using the Spring support for the Quartz timer library. Normally these beans set up the RegistryCleaner to wake up periodically and remove all the expired tickets from the Registry.

Cushy adds a new bean at the beginning. This is an optional bean for class CushyClusterConfiguration that uses some static configuration information and runtime Java logic to find the IP addresses and hostname of the current computer to select a specific cluster configuration and generate property values that can be passed on to the CushyTicketRegistry bean. If this class does not do what you want, you can alter it, replace it, or just generate static configuration for the CushyTicketRegistry bean.

Then add a second timer driven operation to the end of the file to call the "timerDriven" method of the CushyTicketRegistry object on a regular basis (say once every 10 seconds) to trigger writing the checkpoint and incremental files.

There is a separate page that describes CushyClusterConfiguration in detail.

 

You Can Configure Manually

Although CushyClusterConfiguration makes most configuration problems simple and automatic, if it does the wrong thing and you don't want to change the code you can ignore it entirely. As will be shown in the next section, there are three properties (a string and two Properties tables) that are input to the CushyTicketRegistry bean. The whole purpose of CushyClusterConfiguration is to generate a value for these three parameters. If you don't like it, you can use Spring to generate static values for these parameters and you don't even have to use the clusterConfiguration bean.

Other Parameters

Typically in the ticketRegistry.xml Spring configuration file you configure CushyClusterConfiguration as a bean with id="clusterConfiguration" first, and then configure the usual id="ticketRegistry" using CushyTicketRegistry. The clusterConfiguration bean exports some properties that are used (through Spring EL) to configure the Registry bean.

  <bean id="ticketRegistry" class="edu.yale.cas.ticket.registry.CushyTicketRegistry"
          p:serviceTicketIdGenerator-ref="serviceTicketUniqueIdGenerator"
          p:checkpointInterval="300"
          p:cacheDirectory=  "#{systemProperties['jboss.server.data.dir']}/cas"
          p:nodeName=        "#{clusterConfiguration.getNodeName()}"
          p:nodeNameToUrl=   "#{clusterConfiguration.getNodeNameToUrl()}"
          p:suffixToNodeName="#{clusterConfiguration.getSuffixToNodeName()}"  />

 The nodeName, nodeNameToUrl, and suffixToNodeName parameters link back to properties generated as a result of the logic in the CushyClusterConfiguration bean.

The cacheDirectory is a work directory on disk to which CAS has read/write privileges. The default is "/var/cache/cas" which is Unix syntax but can be created as a directory structure on Windows. In this example we use the Java system property for the JBoss /data subdirectory when running CAS on JBoss.

The checkpointInterval is the time in seconds between successive full checkpoints. Between checkpoints, incremental files will be generated.

CushyClusterConfiguration exposes a md5Suffix="yes" parameter which causes it to generate a ticketSuffix that is the MD5 hash of the computer host instead of using the nodename as a suffix. The F5 likes to refer to computers by their MD5 hash and using that as the ticket suffix simplifies the F5 configuration even though it makes the ticket longer.
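An MD5 suffix of this kind can be computed with the standard Java SE MessageDigest class. This is a sketch under assumptions: the exact input string (the character form of the IP address, as opposed to the host name) and the lowercase hex encoding are guesses about what the F5 expects, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hex MD5 of a string such as "130.132.35.49". */
final class Md5Suffix {
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }
}
```

The result is always 32 characters, which is why the ticket id gets longer than it would with a short nodename suffix.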

There are other "properties" that actually turn code options on or off. Internally they are static variables that only appear to be properties of the CushyTicketRegistry class so they can be added to the ticketRegistry.xml file. The alternative would be to make them static values in the source and require you to recompile the source to make a change.

  • p:sharedDisk="true" - disables HTTP communication for JUnit Tests and when the work directory is on a shared disk.
  • p:disableTicketsOnRequest="true" - disables an optimization that only reads tickets from a checkpoint or incremental file the first time the tickets are actually needed.
  • p:excludeSTFromFiles="true" - this is plausibly an option you should use. It prevents Service Tickets from being written to the checkpoint or incremental files. This makes incremental files smaller because it is then not necessary to keep the growing list of ST IDs for all the Service Tickets that were deleted probably before anyone ever really cared about them.
  • p:useThread="true" - use a thread to read the checkpoint file from another CAS node. If not set, the file is read in line and this may slow down the processing of a new checkpoint across all the nodes.

How Often?

"Quartz" is a widely used Java library for timer driven events. There are various ways to use Quartz, including annotations in modern containers, but JASIG CAS uses a Spring Bean interface to Quartz where parameters are specified in XML. All the standard JASIG TicketRegistry configurations have contained a Spring Bean configuration that drives the RegistryCleaner to run and delete expired tickets every so often. CushyTicketRegistry requires a second Quartz timer configured in the same file:

    <bean id="jobBackupRegistry" class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean"
        p:targetObject-ref="ticketRegistry" p:targetMethod="timerDriven" />

    <bean id="triggerBackupRegistry" class="org.springframework.scheduling.quartz.SimpleTriggerBean"
        p:jobDetail-ref="jobBackupRegistry" p:startDelay="60000" p:repeatInterval="15000" />

The first bean tells Spring to call method "timerDriven" in the object configured with Spring bean name "ticketRegistry". The second bean tells Spring that after the first minute (letting things start up), make the call indicated in the first bean every 15 seconds. Since this is standard Spring stuff, the interval is coded in milliseconds.

The time interval configured here is the time between incrementals. The checkpointInterval parameter on the ticketRegistry bean sets the time (in seconds) between full checkpoints:

p:checkpointInterval="300"

So with these parameters, Cushy writes an incremental every 15 seconds and a checkpoint every 5 minutes. Feel free to set these values as you choose. Shorter intervals mean more overhead, but the cost is already so low that longer intervals don't really save much.
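The interplay of the two intervals can be sketched as a small decision made on every timer call. This is an illustration of the arithmetic only, not the actual timerDriven method: the class CheckpointSchedule and its Action enum are hypothetical, and the clock is passed in so the behavior is easy to check.

```java
/** Hypothetical sketch: which file a given timer tick should write. */
final class CheckpointSchedule {
    enum Action { CHECKPOINT, INCREMENTAL }

    private final long checkpointIntervalMillis;
    private boolean checkpointWritten = false;
    private long lastCheckpointMillis;

    /** Takes seconds, like p:checkpointInterval="300". */
    CheckpointSchedule(int checkpointIntervalSeconds) {
        this.checkpointIntervalMillis = checkpointIntervalSeconds * 1000L;
    }

    /** Called once per Quartz tick (every 15000 ms in the example above). */
    synchronized Action timerDriven(long nowMillis) {
        if (!checkpointWritten
                || nowMillis - lastCheckpointMillis >= checkpointIntervalMillis) {
            checkpointWritten = true;
            lastCheckpointMillis = nowMillis;
            return Action.CHECKPOINT;
        }
        return Action.INCREMENTAL;
    }
}
```

With a 15 second tick and a 300 second checkpoint interval, each checkpoint is followed by 19 incrementals before the next checkpoint is due.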

See the sample ticketRegistry.xml file for the complete configuration context.

Special Rules

Cushy stores tickets in an in-memory table. It writes tickets to a disk file with a single writeObject Java statement. It transfers files from machine to machine using an HTTPS GET. So far, everything seems to be rather simple. Cushy started that way, but then it became clear that there were a small number of optimizations that really needed to be made even if they added a slight amount of complexity to the code.
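To make "a single writeObject" concrete, here is a minimal sketch of writing and restoring a table of serializable objects with standard Java SE serialization. The class CheckpointFile, the use of a HashMap keyed by ticket id, and the copy taken before writing are assumptions for illustration, not the Cushy source.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one writeObject out, one readObject back. */
final class CheckpointFile {
    static void write(Path file, Map<String, ? extends Serializable> tickets) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            // A single call serializes the whole table and everything it points to.
            out.writeObject(new HashMap<String, Serializable>(tickets));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static Map<String, Serializable> read(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (Map<String, Serializable>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Ticket class missing on restore", e);
        }
    }
}
```

Because serialization follows object references, tickets that point to a parent ticket are written and restored together with it.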

Notify

Once every 5-15 minutes a node generates a new full checkpoint file. It also generates a new dummy ServiceTicketId that acts as the password that other nodes will present to request the files over HTTPS. It then does a "Notify" operation. It generates an HTTPS GET to the /cas/cluster/notify URL on every other CAS node in the cluster. This request is routed by Spring MVC to the CacheNotifyController class provided by the Cushy package. A node also does a Notify immediately after it reboots to inform the other nodes that it is back up and to provide them with the password needed to communicate until the next checkpoint.

The Notify goes to every node in the cluster at its configured URL. The URL is assumed to be "https:" so the SSL Certificate in the other node verifies that it is the correct machine authorized to receive the data.
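As an illustration, the list of Notify targets can be built from the nodeNameToUrl table. This sketch makes several assumptions that are not taken from the Cushy source: the class name, the query parameter name "ticket" used to carry the dummy ServiceTicketId, and node URLs that end in "/cas" without a trailing slash. It only builds the URLs and does not send anything.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one Notify URL for every node except this one. */
final class NotifyTargets {
    static List<String> notifyUrls(String thisNode, Map<String, String> nodeNameToUrl,
                                   String dummyTicketId) {
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, String> node : nodeNameToUrl.entrySet()) {
            if (node.getKey().equals(thisNode)) continue; // never notify ourselves
            // "ticket" is an assumed parameter name for the dummy ServiceTicketId.
            urls.add(node.getValue() + "/cluster/notify?ticket=" + encode(dummyTicketId));
        }
        return urls;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

Each URL would then be fetched with an ordinary HTTPS GET, so the receiving node's certificate is checked before the ticket id is disclosed.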

However, when a node receives what looks like a Notify it cannot verify its source. This is not a big problem because the first order of business is to read the new checkpoint file from the node sending the Notify. To read the file it uses the configured URL for that node in the cluster definition, and since that URL is "https:" it will only work if the other node has a Certificate proving its identity. If the other node then accepts the secret dummy ServiceTicketId sent in the Notify, the loop has been closed. Both machines communicated over configured URLs. Both verified their identity with a Certificate. All data was encrypted with SSL. The ticket sent on the Notify was validated when the checkpoint file was returned correctly.

Because Notify is sent when CAS boots up, it is an indication that the node is "healthy" that resets any flag indicating that the node is "sick". This does not, however, prevent the other nodes from reacting if they continue to receive requests or exceptions indicating a problem. When Cushy gets an indication of a problem it sets a flag. It then continues to operate assuming the problem is still there until it gets a Notify from the node. After the Notify, Cushy does not assume that there is a continuing problem, but it will respond appropriately if one is detected.

In a SharedDisk situation (see below) there is no HTTP and therefore no /cluster/notify call. Instead, the timerDriven routine checks the Last Modified date on the other node's checkpoint file. When it changes, it performs a subset of the full processNotify operations to reset flags and mark the other server healthy.

Tickets on Request

The simplest and therefore the initial logic for Cushy read a checkpoint or incremental file from another node and immediately "deserialized" it (turned the file into a set of objects) and updated the tickets in the secondary registry object associated with the other node. This is clean and it generates log messages describing the contents of each file as it arrives, which reassures you that the file contains the right data.

However, during the 99.9% of the time when the nodes are running and the network is OK, this approach approximately doubles the amount of overhead to run Cushy. Turning the file back into objects is almost as expensive as creating the objects in the first place. Worse, every time you get a new checkpoint file you have to discard all the old objects and replace them with new objects, which means the old objects have to be garbage collected and destroyed.

This was one place where simplicity over efficiency seemed to go too far. The alternative was to fetch the files across the network, but not to open or read them until some sort of failure routed a request for a ticket that belonged to the other node. Then during normal periods the files would be continuously updated on disk, but they would never be opened until one of the objects they contained was needed.

When a node fails, a bunch of requests for that node may be forwarded by the Front End to a backup node almost at the same time. The first request has to restore all the tickets, but while that is going on the other requests should wait until restore completes. In a real J2EE environment this sort of coordination is handled by the EJB layer, but CAS uses Spring and has no EJBs.

The obvious way to do this is with a Java "synchronized" operation, which acquires a lock while the tickets are being restored from disk to memory. Generally speaking this is not something you want to do. Generally the rule is that you should never hold any lock while doing any type of I/O. Since we know this can take as long as a second to complete, it is not the sort of thing you normally want to do locked. However, the only operations that are queuing up for the lock are requests for tickets owned by the secondary (failed) node, and the readObject that is going to restore all the tickets will end, successfully or with an I/O exception, and then those requests will be processed.

This optimization saves a tiny amount of CPU, but it is continuous across all the time the network is behaving normally. If you disable it, and there is a parameter to disable it on the ticketRegistry bean of the ticketRegistry.xml Spring configuration file, then each checkpoint file will be restored after a Notify is received (from the Notify request thread) and each incremental file will be restored after it is read by the Quartz thread that calls timerDriven, so requests never have to synchronize and wait. Of course, if the request proceeds after a file has been received but before it has been restored as new tickets, the request will be processed against the old set of tickets. That is the downside of impatience.

When using "Tickets on Request", there are two basic rules. First, you don't have complete control unless you are synchronized on the Secondary Registry object that corresponds to that node and set of files. Secondly, in order to work in both HTTPS and SharedDisk mode, the processing is coordinated by the modified date on the files. When a file is turned into objects in memory, then the objects have the same "modified date" as the file that created or updated them. When the file modified date is later than the objects' modified date, then the objects in memory are stale and the file should be restored at the next request.
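The two rules can be sketched as a small class. This is not the CushyTicketRegistry source: the class SecondaryRegistry, its fields, and the fileUpdated method are hypothetical, it handles a single checkpoint file rather than a checkpoint plus incrementals, and recording the timestamp even when the restore fails is an assumption made here so that waiting requests are not forced to retry the same bad file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "Tickets on Request" for one other node. */
final class SecondaryRegistry {
    private final Path checkpointFile;
    private Map<String, Serializable> tickets = new HashMap<>();
    private long objectsModified = -1;       // file timestamp the objects in memory came from
    private volatile long fileModified = -1; // timestamp last recorded for the file on disk

    SecondaryRegistry(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    /** Called off the request path when a newer file has arrived; does no restore. */
    void fileUpdated(long lastModifiedMillis) {
        fileModified = lastModifiedMillis;
    }

    /** Rule 1: hold the lock. Rule 2: restore only if the file is newer than the objects. */
    synchronized Serializable getTicket(String id) {
        if (fileModified > objectsModified) {
            restore();
        }
        return tickets.get(id);
    }

    @SuppressWarnings("unchecked")
    private void restore() {
        long stamp = fileModified;
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(checkpointFile)))) {
            tickets = (Map<String, Serializable>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            // Assumption: keep the old tickets and let queued requests proceed.
        }
        objectsModified = stamp;
    }
}
```

Requests that arrive while the first one is restoring simply queue on the lock and then find the objects already current.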

Sanity check: In a real Shared Disk mode the timestamps on the files are set by the file system, either of the file server or the local disk during HTTP processing (when the /cas/cluster/getCheckpoint or /cas/cluster/getIncremental operation completes). In either case they are set by the same clock. The typical 10 second interval between events (and even a much smaller interval) is much larger than the clock resolution. The important thing here is that we are always comparing one file timestamp with another file timestamp from the same source. This part of the code never uses a timestamp from the local System, so we don't have to worry if clocks are out of sync across systems.

However, there are two potential sources of lastModifiedDate for a file. One is a value saved in memory the last time we looked at the file. The other is to go to the disk directory and get the current value. Even if the directory is fast, going there is still I/O, and you don't want to do I/O while running synchronized (holding a lock) and in other cases it does delay things a bit. When running in HTTP (not SharedDisk) mode the files don't get onto the disk unless they are read, and the last step of reading the files is to update the lastModified date in memory. In SharedDisk mode the timerDriven routine (every 10 seconds or so) checks the current lastModified date from the directory. So the question is (read the code to find out the answer) when we do a getTicket in SharedDisk mode, do we stop and get the current lastModified value for both files (a lot of delay and overhead at a critical moment) or do we take the tickets we have and let the timerDriven routine decide when it is time to load a fresher set of tickets?

Generally an incremental file if it exists should always be later than a checkpoint. If both files are later than the objects in memory, always restore the checkpoint first.

Now for a race condition that is currently declared to be unimportant. Assume that "Tickets on Request" is disabled, so tickets are being restored as soon as the file arrives. Assume that there are a large number of tickets so restoring the checkpoint (which is done in one thread as a result of the Notify request) takes longer than the number of seconds before the next incremental is generated. The incremental is small, and it is read by the timerDriven thread independent of the Notify request. So it is possible if these two restores are not synchronized against each other that this first incremental will be applied to the old objects in memory instead of the new objects still being restored from the checkpoint. Nothing really bad happens here. The New Tickets in the incremental are certainly newer than the old objects, and the Deleted Tickets in the incremental certainly deserve to be deleted, and if the first incremental is applied to the old set of tickets and doesn't update the objects created by the new checkpoint, then wait for the second incremental which is cumulative and will correct the problem. So the issue is not worth adding synchronization to avoid.

SharedDisk

The SharedDisk parameter is typically specified in the ticketRegistry.xml Spring configuration file. It turns off the Cushy HTTP processing. There will be no Notify message, and therefore no HTTP fetching of the checkpoint or incremental file. There is no exchange of dummy ServiceTicketId for communication security because there is no communication. It is used in real SharedDisk situations and in Unit Test cases.

Since there is no notify, the timerDriven code that generates checkpoint and incremental files has to check the last modified timestamp on the checkpoint file of any other node. If the timestamp changes, then that triggers the subset of Notify processing that does not involve HTTP or file transfers (like the resetting of flags indicating possible node health).
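
A minimal sketch of that timestamp check follows, assuming a hypothetical CheckpointWatcher object kept per remote node (the name and methods are invented for illustration and are not taken from the Cushy source). It keeps the last value seen in memory, so other code can read the cached timestamp without touching the directory, and only the timer-driven call does the I/O.

```java
import java.io.File;

/** Hypothetical per-node watcher called from a timer-driven routine in SharedDisk mode. */
final class CheckpointWatcher {
    private final File checkpointFile;
    private long lastSeenModified;   // cached in memory; reading it needs no I/O

    CheckpointWatcher(File checkpointFile) {
        this.checkpointFile = checkpointFile;
        this.lastSeenModified = checkpointFile.lastModified(); // 0 if the file is absent
    }

    /** Returns true when the other node has written a new checkpoint since the last call. */
    boolean checkpointChanged() {
        long current = checkpointFile.lastModified();  // directory I/O, kept out of any lock
        if (current != lastSeenModified) {
            lastSeenModified = current;
            return true;   // caller would run the non-HTTP part of Notify processing
        }
        return false;
    }

    long cachedLastModified() {
        return lastSeenModified;
    }
}
```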

Cold Start Quiet Period

When CAS starts up and finds no previous checkpoint file in its work directory, there are no tickets to restore. This is a Cold Start, and it may be associated with a change of CAS code from one release to another with possible changes to the Ticket object definitions. A cold start has to happen at one time and has to restart all the servers in the cluster. You do not want one server running on old code while another server runs on the new code. To give the operators time to make the change, after a cold start CAS enters the Cold Start Quiet Period, which lasts for 10 minutes (built into the source). During this period it does not send or respond to HTTP requests from other nodes. That way the nodes cannot exchange mismatched object files.
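
The quiet period amounts to a time gate that is armed only when no checkpoint was found at startup. The sketch below illustrates the idea; the class ColdStartGate and its method names are assumptions for the example, and only the 10 minute duration comes from the description above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical gate consulted before sending or answering inter-node HTTP requests. */
final class ColdStartGate {
    static final Duration QUIET_PERIOD = Duration.ofMinutes(10);

    private final Instant quietUntil;   // null after a warm start (checkpoint found)

    ColdStartGate(boolean checkpointFound, Instant startTime) {
        this.quietUntil = checkpointFound ? null : startTime.plus(QUIET_PERIOD);
    }

    /** False while the Cold Start Quiet Period is still running. */
    boolean mayTalkToOtherNodes(Instant now) {
        return quietUntil == null || !now.isBefore(quietUntil);
    }
}
```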

Healthy

When CAS receives an HTTP GET I/O error attempting to contact or read data from another node, it marks that node as "unhealthy". It then waits for a Notify from the node, and then tries to read the new checkpoint file.

Without the "healthy" flag, when a node goes down all the other nodes would attempt every 10 seconds or so to read a new incremental file but the HTTP connect would time out. Adding a timeout every 10 seconds seems like a waste, and the Notify process will tell us soon enough when it is time to reconsider the health of the node.
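
As an illustration of the flag, the sketch below skips the periodic fetch while a node is marked unhealthy and clears the mark when a Notify arrives. The class NodeHealth and the Fetcher interface are hypothetical names invented for this example; the real HTTP call is represented by whatever the caller passes in.

```java
import java.io.IOException;

/** Hypothetical health tracking for one remote node. */
final class NodeHealth {
    /** Stand-in for the HTTP GET of an incremental or checkpoint file. */
    interface Fetcher {
        byte[] get() throws IOException;
    }

    private volatile boolean healthy = true;

    boolean isHealthy() {
        return healthy;
    }

    /** A Notify from the node is the signal to try it again. */
    void onNotify() {
        healthy = true;
    }

    /** Timer-driven fetch. Returns null when skipped or when the fetch fails. */
    byte[] fetchIfHealthy(Fetcher httpGet) {
        if (!healthy) {
            return null;            // no connect attempt, so no timeout every cycle
        }
        try {
            return httpGet.get();
        } catch (IOException e) {
            healthy = false;        // stay quiet until the next Notify
            return null;
        }
    }
}
```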

Note that Healthy deals with a failure of this server to connect to a node while TicketsOnRequest is triggered when the Front End cannot get to the node and sends us a request that belongs to the other node. If a node really goes down, both things happen at roughly the same time. Otherwise, it is possible for just one type of communication to fail while the other still works.

Usage Pattern

Users start logging into CAS at the start of the business day. The number of TGTs begins to grow.

Users seldom log out of CAS, so TGTs typically time out instead of being explicitly deleted.

Users abandon a TGT when they close the browser. They then get a new TGT and cookie when they open a new browser window.

Therefore, the number of TGTs can be much larger than the number of real CAS users. It is a count of browser windows and not of people or machines.

At Yale around 3 PM a typical set of statistics is:

Unexpired-TGTs: 13821
Unexpired-STs: 12
Expired TGTs: 30
Expired STs: 11

So you see that a Ticket Registry is overwhelmingly a place to keep logon TGTs (in this statistic TGTs and PGTs are combined).

Over night the TGTs from earlier in the day time out and the Registry Cleaner deletes them.

So generally the pattern is a slow growth of TGTs while people are using the network application, followed by a slow reduction of tickets while they are asleep, with a minimum probably reached each morning before 8 AM.

If you display CAS statistics periodically during the day you will see a regular pattern and a typical maximum number of tickets in use "late in the day".

Translated to Cushy, the cost of the full checkpoint and the size of the checkpoint file grow over time along with the number of active tickets, and then the file shrinks over night. During any period of intense login activity the incremental file may be unusually large. If you had a long time between checkpoints, then around the daily minimum (8 AM) you could get an incremental file bigger than the checkpoint.

CAS Ticket Objects Need to be Fixed

CAS has some bugs. They are very, very unlikely to occur, but they are there. Cushy can't fix them because they are in the Ticket classes themselves.

ConcurrentModificationException

First, the login TGT object has some collections. One collection gets a new entry every time a Service Ticket is created and it is used for Single Sign Off. In CAS 4, a new collection is used to handle multiple factors of authentication. If two requests arrive at the same time to generate two Service Tickets on the same TGT, then one ST is created and is queued up by existing TicketRegistry implementations to be replicated to other nodes. Meanwhile the second Service Ticket is being created and is adding a new entry to the Single Sign Off collection in the TGT.

CAS 3 was sloppy about this. CAS 4 adds "synchronized" statements to protect itself from everything except the ticket replication mechanism. Once the ST and TGT are queued up to be replicated, the replication can happen at any time. If it happens while the second Service Ticket is modifying the TGT, then the third-party, off-the-shelf replication software will throw a ConcurrentModificationException somewhere deep in the middle of its code. Will it recover properly?

Cushy cannot itself solve a problem in the Ticket classes, but it does allow you to safely add to the TicketGrantingTicketImpl class the method that fixes the problem:

private synchronized void writeObject(ObjectOutputStream s) throws IOException {
    s.defaultWriteObject();
}
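
To show why this one method is enough, here is a self-contained sketch using a hypothetical GuardedTgt class in place of TicketGrantingTicketImpl (the class, its single map, and the serialize helper are assumptions for illustration). Because writeObject and the mutating method synchronize on the same object, Java serialization can never walk the collection while another thread is adding to it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

/** Hypothetical stand-in for a TGT with one collection that grows per Service Ticket. */
final class GuardedTgt implements Serializable {
    private static final long serialVersionUID = 1L;

    private final HashMap<String, String> services = new HashMap<>();

    /** Called when a Service Ticket is granted; holds the object's monitor. */
    synchronized void addService(String serviceTicketId, String serviceUrl) {
        services.put(serviceTicketId, serviceUrl);
    }

    synchronized int serviceCount() {
        return services.size();
    }

    /** Same monitor as addService, so the map is never serialized mid-update. */
    private synchronized void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
    }

    /** Serializes any ticket to a byte array, as a checkpoint or replicator would. */
    static byte[] serialize(Object ticket) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(ticket);
        }
        return bytes.toByteArray();
    }
}
```

The fix only works when every method that changes the collections is also synchronized on the ticket object.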

Private Copy of the Login TGT

JPA handles the entire collection of tickets properly.

The other replication systems use writeObject on what they think is a single ticket object. Unfortunately, Service Tickets and Proxy Granting Tickets point to the login TGT, and when you serialize one of them with writeObject, Java generates a copy of the TGT, which is generally sent over the network and is received at the other node as a pair of ticket objects.

You can verify that none of the TicketRegistry implementations fix this problem, because CAS has made all the important fields of the Ticket object private with no exposed methods that allow any code to fix it.

In CAS 3 it is not a problem because the copy of the TGT works just as well as the real TGT during CAS processing, and Service Tickets are used or time out so quickly it doesn't matter. In CAS 4 this may become a problem because the TGT can change in important ways after it is created and the copy of the TGT connected to a replicated Proxy Granting Ticket becomes stale and outdated.

Cushy avoids this problem because the periodic checkpoint file captures all the tickets with all their relationships. Limited examples of this problem can occur around node failure, but for all the other TicketRegistry solutions (except JPA) this happens all the time to all the tickets during normal processing.
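
The difference can be demonstrated with plain Java serialization. The classes SketchTgt, SketchSt, and CopyDemo below are hypothetical stand-ins, not CAS classes. Serializing each Service Ticket in its own stream produces a separate private copy of the TGT, while writing all the tickets to one stream (as a checkpoint of the whole collection does) preserves the shared reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical minimal login ticket. */
final class SketchTgt implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    SketchTgt(String id) { this.id = id; }
}

/** Hypothetical minimal Service Ticket that points back to its TGT. */
final class SketchSt implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final SketchTgt grantingTicket;
    SketchSt(String id, SketchTgt grantingTicket) {
        this.id = id;
        this.grantingTicket = grantingTicket;
    }
}

final class CopyDemo {
    /** Writes one object graph to a fresh stream and reads it back. */
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Passing two SketchSt objects that share one SketchTgt through roundTrip separately yields two tickets whose grantingTicket fields are different objects. Passing a single list that holds the TGT and both STs yields a restored list in which both STs still point to the one restored TGT.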

JUnit Testing

Cushy includes a JUnit test that runs all the same cases that the DefaultTicketRegistry JUnit test runs.

It is not possible to configure enough of a Java Servlet Web server to test the HTTP Notify and file transfer. You have to test that on a real server. JUnit tests run in SharedDisk mode, where two objects representing the TicketRegistry objects on two different nodes in the cluster both write and read files from the same disk directory.

The trick here is to create two Primary CushyTicketRegistry instances with two compatible but opposite configurations. Typically one Primary object believes that it is node "casvm01" and that the cluster consists of a second node named "casvm02", while the other Primary object believes that it is node "casvm02" in a cluster with "casvm01".

There are two test classes with entirely different strategies.

CushyTicketRegistryTest.java tests the TicketRegistry interface and the Cushy functions of checkpoint, restore, writeIncremental, and readIncremental. You can create a single ticket or 100,000 TGTs. This verifies that the tickets are handled correctly, but it does not test CAS Business Layer processing. Initialization creates a new empty TicketRegistry for each test, so it is possible to test that a sequence of operations produces an expected outcome.

CushyCentralAuthenticationServiceImplTests.java is an adaptation of the CentralAuthenticationServiceImpl test class from cas-server-core that simulates CAS Business Logic on two nodes across a failover. As with the original code, it uses Spring support for JUnit testing. It has a single resource file named applicationContext.xml that configures a stripped down CAS using versions of the same XML used to configure real CAS. In this case, however, there are two "ticketRegistry" beans that use two "clusterConfiguration" beans for nodes "casvm01" and "casvm02".

Warning: To make this test case work you need a line in your /etc/hosts or your c:\Windows\system32\drivers\etc\hosts file that maps the names "casvm01" and "casvm02" to the loopback address, as in:

127.0.0.1   casvm01 casvm02

Without this the two CushyClusterConfiguration beans cannot be tricked into regarding the one machine as if it were two nodes.
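
Before running the test it can help to confirm that the names resolve as intended. The following is a small sketch using only java.net.InetAddress; the class LoopbackCheck is a hypothetical helper for this purpose and is not part of Cushy.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical pre-flight check for the hosts file entries. */
final class LoopbackCheck {
    /** True when the name (or literal address) resolves to a loopback address. */
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;   // name is not defined at all
        }
    }
}
```

Calling LoopbackCheck.resolvesToLoopback("casvm01") and LoopbackCheck.resolvesToLoopback("casvm02") should both return true once the hosts file entry is in place.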

Using this test class, the Spring configuration is done first and then each test is run. As a result the two CushyTicketRegistry objects are not reinitialized between tests and the objects created in previous tests are left behind at the start of the next test. However, because the operations here involve the Business Logic layer, you can perform tests like:

Create credentials on casvm02
Create a TGT with the credentials on casvm02
Simulate a failure of casvm02, from now on everything is casvm01
Create a ST using the TGT ID of the casvm02 TGT.
Use the ST to create a PGT.
Create a new ST using the PGT just created.
Validate the ST. Make sure that the netid that comes back matches the credentials supplied to casvm02.

 
