CAS is a Single SignOn solution. Internally the function of CAS is to create, update, and delete a set of objects it calls "Tickets". A Login Ticket is created whenever a user logs in to CAS. It is used to remember the userid, and a generated string used to identify and locate the ticket is written back as a cookie to the logged-in browser. When the browser uses this login to access an application, CAS issues a temporary Service Ticket that ties the application URL to the Login Ticket. These ticket objects are stored in a plugin component called a TicketRegistry. A standalone server stores tickets in memory, but a cluster of CAS servers has to share the tickets by replicating copies of them from the server that created the ticket to the other servers in the cluster.

Four years ago Yale implemented a "High Availability" CAS cluster using JBoss Cache to replicate tickets. After that, the only CAS crashes were caused by failures of the ticket replication mechanism. We were disappointed that a mechanism nominally designed to improve availability should be a source of failure. We considered switching from JBoss Cache to an alternate library performing essentially the same service, but it was not clear that any other option would solve all the problems.

General object replication systems are necessary for shopping cart applications that handle thousands of concurrent users spread across a number of machines. That is not really the CAS problem. CAS has a relatively light load that could probably be handled by a single server, but it needs to be available all the time, even during disaster recovery when there may be unexpected network communication problems. It also turns out that CAS tickets violate some of the restrictions that general object replication systems place on application objects.

CushyTicketRegistry is a new alternative you can plug into the TicketRegistry component of CAS. It adds useful availability features to a single standalone CAS server, but it also provides an entirely different approach to clustering two or more CAS servers for reliability. It is simple because it is designed specifically around the requirements of CAS and no other application.

CAS is based on the Spring Framework, which means that internal components are selected and connected to each other using XML text files. The ticketRegistry.xml file has to configure some object that implements the TicketRegistry interface. The simplest class, which keeps tickets in memory on a single standalone server, is called DefaultTicketRegistry.

Suppose that you start with the standard single server CAS configuration, but change the class name in the XML file from DefaultTicketRegistry to CushyTicketRegistry (and add a few required parameters described later). Everything works the same as before, until you shut down the CAS server. The old TicketRegistry loses all the tickets, and therefore everyone has to login again. Cushy detects the shutdown and saves all the ticket objects to a file on disk, using a single Java writeObject statement. Unless that file is deleted while CAS is down, when CAS restarts Cushy loads all the tickets from that file into memory and CAS picks up where it left off. Users do not have to login again, and no user notices that CAS rebooted unless they tried to access CAS while it was down.
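
In ticketRegistry.xml the swap is a one-bean change. A minimal sketch, where the DefaultTicketRegistry line is standard CAS but the Cushy package and parameter names are placeholders for the real ones described later in this document:

    <!-- Before: the standard single-server registry -->
    <bean id="ticketRegistry" class="org.jasig.cas.ticket.registry.DefaultTicketRegistry" />

    <!-- After: Cushy (package and parameter shown here are illustrative placeholders) -->
    <bean id="ticketRegistry" class="edu.yale.its.tp.cas.ticket.registry.CushyTicketRegistry"
        p:checkpointDirectory="/var/cas/tickets" />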

It turns out that the largest number of tickets normally encountered at Yale could be written to disk in less than a second, and the file was only 3MB. That is such a small cost that you don't have to wait for shutdown. If you examine the standard default ticketRegistry.xml configuration file you will see a few extra XML elements that configure a timer driven component called the RegistryCleaner that runs periodically to delete expired tickets. If you copy these XML statements to call the "timerDriven" method of CushyTicketRegistry, then at whatever interval you select Cushy will write the same complete copy of the tickets to disk. Now if the CAS server crashes instead of shutting down normally, Cushy can restore the most recently written set of tickets, missing only the ones created between the last backup and the crash.

Backing up all the tickets to disk doesn't use a lot of processing time, but it is probably not something you would do every 5 seconds. So Cushy provides a much quicker alternative that can provide complete coverage. Cushy tracks the new and deleted tickets. It can write a second file, called the "incremental", that contains only the tickets added or deleted since the last full checkpoint was written. Typically the incremental takes only a few milliseconds to write, so you can write it as frequently as you need. Then if CAS crashes it uses one readObject to read the last full checkpoint, and a second operation to read the last incremental and add back the tickets from the final seconds before the crash.

Occasionally the source of the crash is a problem that prevents bringing CAS up on the same host computer. The checkpoint and incremental files are plain old ordinary disk files. They can be stored on local disk on the CAS machine, or they can be stored on a file server, on a SAN, NAS, or some other highly available disk technology. The farther away they are the safer they are in terms of disaster recovery, but then the elapsed time to write the 3 megabytes may be a few milliseconds longer. So rather than slowing CAS down, you should let it write the file to local disk, and then a shell script or other program can copy the file a second later to a remote, safer location. If the CAS machine is unbootable, you can bring up a copy of CAS from the remote backup on other hardware.

The idea of using a cluster for availability made sense ten years ago when servers were physical machines and recovery involved manual intervention. Today servers run on VMs in a highly managed environment and backup VMs can be spun off automatically. It may be possible to design a system so that the backup comes up automatically and so quickly that you don't need a cluster at all. Cushy supports this profile and strategy.

If you insist on creating a cluster of CAS servers, then you should consider some differences between modern technology and the machine room of ten years ago when conventional CAS cluster support was developed.

Multiple CAS nodes will still be run in the VM infrastructure. With the CAS server divorced from physical hardware, nodes should not remain down as long.

Original CAS clustering assumed that the network Front End was fairly dumb. Typically it would take requests for the common CAS URL and distribute them on a round robin basis to the available servers. So CAS clustering had to replicate ticket status almost immediately to all the nodes, before the next request came in and was randomly assigned to an unpredictable node. Modern Front End machines, such as the BIG-IP F5, are much smarter and can be programmed with enough understanding of the CAS protocol that they round robin only the initial login of new users. After that, they should be able to route requests for your tickets, whether from the browser or from applications validating a ticket, to the node you logged into, which is also the node that created the ticket. Assuming a modern Front End, tickets only have to be replicated to protect against system failure, and that allows replication to be measured in seconds instead of milliseconds (hence the term Lazy Replication).

The CushyClusterConfiguration class makes it simple to configure more than one CAS server in a cluster. It makes sure that every server has a unique name, that all members of the cluster know the names and network locations of the other members, and that some version of these names is appended to every ticketid so the Front End can route requests properly. It then feeds this cluster information to the CushyTicketRegistry object.

With cluster data, the TicketRegistry comes up as before, but it now creates a secondary registry object for every other node in the cluster. With the simplest option (SharedDisk) these secondary objects simply sit idle until one of the other CAS servers in the cluster fails and the Front End starts routing requests belonging to the failed server to other members of the cluster. When the registry receives a request for a ticket that belongs to another server, it restores the tickets belonging to that server from the disk into the secondary object associated with the failed cluster member. It then processes the request on behalf of the failed server. The details will be explained below.

If you don't want to use shared disk, there are two alternatives. Cushy provides an HTTPS solution. After all, CAS runs on Web Servers. Web Servers are very good about sending the current copy of small files over the network to clients. The checkpoint file is small, and the incremental file is smaller. Everyone understands how an HTTP GET works. So unless you configure Shared Disk, Cushy running in cluster mode uses HTTP GET to retrieve a copy of the most recent full checkpoint or incremental file from every other node in the cluster and put the copy on the local hard disk of the machine.

You may now have realized that you do not actually need to use either real Shared Disk or Cushy HTTPS. Every 10 seconds or so Cushy writes one of two files to a directory on local disk. You can write your own program, in any language you prefer, that wakes up every 10 seconds; the time stamps on the files tell you exactly when to synchronize with the Cushy activity. After a file has changed, your program can copy it somewhere the other nodes can find it, using anything from FTP on the simple end to an Enterprise Service Bus on the more exotic end. These are just files, and figuring out how to distribute them around the network is fairly routine.

When a CAS node crashes, the other nodes use the most recent file they received to load up tickets and handle requests for the failed node. They do not care how the files got to them.

So what happens if a router breaks the connection between the front end and one of the CAS servers? Suppose a fiber optic connection between data centers goes down for an hour before the traffic can be rerouted, separating one CAS server from another? With magic black box technology that is just supposed to take care of all the problems, you don't really know exactly what is going to happen. Cushy is explained completely in terms of HTTP, or a shared disk technology of your choice, or a file transfer program you decide to write. This document still has to fill in a little more detail, and a moderately skilled Java programmer can read the source. With Cushy you will know exactly how it works, and therefore exactly what it will do in any failure situation.

Now the bad news. Current CAS has some bugs. It was not written "properly" to work with the various ticket replication mechanisms. It has worked well enough in the past, but CAS 4 introduces new features and in the future it may not behave as expected. It is not possible to fix everything in the TicketRegistry; a few changes may need to be made in the CAS Ticket classes. So Cushy does not fix the bugs itself, but it does eliminate the false reliance on "the magic black box of off the shelf software" that people imagined was going to do more than it could reasonably be expected to do.

1) Any system that seeks to replicate tickets has a concurrency problem if there are multiple threads (like the request threads maintained by any Web Server) that can change the content of an object while another thread has triggered replication of the object. CAS has some collections in its TicketGrantingTicket object that can be changed by one Web request while another request is trying to serialize the ticket for replication to another system. CAS 3 was sloppy about this. CAS 4 added the "synchronized" keyword to methods, so at least the CAS API is protected from threading problems. However, when tickets get passed to a black box cache mechanism for replication, then under the covers they are "serialized" to a stream of bytes, and serialization is not synchronized unless you provide a trivial change to protect it, and that change is not yet in CAS 4.0. As a result, any of the ticket replication technologies has a very, very small chance of throwing a ConcurrentModificationException. Cushy doesn't solve this problem yet, because it doesn't change the Ticket classes that have the bug, but it does provide a small amount of transparent pure Java code where a fix can be validated.

2) Any system that replicates tickets using serialization gets not just the object it is trying to serialize but also a copy of any other objects that object points to. In CAS a Service or Proxy ticket points to a TGT, and when you try to serialize one of them you get a copy of the TGT dragged along under the covers and then recreated at the other end when the data is turned back into a Ticket object. That didn't matter in CAS 3 because the TGT didn't change in any important way after it was created. That assumption may not hold in CAS 4 when people start to add additional factors of authentication to an existing logon.

3) It is not possible to fix the previous problem in the TicketRegistry alone because the Ticket classes do not expose a method that allows the Registry to reconnect the copy of the Proxy or Service Ticket to the real TGT after it arrives. Cushy mostly "solves" the problem because every full checkpoint (every 5 minutes or so) fixes the broken pointers, but Cushy is still stuck with the problem in tickets added by incrementals. That is a very small percentage of the tickets (while with other replication options all the tickets have broken pointers that stay broken), but adding a method that allows the registry to fix the tickets would be helpful.

The big difference here is that Cushy is designed 100% to satisfy the needs of CAS, so we can discuss and fix those specific problems. The larger off-the-shelf generic libraries provide no place to fix problems specific to CAS, and up to this point nobody seems to have noticed or fixed them.

Summary of CAS Clustering

(For those unfamiliar with the CAS system)

CAS is a Single SignOn solution. Internally, it creates a set of objects called Tickets. There is a ticket for every logged on user, and short term Service Tickets that exist while a user is being authenticated to an application. The Business Layer of CAS creates tickets by, for example, validating your userid and password in a back end system like Active Directory. The tickets are stored in a plug-in component called a Ticket Registry. The tickets are the only data CAS maintains about its users or previous activity.

For a single CAS server, the Ticket Registry is just an in-memory table of tickets indexed by the ticket ID string. When more than one CAS server is combined to form a cluster, an administrator chooses one of several optional Ticket Registry solutions that allow the CAS servers to share the tickets.

One clustering option is to use JPA, the standard Java service to map objects to tables in a relational database. All the CAS servers share a database, which means that any CAS node can fail but the database has to stay up all the time or CAS stops working. Other solutions use generic object "caching" solutions (Ehcache, JBoss Cache, Memcached) where CAS puts the tickets into what appears to be a common container of Java objects and, under the covers, the cache technology ensures that new tickets are copied to all the other nodes.

JPA makes CAS dependent on a database. It doesn't really use the database for any queries or reports. You can use any database, but the database is a single point of failure. At Yale CAS is nearly the first thing that has to come up during disaster recovery, but if it uses JPA then you have to bring up the database (or have a special standalone CAS configuration for disaster recovery only). If you already have a 24x7x365 database managed by professionals who can guarantee availability, this is a good solution. If not, then this is an insurmountable prerequisite for bringing up an application like CAS.

The various "cache" (in memory object replication) solutions should also work. Unfortunately, some have massively complex configuration parameters with multicast network addresses and timeout values to determine node failure.They also tend to be better at detecting a node that is dead than they are at dealing with nodes that are sick and accept a message but then never really get to processing it and responding. They operate entirely in memory, so at least one node has to remain up while the others reboot in order to maintain the content of the cache. While node failure is well defined, the status of objects is ambiguous if the network is divided into two segments by a linkage failure, the two segments operate independently for a while, and then connection is reestablished.

Cushy is a cute name that roughly stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale".

The name explains what it does. Java has a built in operation named writeObject that writes a binary ("serialized") version of Java objects to disk. You can use it on a single object, but if you pass it a list or table of objects then it copies everything in the list and captures all the relationships between the objects. Later on you can use readObject from the same program, from a different JVM, or from a different computer and restore to memory an exact copy of the original list or table and all the objects it contains. This is a very complex process, but Java handles all the complexity inside the writeObject statement.
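
A minimal sketch of that round trip, using a plain HashMap of Strings as a stand-in for the real ticket table:

    import java.io.*;
    import java.util.HashMap;

    public class CheckpointDemo {
        public static void main(String[] args) throws Exception {
            // Stand-in for the ticket table; real tickets form a more complex object graph.
            HashMap<String, String> tickets = new HashMap<String, String>();
            tickets.put("TGT-1-example", "ticket state");

            // One writeObject call captures the table and everything it references.
            ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("checkpoint.ser"));
            out.writeObject(tickets);
            out.close();

            // Later, in the same JVM or a different one, restore an exact copy.
            ObjectInputStream in = new ObjectInputStream(new FileInputStream("checkpoint.ser"));
            @SuppressWarnings("unchecked")
            HashMap<String, String> restored = (HashMap<String, String>) in.readObject();
            in.close();
            System.out.println(restored.get("TGT-1-example")); // "ticket state"
        }
    }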

Comparison of Cushy and previous cluster technologies:

  • Existing cluster technologies maintain the image of a single pool of shared tickets. Cushy exploits modern programmable Front End network devices (such as the BIG-IP F5) to distribute initial CAS logons across different members of the cluster, but then to route subsequent CAS requests to the node that handled the specific user logon unless that node crashes. Each Cushy node maintains its own set of tickets.
  • Existing cluster technologies try to replicate individual tickets (although the nature of Java's writeObject drags along copies of additional associated tickets). Cushy replicates a batch of tickets at regular time intervals (say every 10 seconds) and less frequently it replicates a copy of the entire collection of tickets.
  • Existing cluster technologies use complex logic and databases or complex network configuration. Cushy uses HTTP that everyone understands, although you can replace this with shared files or your own trivial programs. As a result you can know how things work and how they will respond to any type of network or system failure.
  • Existing cluster technologies require a cluster. Cushy does something useful on a single machine, and its clustering capability is simply an extension of that simple design.
  • Existing cluster technologies are general purpose off the shelf libraries designed to handle any application. Cushy was written to handle CAS tickets. There are unresolved problems when CAS tickets are replicated using generic replication. In its initial distribution as a TicketRegistry component, Cushy cannot solve bugs in other CAS components, but because it exposes 100% of the logic as simple Java it provides the framework to resolve these problems when you start to use the new features of CAS 4.
  • Cushy is probably less efficient than other technologies, but if it uses less than 1% of one core of a modern server then, given the relative importance of CAS in most institutions, reducing that to a quarter of 1% is not worthwhile if you have to give something up to get the efficiency.


Cushy is based on four basic design principles:

  1. CAS is very important, but it is small and cheap to run.
  2. Emphasize simplicity over efficiency as long as the cost to run remains trivial.
  3. Assume the network front end is programmable.
  4. Trying for perfection is the source of most total system failures. Allow one or two users to get a temporary error message when a CAS server fails.

A Bit More Detail on CAS Tickets

When the user logs in, CAS creates a Logon Ticket (the Ticket Granting Ticket or TGT because it can be used to generate other tickets). You can usually get away with believing that the TGT contains the login userid and user attributes, but there is really a chain of objects. The TGT points to an Authentication that points to a Principal that points to the username. The TGT can also contain a collection of Attributes used to generate SAML responses. In most cases you can ignore this chain of objects, unless you are writing or trying to understand a JUnit Test.
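
In code, getting from a TGT to the username means walking that chain. A sketch (the method names follow the public CAS ticket API, but treat the details as illustrative):

    import java.util.Map;
    import org.jasig.cas.authentication.principal.Principal;
    import org.jasig.cas.ticket.TicketGrantingTicket;

    public class TicketChain {
        // TGT -> Authentication -> Principal -> userid.
        static String useridOf(TicketGrantingTicket tgt) {
            Principal principal = tgt.getAuthentication().getPrincipal();
            Map<String, Object> attributes = principal.getAttributes(); // used for SAML responses
            return principal.getId();
        }
    }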

In CAS 3 the TGT is fairly stable once it is created. There is a table of pairs of Service Ticket ID strings and Service objects used by Single Sign Off that gets a new entry every time a Service Ticket is created, but otherwise the TGT doesn't change. With CAS 4 things threaten to become more interesting. With multiple factors of authentication and the possibility of adding new factors to an existing logon, the TGT will become a more interesting and active object once these features are implemented.

The simplest next step occurs when a user who has logged in and has a TGT decides to access an application that redirects the browser to CAS to obtain a Service Ticket. A Service Ticket object is created. It points to the TGT of the logged in user, and it contains the Service URL of the application. It exists for a few milliseconds before the application connects back to CAS to validate the ST ID string.

At validation, the Business Logic looks up the ST ID string in the Ticket Registry and gets the Service Ticket object. It points to the TGT object, and from that the validation code can obtain the userid (from the Authentication and Principal objects) and the Attributes (if this is a SAML validation). Then the ST is deleted.

A more complicated situation occurs when the application is a Proxy service, like a Portal. Then the CAS Business Logic trades in a Service Ticket and generates a new TGT object in return (the Proxy TGT is called a Proxy Granting Ticket or PGT to distinguish it). A PGT is a form of TGT except that it points to the real TGT that contains the userid and attributes, and in the PGT, if you follow the chain of Authentication and Principal objects, you will end up with the Service URL in the place where a TGT has the userid.

The PGT can be used to obtain Service Tickets. When this happens, the ST points to the PGT which in turn points to the TGT that "contains" the userid.

So when you are thinking about Ticket Registries, or when you are designing JUnit test cases, there are four things to think about:

  1. a TGT
  2. a ST pointing to a TGT
  3. a PGT pointing to a TGT
  4. a ST pointing to a PGT pointing to a TGT

In various node failure scenarios, any one of those "pointing to" links can be the place where you jump from the current node's TicketRegistry to a backup shadow TicketRegistry holding copies of tickets belonging to a failed node. For example, the ST could point to the PGT and TGT in the failed node's registry, or the ST could point to a local PGT that then points to a TGT in the failed node's registry. Test cases have to create these possibilities and verify that they work, because these are exactly the combinations the design must handle during failures.

How it works

Cushy is simple enough it can be explained to anyone, but if you are in a rush you can stop here.

Back in the 1960's a "checkpoint" was a copy of the important information from a program written on disk so if the computer crashed the program could start back at almost the point it left off. If a CAS server saves its tickets to a checkpoint disk file, reboots, and then restores the tickets from the file back into memory it is back to the same state it had before rebooting. If you transfer the file to another computer and bring CAS up on that machine, you have moved the CAS server from one machine to another. Java's writeObject and readObject guarantee the state and data are completely saved and restored.

If you have no cluster and just a single CAS server, then replacing the DefaultTicketRegistry class with a CushyTicketRegistry class creates a CAS Server that you can shut down and restart without losing any previous logons.

JPA and the cache technologies try to maintain the image of a single big common bucket of shared tickets. This was necessary when the network Front End device simply accepted HTTP requests and assigned them to CAS servers in a round robin manner. Today network Front End devices are programmable and they can make decisions based on specific CAS logic. This allows each CAS server to own its own private slice of the problem.

When a new user is redirected to CAS, then the Front End can randomly choose a server. However, after the user logs in and is assigned a Cookie, the Front End should always route subsequent requests to the server that issued the cookie. That means that Service Tickets and Proxy Tickets are issued by the CAS server you logged into. The Front End can also be programmed to recognize validation requests (/validate, /serviceValidate, etc.) and route those requests to the server that issued the ticket identified by the ticket= parameter. Configuration for the BIG-IP F5 will be provided. If you do not have a smart Front End device, then use a different Ticket Registry technology.

With an intelligent Front End, there is no need for a Ticket Registry that simulates a big shared pool of tickets. Each node has its own registry with its own logged in users and the tickets they create. No other node needs to access these tickets, unless the node that owns them fails. Then any other node, or all the other nodes, handle requests until the failed node is restarted.

You could configure a Cushy cluster to make only full checkpoint files containing all the tickets. The cost of a checkpoint is small, but it is large enough that you might be reluctant to schedule them frequently enough to provide the best protection. So between full checkpoints, Cushy creates and transmits a sequence of "incremental" change files that each contain all the changes since the last full checkpoint. In the Spring XML configuration file you set the time between incrementals and the time between checkpoints. The choice is up to you, but a reasonable suggestion is to exchange incrementals every 5-15 seconds and checkpoints every 3-15 minutes.

Each incremental has a small number of new Login (TGT) tickets and maybe a few unclaimed Service Tickets. However, because we do not know whether any previous incremental was or was not processed, it is necessary to transmit the list of every ticket that was deleted since the last full checkpoint, and that list will contain the IDs of lots of Service Tickets that were created, validated, and deleted within a few milliseconds. The list grows over time, but its size is limited by the fact that it starts over after each full checkpoint.
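
Conceptually the incremental is just one more serializable object written with writeObject; something like this sketch (the class and field names are illustrative, not Cushy's actual code):

    import java.io.Serializable;
    import java.util.ArrayList;

    // What an incremental carries: tickets added since the last full checkpoint,
    // and the IDs of every ticket deleted since that checkpoint. The deleted-ID
    // list keeps growing until the next checkpoint resets it.
    public class Incremental implements Serializable {
        ArrayList<Serializable> addedTickets = new ArrayList<Serializable>();
        ArrayList<String> deletedTicketIds = new ArrayList<String>();
    }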

Note: Replicating Service Tickets between nodes is almost never useful. The "p:excludeSTFromFiles" parameter in the Spring configuration XML causes Cushy to ignore Service Tickets when writing files, which keeps the deleted ticket list small and limits the growth of incrementals if you prefer a very long time between full checkpoints.

Ticket Names

As with everything else, CAS has a Spring bean configuration file (uniqueIdGenerators.xml) to configure how ticket ids are generated. The default class generates tickets in the following format:

type - num - random - nodename

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end of the ticket is identified as a nodename.

In vanilla CAS the nodename typically comes from the cas.properties file and defaults to "CAS". Cushy requires each node in the cluster to have a unique node name. The configuration of the CushyClusterConfiguration bean makes this somewhat easy (as described below), and it also exposes the ticket suffix through clusterConfiguration.getTicketSuffix(), which can be used to plug a real node name into the uniqueIdGenerators.xml file:

<bean id="ticketGrantingTicketUniqueIdGenerator" class="org.jasig.cas.util.DefaultUniqueTicketIdGenerator">
<constructor-arg index="0" type="int" value="50" />
<constructor-arg  index="1"  value="#{clusterConfiguration.getTicketSuffix()}" />
</bean>

How it Fails (Nicely)

The Primary + Warm Spare Cluster

One common cluster model is to have a single master CAS server that normally handles all the requests, and a normally idle backup server (a "warm spare") that does nothing until the master goes down. Then the backup server handles requests while the master is down.

During normal processing the master server is generating tickets, creating checkpoints and incrementals, and sending them to the backup server. The backup server is generating empty checkpoints with no tickets because it has not yet received a request.

Then the master is shut down or crashes. The backup server has a copy in memory of all the tickets generated by the master, except for the last few seconds before the crash. When new users log in, it creates new Login Tickets in its own Ticket Registry. When it gets a request for a new Service Ticket for a user who logged into the master, it creates the ST in its own registry (with its own nodename suffix) but connects the ST to the Login Ticket in its copy of the master's Ticket Registry.

Remember, the CAS Business Logic is accustomed to a Ticket Registry maintaining what appears to be a large collection of tickets shared by all the nodes. So the Business Logic is quite happy with a Service Ticket created by one node pointing to a Login Ticket created by another node.

Now the master comes back up and, for this example, let us assume that it resumes its role as master (there are configurations where the backup becomes the new master and the old master becomes the new backup when it returns; Cushy works either way).

What happens next depends on how smart the Front End is. If it has been programmed to route requests based on the suffix of the tickets in the login cookie, then users who logged into the backup server during the failure continue to use the backup server, while new users all go back to the master. If the Front End is programmed to route all requests to the master as long as the master is up, then it appears that when the master came up the backup server "failed over to the master".

When the master comes up it reloads its old copy of its ticket registry from before the crash, and it gets a copy of the tickets generated by the backup server while it was down. When it subsequently gets requests from users who logged into the backup server, it resolves those requests using its copy of that TGT.

This leaves a few residual "issues" that are not really big problems and are deferred until Cushy 2.0. Because each server is the owner of its own tickets, and its Ticket Registry is the authoritative source of status on its own tickets, other nodes cannot make permanent changes to another node's tickets during a failover.

This means that the master is unaware of things the backup server did while it was down that should have modified its tickets. For example, if a user logs out of CAS while the backup server is in control, then the Cookie gets deleted and all the normal CAS logoff processing is done, but the Login Ticket (the TGT) cannot really be deleted. That ticket belongs to the master, and when the master comes back up again it will be in the restored Registry. However, it turns out that CAS doesn't really have to delete the ticket. Since the cookie has been deleted, nobody is going to try and use it. It will simply sit around until it times out and is deleted later on.

A more serious problem occurs for Single Sign Out of people who logged into the backup server while the master was down, in systems where the Front End processor is not programmed to route requests intelligently. When the master reboots and starts handling all new requests, those users have a TGT that is "frozen" in the state it was in when the master rebooted. The master can subsequently create new Service Tickets from that TGT, but Single Sign Out will not know to log the users off from those services when they log off. The current solution is to use Front End programming. Cushy 2.0 may add intelligent TGT migration and merging after a CAS server reboots.

A Smart Front End Cluster

A programmable Front End is adequate for Cushy's needs if it can route requests based on four rules:

  1. If the URL "path" is a validate request (/cas/validate, /cas/serviceValidate, etc.) then route to the node indicated by the suffix on the value of the ticket= parameter.
  2. If the URL is a /proxy request, route to the node indicated by the suffix of the pgt= parameter.
  3. If the request has a CASTGC cookie, then route to the node indicated by the suffix of the TGT that is the cookie's value.
  4. Otherwise, or if the node selected by rules 1-3 is down, choose any CAS node.

So normally all requests go to the machine that created and therefore owns the ticket, no matter what type of ticket it is. When a CAS server fails, requests for its tickets are assigned to one of the other servers.
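
Expressed as code rather than iRule syntax, the routing decision amounts to the following sketch. Every helper method and type here is hypothetical, standing in for whatever your front end actually provides:

    // Hypothetical front end logic implementing the four routing rules.
    String chooseNode(Request req) {
        String ticket = null;
        if (isValidateRequest(req)) {                 // rule 1: /validate, /serviceValidate, ...
            ticket = req.getParameter("ticket");
        } else if (isProxyRequest(req)) {             // rule 2: /proxy
            ticket = req.getParameter("pgt");
        } else if (req.getCookie("CASTGC") != null) { // rule 3: logged-in browser
            ticket = req.getCookie("CASTGC");
        }
        if (ticket != null) {
            String node = suffixOf(ticket);           // nodename after the last "-"
            if (isUp(node)) {
                return node;
            }
        }
        return anyAvailableNode();                    // rule 4: new logins and failover
    }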

When a CAS server receives a request for a ticket owned by another node, it fully activates the other node's shadow Ticket Registry. It then looks up the ticket in that registry and returns it to the CAS Business Logic. A node may not have a copy of tickets issued in the last few seconds, so one or two users may see an error.
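
On the CAS side, the lookup is a suffix comparison followed by a possible jump to a shadow registry; roughly like this sketch (the names are illustrative, not Cushy's actual API):

    // Illustrative sketch of ticket lookup across local and shadow registries.
    Ticket getTicket(String ticketId) {
        String suffix = ticketId.substring(ticketId.lastIndexOf('-') + 1);
        if (suffix.equals(localNodeName)) {
            return localTickets.get(ticketId);  // normal case: this node owns the ticket
        }
        SecondaryRegistry shadow = secondaries.get(suffix);
        shadow.restoreFromFileIfNeeded();       // "fully activates" the failed node's copy
        return shadow.getTicket(ticketId);
    }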

Cushy can issue a Service Ticket that points to a Login Ticket owned by the failed node. More interestingly, it can issue a Proxy Granting Ticket pointing to the Login Ticket on the failed node. In both cases the new ticket has the suffix of, and is owned by, the node that created it, not the node that owns the login.

Again, the rule that each node owns its own registry and all the tickets it created, and that the other nodes cannot durably change those tickets, has certain consequences.

  • If you use Single Sign Off, then the Login Ticket maintains a table of Services to which you have logged in. When you logout, or when your Login Ticket times out in the middle of the night, CAS calls each Service on a published URL with the Service Ticket ID you used to login, so the application can log you off if it has not already done so. In fail-over mode a backup server can issue Service Tickets for a failed node's TGT, but it cannot durably update the Service table in that TGT, because when the failed node comes back up it will restore the old Service table along with the old TGT.
  • If the user logs out and the Services are notified by the backup CAS server, and then the node that owned the TGT is restored along with the now undead copy of the obsolete TGT, then in the middle of the night that restored TGT will time out and the Services will all be notified of the logoff a second time. It seems unlikely that anyone would ever write a service logout so badly that a second logoff would be a problem. Mostly it will be ignored.

You have probably guessed by now that Yale does not use Single Sign Out, and if we ever enabled it we would only indicate that it is supported on a "best effort" basis in the event of a CAS node crash.

CAS Cluster

In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session information. Nor does it depend on any other type of communication between machines. In fact, a CAS cluster could be created from a CAS running under Tomcat on Windows and one running under JBoss on Linux.

To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.

Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8443/cas". 

Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (at Yale the F5 likes the ticket suffix to be an MD5 hash of the DNS name).


Sticky Browser Sessions

An F5 can be configured to have "sticky" connections between a client and a server. The first time the browser connects to a service name it is assigned any available back-end server. For the next few minutes, however, subsequent requests from that client to the same service are forwarded to whichever server the F5 assigned to handle the first request.

While a user is logging in to CAS with the form that takes userid and password, or any other credentials, there is no ticket: no cookie, no ticket= parameter, none of the features that would trigger the first three rules of the programmable intelligent Front End. CAS was designed (for better or worse) to use Spring Web Flow, which keeps information in the Session object during the login process. For Web Flow to work, one of two things must happen:

  1. The browser has to POST the Userid/Password form back to the CAS server that sent it the form (which means the front end has to use sticky sessions based on IP address or JSESSIONID value).
  2. You have to use real Web Server clustering so the Web Servers all exchange Session objects based on JSESSIONID.

Option 2 is a fairly complex process of container configuration, unless you have already solved this problem and routinely generate JBoss cluster VMs using some canned script. Sticky sessions in the front end are somewhat easier to configure, but any sticky session rule MUST apply only after the first three rules (ticket= suffix, pgt= suffix, or CASTGC suffix) have been tested and found not to apply.

There is another solution, but it involves a CAS modification. Yale made a minor change to the CAS Web Flow to store the data that Web Flow saves in the Session object also in hidden fields of the login form (it is not sensitive information). Then a check at the beginning of the Web Flow detects a POST arriving at the start of the flow and jumps forward to the step of the Flow that handles the form submission.

What is a Ticket Registry

This is a rather detailed description of one CAS component, but it does not assume any prior knowledge.

Layers

Web applications are traditionally defined in three layers. The User Interface generates the Web pages, displays data, and processes user input. The Business Logic validates requests, verifies inventory, approves the credit card, and so on. The back-end "Persistence" layer talks to a database. CAS doesn't sell anything, but it has roughly the same three layers.

In CAS the "User Interface" layer has two jobs. The part that talks to real users handles login requests through the Spring Web Flow services. However, CAS also accepts Web requests from the applications that are trying to validate a Service Ticket and get information about the user. This is also part of the UI layer, and it is handled by the Spring MVC framework.

Cushy extends this second part of the UI so that node to node communication within the cluster also flows through MVC.

The Business Logic layer of CAS verifies the userid and password or any other credentials, and it creates and deletes the TGT and ST objects. It also validates Service Tickets and deletes them after use.

The Persistence layer implements the TicketRegistry interface. In the simplest case of a single CAS server using the DefaultTicketRegistry, the tickets are stored in an in memory table and there is no back end database or network I/O. JPA stores the tickets in a database. The "cache" solutions trigger network I/O.

Cushy stores the tickets in memory, just like the DefaultTicketRegistry. Periodically it backs the table up to a file on disk, but that is not part of the CAS request processing flow, so the checkpoint files and HTTP file transfer are not part of the application layers.

Spring Configuration

Many applications have their own custom configuration file. Spring is a Java framework that provides a much more powerful configuration environment that is, necessarily, somewhat more complicated. Consider the TicketRegistry layer and interface.

The CAS Business Logic configuration requires that some Java class be loaded into memory, that an object of that class be created and then configured with parameters, and that its name be "ticketRegistry". The object also has to implement the TicketRegistry Java interface. CAS provides a number of classes that can do the job, including DefaultTicketRegistry that holds the tickets in memory and has no important configuration parameters.

In the CAS WAR file (the web application file deployed to web servers) there is a ticketRegistry.xml file where, by CAS convention, the class that implements the TicketRegistry interface should be configured.

The Serialization Problem of Current CAS

JPA is the current technique for creating Java objects from a database query, updating objects, and committing changes back to the database. To support JPA, the TGT and ST Java objects have "annotations" to define the names of tables and columns that correspond to each object and field. If you don't use JPA, these annotations are ignored. If you use JPA, then it automatically generates additional Java code that is added to every ticket object to track when it is used and updated. JPA is the only TicketRegistry solution that doesn't use serialization (other than DefaultTicketRegistry that does nothing).

The "cache"  (Ehcache, JBoss Cache, Memcached)  JASIG TicketRegistry modules have no annotations and few expectations. They use ordinary objects (sometimes call Plain Old Java Objects or POJOs). They require the objects to be serializable because, like Cushy, they use the Java writeObject statement to turn any object to a stream of bytes that can be held in memory, stored on disk, or sent over the network.

Java Serialization turns an object into a stream of bytes. Since Java can handle all the ordinary types of data, it can automatically serialize any simple Java class. Ticket objects are declared to be serializable, but there is a problem, and it has always existed even though it has not been well documented.

Serialization isn't Thread Safe unless You Make It

A Web server handles lots of different HTTP requests from clients at the same time. It assigns a thread to each request. The threads run concurrently, and on modern multicore processors they can run simultaneously.

If an object has a collection (a table or list of objects) that can be updated by these requests, then it has to take some step to make sure that no two requests try to update the collection at the same time. The TGT has a collection of Services to which the user has authenticated (for Single Sign Out) and in CAS 4 it also has a List of Supplemental Authentications. CAS 3 was sloppy about this, but CAS 4 adds "synchronized" methods to protect against concurrent access to these tables by different Web request threads.

Unfortunately, serialization accesses the object and its internal collections without going through any of the synchronized methods. It has to iterate through all the members of the table or the list, and in general it cannot do this in a thread safe manner. Because serialization occurs when some external component (Ehcache, JBoss Cache, ...) decides to do it, and that decision is made deep inside what amounts to a giant black box of code, there is no way to externally guarantee that something won't go wrong.
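
The exception involved is the standard one Java collections throw when they are modified mid-iteration. A deterministic single-threaded illustration of the same failure (real CAS hits it only in a rare race between a request thread and a replication thread):

    import java.util.ArrayList;
    import java.util.List;

    public class CmeDemo {
        public static void main(String[] args) {
            List<String> services = new ArrayList<String>();
            services.add("ST-1");
            // Serialization iterates over collections like this one; a concurrent
            // add from another thread has the same effect as this in-loop add.
            for (String s : services) {
                services.add("ST-2"); // throws ConcurrentModificationException
            }
        }
    }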

One solution (that CAS has not implemented yet) is to give the Ticket objects a custom serialization method that is synchronized with the other threads. The code is standard and simple:

  private synchronized void writeObject(ObjectOutputStream s) throws IOException {
      // Holding this object's monitor blocks the synchronized methods that
      // modify its collections until default serialization completes.
      s.defaultWriteObject();
  }

This "solution" is not without controversy. It should work correctly for CAS using any of the TicketRegistry alternatives, but it cannot be guaranteed to work when you decide to use a large "black box" of complex logic.The problem it creates is a threat of Deadlock.

Deadlock occurs when I own object A and need to acquire ownership of object B, while you own object B and request ownership of object A. Neither of us can get what we want, and neither of us will give up the thing the other wants. Any synchronized mechanism is exposed to deadlock unless you can enforce rules on your code to make sure it never happens.

The simplest solution is to prohibit any code from obtaining exclusive ownership of more than one object at a time. If that doesn't work, then the objects have to be obtained in a specific order by universal agreement.

CAS only acquires ownership of one object at a time. Serialization would only acquire objects one at a time. Cushy only acquires ownership of one object at a time. However, who knows what Ehcache, JBoss Cache, Memcached, or other systems do? It is regarded as very bad practice to do disk or network I/O, or to use complex services like serialization, while holding exclusive ownership of an object. These systems are probably safe, but I lack the resources to prove they are safe.

Deserialized Objects get a Private Copy of the TGT

However, current (CAS 3 and CAS 4) code creates a different problem of its own, and this is an issue no matter what TicketRegistry you use. The TGT is not an entirely static collection of objects. In CAS 3 there is a table of ST IDs and Service URLs used by Single Log Off, and new entries are added to the table every time a Service Ticket is created. In CAS 4 there is also an array of supplementalAuthentications.

When you serialize an ST or PGT individually, the stream of bytes generated by writeObject includes all the objects that it points to, including the TGT and everything the TGT references. When this gets deserialized at the other end, a copy of all these objects is created. So you cannot really serialize an ST or PGT by itself.

If you serialize the entire registry of tickets, as Cushy does during a full checkpoint, then when you deserialize it you get an exact copy with all the same connections and structure. However, if you serialize an individual ticket, as Cushy does during an incremental and as all the "cache" based object replication systems do for everything, then each ST or PGT gets its own private copy of the original TGT frozen at the time it was serialized.
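
A toy demonstration of the effect, with nothing CAS-specific in it:

    import java.io.*;

    public class PrivateCopyDemo {
        static class Tgt implements Serializable { String user = "alice"; }
        static class St implements Serializable { Tgt grantor; }

        public static void main(String[] args) throws Exception {
            Tgt tgt = new Tgt();
            St st = new St();
            st.grantor = tgt;

            // Serializing only the ST drags the TGT into the byte stream too.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(st);
            out.close();

            St copy = (St) new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())).readObject();

            // The deserialized ST has its own private TGT, frozen at copy time.
            System.out.println(copy.grantor == tgt); // false
            System.out.println(copy.grantor.user);   // "alice"
        }
    }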

This is absolutely not a problem now, because CAS 3 and CAS 4.0 TGTs don't meaningfully change after they are created. It is not plausibly a problem for Service Tickets because they don't live long. However, when you start to exploit multifactor authentication and use the supplementalAuthentications table, changes you make to the TGT after you create a PGT will behave differently on different nodes. On the node that created both the TGT and PGT, changes to the TGT become visible to the Proxy and to services it tries to access. On any other node, the PGT has its own private copy of the TGT frozen at the time the PGT was created, and changes to the real TGT are not visible.

Cushy automatically solves this problem every time it takes a full checkpoint. The other nodes obtain a fresh exact copy of all the tickets on the other node connected together exactly as they are on the other node with the very latest information.

For Now

Current CAS simply ignores these issues and mostly gets away with doing so. Every so often you may get an exception in the log during serialization caused by threading problems.

To do better, you have to change the Ticket classes in cas-server-core.

Yale does not use Single Sign Out, so we do not need the "Services" table in the TGT. We disable updates to the table and without the table the CAS 3 TGT is thread safe enough to be reliable.

If we used Single Sign Out and Cushy, then we would modify the Ticket objects to add the synchronized writeObject. You can do this with Cushy because you can verify from the code that a deadlock is impossible. You could cross your fingers with the other Registry solutions.

Usage Pattern

Users start logging into CAS at the start of the business day. The number of TGTs begins to grow.

Users seldom log out of CAS, so TGTs typically time out instead of being explicitly deleted.

Users abandon a TGT when they close the browser. They then get a new TGT and cookie when they open a new browser window.

Therefore, the number of TGTs can be much larger than the number of real CAS users. It is a count of browser windows and not of people or machines.

At Yale around 3 PM a typical set of statistics is:

Unexpired-TGTs: 13821
Unexpired-STs: 12
Expired TGTs: 30
Expired STs: 11

So you see that a Ticket Registry is overwhelmingly a place to keep logon TGTs (in this statistic TGTs and PGTs are combined).

Over night the TGTs from earlier in the day time out and the Registry Cleaner deletes them.

So generally the pattern is a slow growth of TGTs while people are using the network application, followed by a slow reduction of tickets while they are asleep, with a minimum probably reached each morning before 8 AM.

If you display CAS statistics periodically during the day you will see a regular pattern and a typical maximum number of tickets in use "late in the day".

Translated to Cushy, the cost of the full checkpoint and the size of the checkpoint file grow over time along with the number of active tickets, and then the file shrinks over night. During any period of intense login activity the incremental file may be unusually large. If you had a long time between checkpoints, then around the daily minimum (8 AM) you could get an incremental file bigger than the checkpoint.

Configuration

In CAS the TicketRegistry is configured using the WEB-INF/spring-configuration/ticketRegistry.xml file.

In the standard file, a bean with id="ticketRegistry" is configured selecting the class name of one of the optional TicketRegistry implementations (JBoss Cache, Ehcache, ...). To use Cushy you configure the CushyTicketRegistry class and its particular parameters.

Then at the end there is a group of bean definitions that set up periodic timer driven operations using the Spring support for the Quartz timer library. Normally these beans set up the RegistryCleaner to wake up periodically and remove all the expired tickets from the Registry.

Cushy adds a new bean at the beginning. This is an optional bean of class CushyClusterConfiguration that uses some static configuration information, plus runtime Java logic to find the IP addresses and hostname of the current computer, to select a specific cluster configuration and generate property values that are passed on to the CushyTicketRegistry bean. If this class does not do what you want, you can alter it, replace it, or just write static configuration for the CushyTicketRegistry bean.

Then add a second timer driven operation at the end of the file that calls the "timerDriven" method of the CushyTicketRegistry object on a regular basis (say once every 10 seconds) to trigger writing the checkpoint and incremental files.
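
The new timer beans mirror the RegistryCleaner beans already in the file. A sketch, assuming the Spring 3 era Quartz helper classes that the standard CAS configuration uses, with example intervals:

    <bean id="ticketRegistryCheckpointJob"
        class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean"
        p:targetObject-ref="ticketRegistry"
        p:targetMethod="timerDriven" />

    <bean id="ticketRegistryCheckpointTrigger"
        class="org.springframework.scheduling.quartz.SimpleTriggerBean"
        p:jobDetail-ref="ticketRegistryCheckpointJob"
        p:startDelay="60000"
        p:repeatInterval="10000" />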

The Cluster

We prefer a single "cas.war" artifact that works everywhere. It has to work on standalone or clustered environments, in a desktop sandbox with or without virtual machines, but also in official DEV (development), TEST, and PROD (production) servers.

There are techniques (Ant, Maven) to "filter" a WAR file replacing one string of text with another as it is deployed to a particular host. While that works for individual parameters like "nodeName", the techniques that are available make it hard to substitute a variable number of elements, and some locations have one CAS node in development, two CAS nodes in test, and three CAS nodes in production.

Then when we went to Production Services to actually deploy the code, they said that they did not want to edit configuration files. They wanted a system where the same WAR is deployed anywhere, and when it starts up it looks at the machine it is on, decides that this is a TEST machine (because it has "tst" in the hostname), and automatically generates the configuration of the TEST cluster.

At this point you should have figured out that it would be magical if anyone could write a class that reads your mind and figures out what type of cluster you want. However, it did seem reasonable to write a class that could handle most configurations out of the box and was small enough and simple enough that you could add any custom logic yourself.

The class is CushyClusterConfiguration and it is separate from CushyTicketRegistry to isolate its entirely optional convenience features and make it possible to jiggle the configuration logic without touching the actual TicketRegistry. It has two configuration strategies:

First, you can configure a sequence of clusters (desktop sandbox, and machine room development, test, and production) by providing, for each cluster, a list of the machine-specific raw URLs used to get to CAS on each member (from other machines also behind the machine room firewall). CushyClusterConfiguration looks up all the IP addresses of the current machine, then looks up the addresses associated with the servers in each URL in each cluster. It chooses the first cluster that it is in (that is, the first cluster containing a URL that resolves to an address of the current machine).

Second, if none of the configured clusters contains the current machine, or if no configuration is provided, then Cushy uses the HOSTNAME and some Java code to automatically configure the cluster. At this point we expect you to provide some programming, unless you can use the Yale solution off the shelf.

At Yale we know that CAS is a relatively small application with limited requirements, and that any modern multi-core server can certainly handle all the CAS activity of the university (or even of a much larger university). So we always create clusters with only two nodes, and the other node is just for recovery from a serious failure (and ideally the other node is in another machine room far enough away to be outside the blast radius).

In any given cluster, the hostname of both machines is identical except for a suffix that is either the three characters "-01" or "-02". So by finding the current HOSTNAME it can say that if this machine has "-01" in its name, the other machine in the cluster is "-02", or the reverse.
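
A sketch of that convention (not Cushy's actual code):

    import java.net.InetAddress;

    public class PartnerName {
        public static void main(String[] args) throws Exception {
            // Yale convention: the two hostnames differ only in a -01 / -02 suffix.
            String host = InetAddress.getLocalHost().getHostName();
            String partner = host.contains("-01")
                    ? host.replace("-01", "-02")
                    : host.replace("-02", "-01");
            System.out.println("Other cluster member: " + partner);
        }
    }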

Configuration By File

You can define the CushyClusterConfiguration bean with or without a "clusterDefinition" property. If you provide the property, it is a List of Lists of Strings:

    <bean id="clusterConfiguration" class="edu.yale.its.tp.cas.util.CushyClusterConfiguration"
        p:md5Suffix="yes" >
      <property name="clusterDefinition">
           <list>
               <!-- Desktop Sandbox cluster -->
               <list>
                   <value>http://foo.yu.yale.edu:8080/cas/</value>
                   <value>http://bar.yu.yale.edu:8080/cas/</value>
               </list>
               <!-- Development cluster -->
               <list>
                   <value>https://casdev1.yale.edu:8443/cas/</value>
                   <value>https://casdev2.yale.edu:8443/cas/</value>
               </list>
...
           </list>
      </property>
    </bean>

In Spring, the <value> tag generates a String, so this is what Java calls a List<List<String>> (a List of Lists of Strings). As shown, the top List has two elements. The first element is a List with two Strings for the machines foo and bar. The second element is another List with two Strings for casdev1 and casdev2.

There is no good way to determine all the DNS names that may resolve to an address on this server. However, it is relatively easy in Java to find all the IP addresses of all the LAN interfaces on the current machine. This list may be longer than you think. Each LAN adapter can have IPv4 and IPv6 addresses, and then there can be multiple real LANs and a bunch of virtual LAN adapters for VMWare or Virtualbox VMs you host or tunnels to VPN connections. Of course, there is always the loopback address.

So CushyClusterConfiguration goes to the first cluster (foo and bar). It does a name lookup (in DNS and in the local etc/hosts file) for each server name (foo.yu.yale.edu and bar.yu.yale.edu). Each lookup returns a list of IP addresses associated with that name.

CushyClusterConfiguration selects the first cluster and first host computer whose name resolves to an IP address that is also an address on one of the interfaces of the current computer. The DNS lookup of foo.yu.yale.edu returns a bunch of IP addresses. If any of those addresses is also an address assigned to any real or virtual LAN on the current machine, then that is the cluster host name and that is the cluster to use. If not, then try again in the next cluster.
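
As a rough sketch of that matching step (the class and method names here are hypothetical, and the real Cushy source may structure this differently):

    // Match a configured cluster to the current machine by comparing the
    // addresses of every local LAN interface against the DNS lookup of each
    // server URL. Assumes the List<List<String>> layout shown earlier.
    import java.net.InetAddress;
    import java.net.NetworkInterface;
    import java.net.URL;
    import java.util.*;

    public class ClusterMatcher {
        public static List<String> findMyCluster(List<List<String>> clusters) throws Exception {
            Set<InetAddress> localAddrs = new HashSet<InetAddress>();
            for (NetworkInterface ni : Collections.list(NetworkInterface.getNetworkInterfaces()))
                localAddrs.addAll(Collections.list(ni.getInetAddresses()));
            for (List<String> cluster : clusters) {
                for (String casUrl : cluster) {
                    String host = new URL(casUrl).getHost();
                    try {
                        for (InetAddress addr : InetAddress.getAllByName(host))
                            if (localAddrs.contains(addr))
                                return cluster; // first match wins
                    } catch (java.net.UnknownHostException skip) {
                        // name does not resolve; try the next server
                    }
                }
            }
            return null; // no match: fall through to autoconfiguration
        }
    }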

CushyClusterConfiguration can determine whether it is running in the sandbox on the desktop, or in the development, test, production, disaster recovery, or any other cluster definition. The only requirement is that IP addresses be distinct across servers and clusters.

Restrictions (if you use a single WAR file with a single global configuration):

It is not generally possible to determine the port numbers that a J2EE Web Server is using. So it is not possible to make distinctions based only on port number. CushyClusterConfiguration requires a difference in IP addresses. So if you want to emulate a cluster on a single machine, use VirtualBox to create VMs and don't think you can run two Tomcats on different ports.

(This does not apply to Unit Testing, because Unit Testing does not use a regular WAR and is not constrained to a single configuration file. If you look at the unit tests you can see examples where there are two instances of CushyTicketRegistry configured with two instances of CushyClusterConfiguration with two cluster configuration files. In fact, it can be a useful trick that the code stops at the first match. If you edit the etc/hosts file to create a bunch of dummy hostnames all mapped on this computer to the loopback address (127.0.0.1), then those names will always match the current computer and Cushy will stop when it encounters the first such name. The trick then is to create for the two test instances of Cushy two configuration files (localhost1,localhost2 and localhost2,localhost1). Fed the first configuration, that test instance of Cushy will match the first name (localhost1) and will expect the cluster to also have the other name (localhost2). Fed the second configuration the other test class will stop at localhost2 (which is first in that file) and then assume the cluster also contains localhost1.)

Any automatic configuration mechanism can get screwed up by mistakes made by system administrators. In this case, it is a little easier to mess things up in Windows. You may have already noticed this if your Windows machine hosts VMs or if your home computer is a member of your Active Directory at work (through a VPN, for example). At least you would see it if you do "nslookup" to see what DNS thinks of your machine. Windows has Dynamic DNS support, and it is enabled by default on each new LAN adapter. After a virtual LAN adapter has been configured you can go to its adapter configuration, select IPv4, click Advanced, select the DNS tab, and turn off the checkbox labelled "Register this connection's addresses in DNS". If you don't do this (and how many people even think to do this?), then the private IP address assigned to your computer on the virtual LAN (or the home network address assigned to your computer when it has a VPN tunnel to work) gets registered in the AD DNS server. When you look up your machine in DNS you get the IP address you expected, plus an additional address of the form 192.168.1.? which is either the address of your machine on your home LAN or its address on the private virtual LAN that connects it to VMs it hosts.

Generally the extra address doesn't matter. A problem only arises when another computer that is also on a home or virtual network with its own 192.168.1.* addresses looks up the DNS name of a computer, gets back a list of addresses, and for whatever reason decides that that other computer is also on its home or virtual LAN instead of using the real public address that can actually get to the machine.

CushyClusterConfiguration is going to notice all the addresses on the machine and all the addresses registered to DNS, and it may misidentify the cluster if these spurious internal private addresses are being used on more than one sandbox or machine room CAS computer. It is a design objective of continuing Cushy development to refine this configuration process so you cannot get messed up when a USB device you plug into your computer generates a USB LAN with a 192.168.153.4 address for your computer, but to do this in a way that preserves your ability to configure a couple of VM guests on your desktop for CAS testing.

Note also that the Unit Test cases sometimes exploit this by defining dummy hostnames that resolve to the loopback address and therefore are immediately matched on any computer.

In practice you will have a sandbox you created and some machine room VMs that were professionally configured and do not have strange or unexpected IP addresses, and you can configure all the hostnames in a configuration file and Cushy will select the right cluster and configure itself the way you expect.

Autoconfigure

At Yale the names of DEV, TEST, and PROD machines follow a predictable pattern, and CAS clusters have only two machines. So production services asked that CAS automatically configure itself based on those conventions. If you have similar conventions and any Java coding expertise you can modify the autoconfiguration logic at the end of CushyClusterConfiguration Java source.

CAS is a relatively simple program with low resource utilization, and any modern server can easily handle the entire load. There is no need to spread the load across multiple servers, so the only reason for clustering is error recovery. At Yale a single additional machine is regarded as providing enough recovery.

At Yale, the two servers in any cluster have DNS names that end in "-01" or "-02". Therefore, Cushy autoconfigure gets the HOSTNAME of the current machine, looks for a "-01" or "-02" in the name, and when it matches creates a cluster with the current machine and one additional machine with the same name but substituting "-01" for "-02" or the reverse.
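
A minimal sketch of this convention (the real logic is at the end of the CushyClusterConfiguration source and may differ in detail):

    // Pair "-01" with "-02" based on the current hostname.
    import java.net.InetAddress;

    public class AutoConfigure {
        public static String otherNode(String hostname) {
            if (hostname.contains("-01")) return hostname.replace("-01", "-02");
            if (hostname.contains("-02")) return hostname.replace("-02", "-01");
            return null; // no match: fall back to standalone mode (see below)
        }

        public static void main(String[] args) throws Exception {
            String me = InetAddress.getLocalHost().getHostName();
            System.out.println("cluster = " + me + ", " + otherNode(me));
        }
    }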

Standalone

If no configured cluster matches the current machine IP addresses and the machine does not autoconfigure (because the HOSTNAME does not have "-01" or "-02"), then Cushy configures a single standalone server with no cluster.

Even without a cluster, Cushy still checkpoints the ticket cache to disk and restores the tickets across a reboot. So it provides a useful function in a single machine configuration that is otherwise only available with JPA and a database.

You Can Configure Manually

Although CushyClusterConfiguration makes most configuration problems simple and automatic, if it does the wrong thing and you don't want to change the code you can ignore it entirely. As will be shown in the next section, there are three properties (a String and two Properties tables) that are input to the CushyTicketRegistry bean. The whole purpose of CushyClusterConfiguration is to generate a value for these three parameters. If you don't like what it does, you can use Spring to supply static values for these parameters and you don't even have to use the clusterConfiguration bean.

Other Parameters

Typically in the ticketRegistry.xml Spring configuration file you configure CushyClusterConfiguration as a bean with id="clusterConfiguration" first, and then configure the usual id="ticketRegistry" bean using CushyTicketRegistry. The clusterConfiguration bean exports some properties that are used (through Spring EL) to configure the Registry bean.

  <bean id="ticketRegistry" class="edu.yale.cas.ticket.registry.CushyTicketRegistry"
          p:serviceTicketIdGenerator-ref="serviceTicketUniqueIdGenerator"
          p:checkpointInterval="300"
          p:cacheDirectory=  "#{systemProperties['jboss.server.data.dir']}/cas"
          p:nodeName=        "#{clusterConfiguration.getNodeName()}"
          p:nodeNameToUrl=   "#{clusterConfiguration.getNodeNameToUrl()}"
          p:suffixToNodeName="#{clusterConfiguration.getSuffixToNodeName()}"  />

 The nodeName, nodeNameToUrl, and suffixToNodeName parameters link back to properties generated as a result of the logic in the CushyClusterConfiguration bean.

The cacheDirectory is a work directory on disk to which CAS has read/write privileges. The default is "/var/cache/cas", which is Unix syntax but can be created as a directory structure on Windows. In this example we use the Java system property for the JBoss /data subdirectory when running CAS on JBoss.

The checkpointInterval is the time in seconds between successive full checkpoints. Between checkpoints, incremental files will be generated.

CushyClusterConfiguration exposes an md5Suffix="yes" parameter which causes it to generate a ticketSuffix that is the MD5 hash of the computer hostname instead of using the nodename as the suffix. The F5 likes to refer to computers by their MD5 hash, and using that as the ticket suffix simplifies the F5 configuration even though it makes ticket ids longer.
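
A sketch of what such a suffix computation might look like (this assumes the hash is the hex-encoded MD5 digest of the hostname; check the source for the exact input and encoding):

    // Hex-encoded MD5 digest of a hostname, as a candidate ticket suffix.
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class Md5Suffix {
        public static String suffixFor(String hostname) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(hostname.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b));   // two hex chars per byte
            return hex.toString();
        }
    }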

There are other "properties" that actually turn code options on or off. Internally they are static variables that only appear to be properties of the CushyTicketRegistry class so they can be set in the ticketRegistry.xml file. The alternative would be to make them static values in the source and require you to recompile the source to make a change.

  • p:sharedDisk="true" - disables HTTP communication for JUnit Tests and when the work directory is on a shared disk.
  • p:disableJITDeserialization="true" - disables an optimization that only reads tickets from a checkpoint or incremental file the first time the tickets are actually needed. The only reason for using this parameter is during testing so that the number of tickets read from the file appears in the log immediately after the file is generated.
  • p:excludeSTFromFiles="true" - this is plausibly an option you should use. It prevents Service Tickets from being written to the checkpoint or incremental files. This makes incremental files smaller because it is then not necessary to keep the growing list of ST IDs for all the Service Tickets that were deleted probably before anyone ever really cared about them.
  • p:useThread="true" - use a thread to read the checkpoint file from another CAS node. If not set, the file is read inline, which can slow down the processing of a new checkpoint across all the nodes.

How Often?

"Quartz" is the standard Java library for timer driven events. There are various ways to use Quartz, including annotations in modern containers, but JASIG CAS uses a Spring Bean interface to Quartz where parameters are specified in XML. All the standard JASIG TicketRegistry configurations have contained a Spring Bean configuration that drives the RegistryCleaner to run and delete expired tickets every so often. CushyTicketRegistry requires a second Quartz timer configured in the same file to call a method that replicates tickets. The interval configured in the Quartz part of the XML sets a base timer that determines the frequency of the incremental updates (typically every 5-15 seconds). A second parameter to the CushyTicketRegistry class sets a much longer period between full checkpoints of all the tickets in the registry (typically every 5-10 minutes).

A full checkpoint contains all the tickets. If the cache contains 20,000 tickets, it takes about a second to checkpoint, generates a 3.2 megabyte file, and then has to be copied across the network to the other nodes. An incremental file contains only the tickets that were added or deleted since the last full checkpoint. It typically takes a tenth of a second and uses very little disk space or network. However, after a number of incrementals it is a good idea to do a fresh checkpoint just to clean things up. You set the parameters to optimize your CAS environment, although either operation has so little overhead that it should not be a big deal.

Based on the usage pattern, at 8:00 AM the ticket registry is mostly empty and full checkpoints take no time. Late in the afternoon the registry reaches its maximum size and the difference between incrementals and full checkpoints is at its greatest.

Although Cushy uses the term "incremental", the actual algorithm is a differential between the current cache and the last full checkpoint, so between full checkpoints the incremental file grows as it accumulates all the changes. Since it also includes a list of all the Service Ticket IDs that were deleted (just to be absolutely sure things are correct), an unusually long period between full checkpoints could make the incremental file larger than the checkpoint itself. Because the incremental is transferred so frequently, that would hurt performance far more than simply setting the period for full checkpoints to a reasonable number.

Nodes notify each other of a full checkpoint. Incrementals occur so frequently that it would be inefficient to send messages around; instead, a node picks up the incrementals from the other nodes each time it generates its own incremental.

CushyTicketRegistry (the code)

CushyTicketRegistry is a medium sized Java class that does all the work. It began with the standard JASIG DefaultTicketRegistry code that stores the tickets in memory (in a ConcurrentHashMap). Then on top of that base, it adds code to serialize tickets to disk and to transfer the disk files between nodes using HTTP.

Unlike the JASIG TicketRegistry implementations, CushyTicketRegistry does not create a single big cache of tickets lumped together from all the nodes. Each node "owns" the tickets it creates.

The Spring XML configuration creates what is called the Primary instance of the CushyTicketRegistry class. This object is the TicketRegistry as far as the rest of CAS is concerned and it implements the TicketRegistry interface. From the properties provided by Spring from the CushyClusterConfiguration, the Primary object determines the other nodes in the cluster and it creates an additional Secondary object instance of the CushyTicketRegistry class for each other node.

Tickets created by CAS on this node are stored in the Primary object which periodically checkpoints to disk, and more frequently writes the incremental changes file to disk. It then notifies the other nodes when it has a new checkpoint to pick up. The Secondary objects keep a Read-Only copy of the tickets on the other nodes in memory in case that node fails.

 

Methods and Fields

In addition to the ConcurrentHashMap named "cache" that CushyTicketRegistry borrowed from the JASIG DefaultTicketRegistry code to index all the tickets by their ID string, CushyTicketRegistry adds two collections:

  • addedTickets - a reference to the tickets that were added to the registry since the last full ticket backup to disk.
  • deletedTickets - a collection of ticketids for the tickets that were deleted.

These two collections are maintained by the implementations of the addTicket and deleteTicket methods of the TicketRegistry interface.
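
A sketch of how those methods might maintain the collections (signatures simplified; the real TicketRegistry interface differs in details, and Ticket here is the CAS ticket interface):

    // addTicket/deleteTicket update the main cache and record the change
    // for the next incremental file. The lock is held only for the append.
    public void addTicket(Ticket ticket) {
        cache.put(ticket.getId(), ticket);
        synchronized (this) {
            addedTickets.add(ticket);
        }
    }

    public boolean deleteTicket(String ticketId) {
        synchronized (this) {
            deletedTickets.add(ticketId);
        }
        return cache.remove(ticketId) != null;
    }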

This class has three constructors.

  • The constructor without arguments is used by Spring XML configuration of the class and generates the Primary object that holds the local tickets created by CAS on this node. There is limited initialization that can be done in the constructor, so most of the work is in the afterPropertiesSet() method called by Spring when it completes its XML configuration of the object.
  • The constructor with nodename and url parameters is used by the Primary object to create Secondary objects for other nodes in the cluster configuration.
  • The constructor with a bunch of arguments is used by Unit Tests.

The following significant methods are added to the CushyTicketRegistry class:

  • checkpoint() - Called from the periodic quartz thread. Serializes all tickets in the Registry to the nodename file in the work directory on disk. Makes a point in time thread safe copy of references to all the current tickets in "cache" and clears the added and deleted ticket collections. Builds an ArrayList of the non-expired tickets. Serializes the ArrayList (and therefore all the non-expired tickets) to /var/cache/cas/CASVM1. Generates a Service Ticket ID that will act as a password until the next checkpoint call. Notifies the other nodes, in this example by calling the /cas/cache/notify service of CASVM2 passing the password ticketid.
  • restore() - Empty the current cache and de-serialize the /var/cache/cas/nodename file to a list of tickets, then add all the unexpired tickets in the list to rebuild the cache. Typically this only happens once on the primary object at CAS startup where the previous checkpoint of the local cache is reloaded from disk to restore this node to the state it was in at last shutdown. However, secondary caches (of CASVM2 in this example) are loaded all the time in response to a /cas/cache/notify call from CASVM2 that it has taken a new checkpoint.
  • writeIncremental() - Called by the quartz thread between checkpoints. Serializes point in time thread safe copies of the addedTickets and deletedTickets collections to create the nodename-incremental file in the work directory.
  • readIncremental() - De-serialize two collections from the nodename-incremental file in the work directory. Apply one collection to add tickets to the current cache collection and then apply the second collection to delete tickets. After the update, the cache contains all the non-expired tickets from the other node at the point the incremental file was created.
  • readRemoteCache - Generate an https: request to read the nodename or nodename-incremental file from another node and store it in the work directory.
  • notifyNodes() - calls the /cas/cluster/notify restful service on each other node after a call to checkpoint() generates a full backup. Passes the generated dummy ServiceTicketId to the node which acts as a password in any subsequent getRemoteCache() call.
  • processNotify() - called from the Spring MVC layer when the message from a notifyNodes() call arrives at the other node.
  • timerDriven() - called from Quartz every so often (say every 10 seconds) to generate incrementals and periodically a full checkpoint. It also reads the current incremental from all the other nodes.
  • destroy() - called by Java when CAS is shutting down. Writes a final checkpoint file that can be used after restart to reload all the tickets to their status at shutdown.

 

Unlike conventional JASIG Cache mechanisms, the CushyTicketRegistry does not combine tickets from all the nodes. It maintains shadow copies of the individual ticket caches from other nodes. If a node goes down, then the F5 starts routing requests for that node to the other nodes that are still up. The other nodes can recognize that these requests are "foreign" (for tickets issued by another node and therefore in the shadow copy of that node's tickets) and they can handle such requests temporarily until the other node is brought back up.

Flow

During normal CAS processing, the addTicket() and deleteTicket() methods lock the registry just long enough to add an item to the end of one of the two incremental collections. Cushy uses locks only for very simple updates and copies, so it cannot deadlock and performance should not be affected. This is the only part of Cushy that runs under the normal CAS HTTP request processing.

Quartz maintains a pool of threads independent of the threads used by JBoss or Tomcat to handle HTTP requests. Periodically a timer event is triggered, Quartz assigns a thread from the pool to handle it, the thread calls the timerDriven() method of the primary CushyTicketRegistry object, and for the purpose of this example, let us assume that it is time for a new full checkpoint.

Java provides a sophisticated built-in class called ConcurrentHashMap that allows a collection of Tickets to be shared by request threads. The JASIG DefaultTicketRegistry uses this service, and Cushy adopts the same design. One method exposed by this built-in class provides a list of references to all the Ticket objects at some point in time. Cushy uses this service to obtain its own private list of all the Tickets that it can checkpoint without affecting any other thread doing normal CAS business.

The collection returned by ConcurrentHashMap is not serializable, so Cushy has to copy Tickets from it to a more standard collection, and it uses this opportunity to exclude expired tickets. Then it uses a single Java writeObject statement to write the List and a copy of all the Ticket objects to a checkpoint file on disk. Internally Java does all the hard work of figuring out what objects point to other objects so it can write only one copy of each unique object. When it returns, Cushy just has to close the file.
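
A minimal sketch of that checkpoint write, assuming the CAS Ticket interface with its getId() and isExpired() methods (the surrounding names are illustrative):

    // Copy unexpired tickets out of the shared map into a serializable
    // ArrayList, then write everything with a single writeObject call.
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.util.ArrayList;
    import java.util.concurrent.ConcurrentHashMap;

    void writeCheckpoint(ConcurrentHashMap<String, Ticket> cache, String path) throws IOException {
        ArrayList<Ticket> tickets = new ArrayList<Ticket>();
        for (Ticket t : cache.values())    // point in time view of the map
            if (!t.isExpired())            // skip tickets nobody will need
                tickets.add(t);
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path));
        try {
            out.writeObject(tickets);      // Java walks the object graph once
        } finally {
            out.close();
        }
    }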

Between checkpoints the same logic applies, only instead of writing the complete set of Tickets, Cushy only serializes the addedTickets and the deletedTicket Ids to the disk file.

After writing a full checkpoint, Cushy generates a new dummyServiceTicket ID string and issues a Notify (calls the /cluster/notify URL of CAS on all the other nodes of the cluster) passing the dummyServiceTicket string so the other nodes can use it as a password to access the checkpoint and incremental files over the Web.

On the other nodes, the Notify request arrives through HTTP like any other CAS request (like a ST validate request). Spring routes the /cluster/notify suffix to the small Cushy CacheNotifyController Java class. We want all the other nodes to get a copy of the new full checkpoint file as soon as possible; there are two strategies to accomplish this.

Cushy does not expect a meaningful return from the /cluster/notify HTTP request. The purpose is just to trigger action on the other node, and the response is empty. Therefore, one simple strategy is to set a short Read Timeout on the HTTP request. The other node receives the Notify and begins to read the checkpoint file. Meanwhile, the node doing the Notify times out having not yet received a response, and so it goes on to Notify the next node in the cluster. Eventually when the checkpoint file has been fetched and restored to memory the Notify logic returns to the CacheNotifyController bean, which then tries to generate an empty reply but discovers that the client node is no longer waiting for one. Things may end with a few sloppy exceptions, but the code expects and ignores them.

The other approach has the Notify request on the receiving node wake up a thread in the Secondary CushyTicketRegistry object corresponding to the node that sent the Notify. That thread can fetch the checkpoint file and restore the tickets to memory. Meanwhile, the CacheNotifyController returns immediately and sends the null response back to the notifying node. Nothing times out and no exceptions are generated, but now you have to use threading, which is a bit more heavy duty technology than Web applications prefer to use.
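
A sketch of the first (timeout) strategy, using plain HttpURLConnection (the URL path follows the description above; error handling is reduced to the essentials):

    // Fire the Notify with a short read timeout; a timeout here is the
    // expected outcome while the other node is busy fetching our checkpoint.
    import java.net.HttpURLConnection;
    import java.net.SocketTimeoutException;
    import java.net.URL;

    void notifyNode(String nodeUrl, String dummyServiceTicketId) {
        try {
            URL url = new URL(nodeUrl + "cluster/notify?ticket=" + dummyServiceTicketId);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(1000);  // don't wait for the empty reply
            conn.getResponseCode();     // send the request
            conn.disconnect();
        } catch (SocketTimeoutException expected) {
            // normal: the other node is reading the checkpoint file
        } catch (Exception e) {
            // connection refused, etc.: mark that node unhealthy elsewhere
        }
    }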

There is no notify for an incremental file. The nodes do not synchronize incrementals (too much overhead). So when the timerDriven() method is called between checkpoints, it writes an incremental file for the current node and then checks each Secondary object and attempts to read an incremental file from each other node in the cluster.

There is a race condition between one node taking a full checkpoint and another node trying to read an incremental. A new checkpoint deletes the previous incremental file. As each of the other nodes receives a Notify from this node it realizes that there is a new checkpoint and no incremental, so a flag gets set and on the next timer cycle no incremental is read. However, after the checkpoint is generated and before the Notify is sent there is an opportunity for the other node to wake up, ask for the incremental file, and get back an HTTP NOT_FOUND status.

"Healthy" is a status of a Secondary object. Without it when a node goes down then the other nodes will try every timer tick (every 10 seconds or so) to connect to the dead node and fetch the latest incremental file. When a file request fails, then the node is marked "not healthy" and no more incrementals will be fetched until a Notify indicates that the node is back up.

Originally Cushy was designed to restore tickets to memory as soon as the file was loaded from the other node. However, this means that CAS is spending time deserializing data from files every few seconds, day after day while nothing goes wrong. It is necessary to get the files from the other nodes immediately because you cannot predict when a computer will crash, but the actual tickets don't need to be deserialized from the file until the node fails. So now Cushy uses Just In Time Deserialization. It holds the file on disk until the Business Logic asks for a ticket that belongs to one of the other nodes, something that should not occur unless the node owning the ticket has failed. Then Cushy deserializes the files from that node in order to find the requested ticket.

Security

The collection of tickets contains sensitive data. With access to the TGT ID values, a remote user could impersonate anyone currently logged in to CAS. So when checkpoint and incremental files are transferred between nodes of the cluster, we need to be sure the data is encrypted and goes only to the intended CAS servers.

There are sophisticated solutions based on Kerberos or GSSAPI. However, they add considerable new complexity to the code. At the same time, we do not want to introduce anything substantially new because then it has to pass a new security review. So CushyTicketRegistry approaches security by using the existing technology CAS already uses, just applied in a new way.

CAS is based on SSL and uses the X.509 Certificate of the CAS server to verify the identity of machines. If that is good enough to identify a CAS server to the client and to the application that uses CAS, then it should be good enough to identify one CAS server to another.

CAS uses the Service Ticket as a one time randomly generated temporary password. It is large enough that you cannot guess it nor can you brute force match it in the short period of time it remains valid before it times out. The ticket is added onto the end of a URL with the "ticket=..." parameter, and the URL and all the other data in the exchange is encrypted with SSL.

Now apply the same design to CushyTicketRegistry.

Each time a node generates a new full checkpoint file it uses the standard Service Ticket ID generation code to generate a new Service Ticket ID. This ticket id serves in place of a password to fetch files from that node until the next full checkpoint. When a node generates a checkpoint it calls the "https://servername/cas/cluster/notify?ticket=..." URL on the other nodes in the cluster passing this generated dummy Service Ticket ID. SSL validates the X.509 Certificate on the other CAS server before it lets this request pass through, so the ticketid is encrypted and can only go to the real named server at the URL configured to CAS when it starts up.

When a node gets a /cluster/notify request from another node, it responds with an "https://servername/cas/cluster/getCheckpoint?ticket=..." request to obtain a copy of the newly generated full checkpoint file. Again, SSL encrypts the data and the other node's X.509 certificate validates its identity. If the other node sends the data as requested, then the Service Ticket ID sent in the notify is valid and it is stored in the Secondary CushyTicketRegistry object associated with that node. Between checkpoints the same ticketid is used as a password to fetch incremental files, but when the next checkpoint is generated there is a new Notify with a new ticketid and the old ticketid is no longer valid. There is not enough time to brute force the ticketid before it expires and you have to start over.
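
On the serving side, the password check is little more than a string comparison against the ticketid generated with the current checkpoint (a sketch; field and method names are illustrative):

    // Honor a getCheckpoint or incremental fetch only if the caller presents
    // the dummy Service Ticket ID generated with the current checkpoint.
    private String currentDummyServiceTicketId; // regenerated at each checkpoint

    boolean authorized(javax.servlet.http.HttpServletRequest request) {
        String ticket = request.getParameter("ticket");
        return ticket != null && ticket.equals(currentDummyServiceTicketId);
    }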

Behavior

Normal Operation

A CAS node starts up. The Spring configuration loads the primary CushyTicketRegistry object, and it creates Secondary objects for all the other configured nodes. Each object is configured with a node name, and Secondary objects are configured with the external node URL.

CAS will have taken a final checkpoint if it shut down normally. If it crashed, there should be a last checkpoint and perhaps a last incremental file. The tickets in these files are restored to memory so CAS returns to the state it was in before the crash or shutdown. This is a "warm start".

However, if you are upgrading from one version of CAS to another with incompatible Ticket classes, or you want to start a clean slate after some serious outage, then you can manually delete the checkpoint file and CAS will come up with an empty Ticket Registry. This is a "cold start". It makes no sense to cold start a single node, so typically if you do this you intend to cold start all the CAS nodes. Since each CAS node "owns" its registry, you could cold start one at a time and as each node comes up it will checkpoint its empty registry and replicate it to the other nodes. However, in most cases you will want to reboot all the CAS nodes nearly simultaneously. To let this occur with the least confusion, after a cold start CAS enters a "Quiet Period" where it neither sends nor receives files to or from other nodes. The default is 10 minutes, and that should be enough time to reboot all the servers.

During normal processing all the CAS servers are generating checkpoint and incremental files and they are exchanging these files over the network. The file exchange is required because you never know when a node is going to fail. However, once the file has been transmitted, the tickets in the file are not actually needed if the front end is routing requests properly and the other nodes are up. So during the 99.9% of the time when there is no failure, CAS saves a small amount of processing time by waiting until there is an actual request (after a node failure) that requires access to tickets from another node before it deserializes the data in the file. This is an optimization called "Just In Time Deserialization".

Note: This is a violation of the rule to favor simplicity over efficiency. It was added to the code because it just seemed embarrassing to be constantly reading objects from files when nobody needs the data. However, the author intends to stop with just this one optimization and avoid in the future adding any additional complexity to make things run faster.

A CAS node will start to get requests belonging to another node if the Front End thinks the other node is down (mostly because it cannot contact it). However, if the failure is caused by a single switch or router between the Front End and the other node, then the other CAS nodes may be able to talk to that node even though the Front End cannot get to it. So CushyTicketRegistry maintains two separate switches. The "Just In Time Deserialization" flag tracks whether the node is getting requests from the Front End on behalf of another node. Separately, Cushy maintains a "node is healthy" flag in the Secondary object for the node, which is set to "unhealthy" if there is a connection or an I/O error trying to read a checkpoint or incremental file from the node.

Note: Ok, so this is another violation of the simplicity rule. It seemed stupid, while a node is down, to keep issuing an HTTPS request to the node every 10 seconds until it comes back up, and have each such request end in a connection timeout exception. When the node comes back up again it will send a Notify to every other node in the cluster. If the node was never really down and there was just a network glitch, then it will send a Notify with the next checkpoint in the next 5 minutes or so. Either way, after an HTTP GET fails for a file from another node, waiting for a Notify to verify health before restarting the reads makes sense. But I promise to stop optimizing code here.

Notify is in part an "I am up and functioning" message as well as an "I have a new checkpoint" message. The first thing a node does after booting up is to send a new Notify to all the other nodes. If there is a temporary network failure between nodes, then other activity may stop but the nodes will all try to send a Notify with each new checkpoint (say every 5 minutes) trying to reestablish contacts.

Getting a Notify from a node and reading its new checkpoint file clears the flag that says that tickets have been "just in time" deserialized and that the node is unhealthy. It provides an opportunity, if nothing else is wrong anywhere, for things to go back to complete normal behavior (at least for that node). If more requests arrive then the Just In Time Deserialization happens again, and if network I/O errors reappear then the node will be marked unhealthy again, but after a Notify we give a node a chance to start a clean slate.

Note: The UnitTest flag turns off all real network I/O. So if you call the processNotify() method from a JUnit test case it will reset all the flags but will not actually try to generate the HTTP GET to read the checkpoint from the other node, because in unit tests there is no other node.

Node Failure

Detecting a node failure is the job of the Front End. CAS discovers a failure when a CAS node receives a request that should have been routed to another node. The tickets for that node are restored into the Secondary Registry for that node.

Anyone who signed in to the failed node in the last few seconds will lose his TGT. Any Service Ticket issued but not validated by the failed node will be lost and validation requests will fail. The Cushy design is to support the 99.99% of traffic that deals with people who logged in longer than 10 seconds ago.

New logins have no node affiliation and therefore nothing to do with node failure.

During node failure, the three interesting activities are:

  1. Issuing and validating a new Service Ticket on behalf of a TGT owned by another node.
  2. Issuing a new Proxy Ticket connected to a TGT owned by another node.
  3. Logging a user off if his TGT is owned by another node.

In the first two cases, the current node creates a new Ticket. The Ticket is owned by this node even if it points to a Granting Ticket that is in the Registry of another node. The Ticket gets the local node suffix and is put in the local (Primary) CushyTicketRegistry. The Front End will route all requests for this ticket to this node. The Business Logic layer of CAS does not know that the TGT belongs to another node because the Business Logic layer is used to all the other TicketRegistries where all the tickets are jumbled up together in a big common collection. So this is business as usual.

There is one consequence that should be understood. Although the TGT is currently in the Secondary Registry, that collection of tickets is logically and perhaps physically replaced when the node comes back up, issues a Notify, and a new checkpoint is received. At that point the ST (and more importantly the PGT because it lives longer) will point to the same sort of "private copy of a TGT that is a point in time snapshot of the login status when the secondary ticket was created" that you get all the time when ST and PGT objects are serialized and transmitted between nodes by any of the "cache" replication technologies. Cushy has been able up to this point to avoid unconnected private copies of TGT's, but it cannot do so across a node failure and restart.

This brings us to Logoff. Not many people logoff from CAS. When they do, the Business Logic layer of CAS will try to handle Single Sign Out by notifying all the applications that registered a logoff URL that the user has logged out. Again, since the Business Layer works fine in existing "cache" based object replication systems, the fact that Cushy is holding the TGT in a Secondary object has absolutely no effect on the processing. The only difference occurs when the Business Logic goes to delete the TGT.

The problem here is that we don't own the TGT. The other node owns it. Furthermore, the other node probably has a copy of it in its last checkpoint file, and as soon as it starts up it will restore that file to memory including this TGT. So while we could delete the object in the Secondary Registry, it is just going to come back again later on.

This probably doesn't matter. The cookie has been deleted in the browser. Any Single Sign Out processing has been done. The TGT may sit around all day unused, and then eventually it times out. At this point we get the only actual difference in behavior. When it times out the Business Logic is going to repeat the Single Sign Out processing. It is almost inconceivable that any application would be written in such a way that it would notice or care if it gets a second logout message for someone who already logged out, but it has to be noted.

Node Recovery

...

At some point the front end notices the node is back and starts routing requests to it based on the node name in the suffix of CAS Cookies. The node picks up where it left off. It does not know and cannot learn about any Service Tickets issued on behalf of its logged-in users by other nodes during the failure. It does not know about users who logged out of CAS during the failure.

Every time the node generates a new checkpoint and issues another Notify, the other nodes clear any flags indicating failover status and attempt to go back to normal processing. This may not happen the first time if the Front End takes a while to react, but if not the first then probably the second Notify will return the entire cluster to normal processing.

JUnit Testing

It is unusual for JUnit test cases to get their own documentation. Testing a cluster on a single machine without a Web server is complicated enough that the strategies require some documentation.

If you create an instance of CushyTicketRegistry without any parameters, it believes that it is a Primary object. You can then set properties and simulate Spring configuration. There is an alternate constructor with four parameters that is used only from test cases.

The trick here is to create two Primary CushyTicketRegistry instances with two compatible but opposite configurations. Typically one Primary object believes that it is node "casvm01" and that the cluster consists of a second node named "casvm02", while the other Primary object believes that it is node "casvm02" in a cluster with "casvm01".

The next thing you need is to make sure that both objects are using the same work directory. That way the first object will create a checkpoint file named "casvm01" and the other will create a checkpoint file named "casvm02".
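
A sketch of such a test (the setter names and the sharedDisk shortcut are illustrative, not the real signatures; see the actual test sources):

    // Two Primary instances with mirror-image configurations share one work
    // directory, so files written by "casvm01" can be read by "casvm02"
    // without any network I/O (sharedDisk mode).
    public void testPairedRegistries() throws Exception {
        String workDir = System.getProperty("java.io.tmpdir");

        CushyTicketRegistry casvm01 = new CushyTicketRegistry();
        casvm01.setNodeName("casvm01");          // believes it is casvm01
        casvm01.setCacheDirectory(workDir);
        casvm01.setSharedDisk(true);

        CushyTicketRegistry casvm02 = new CushyTicketRegistry();
        casvm02.setNodeName("casvm02");          // believes it is casvm02
        casvm02.setCacheDirectory(workDir);
        casvm02.setSharedDisk(true);

        // casvm01 checkpoints its tickets; casvm02's Secondary object for
        // "casvm01" can then restore them from the shared directory.
    }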

...

  • Recover tickets after reboot without JPA, or a separate server, or a cluster (works on a standalone server)
  • Recover tickets after a crash, except for the last few seconds of activity that did not get to disk.
  • No dependency on any large external libraries. Pure Java using only the standard Java SE runtime and some Apache commons stuff.
  • All source in one class. A Java programmer can read it and understand it.
  • Can also be used to cluster CAS servers
  • Cannot crash CAS ever, no matter what is wrong with the network or other servers.
  • A completely different and simpler approach to the TicketRegistry. Easier to work with and extend.
  • Probably uses more CPU and network I/O than other TicketRegistry solutions, but it has a constant predictable overhead you can verify is trivial.

CAS is a Single SignOn solution. Internally the function of CAS is to create, update, and delete a set of objects it calls "Tickets" (a word borrowed from Kerberos). A Logon Ticket (TGT) object is created to hold the Netid when a user logs on to CAS. A partially random string is generated to be the login ticket-id and is sent back to the browser as a Cookie and is also used as a "key" to locate the logon ticket object in a table. Similarly, CAS creates Service Tickets (ST) to identify a user to an application that uses CAS authentication.

CAS stores its tickets in a plug-in selectable component called a TicketRegistry. CAS provides one implementation of the TicketRegistry for single-server configurations, and at least four alternatives that can be used to share tickets among several CAS servers operating in a network cluster. This document describes a new implementation called CushyTicketRegistry that is simple, provides added function for the standalone server, and yet also operates in clustered configurations.

Four years ago Yale implemented a "High Availability" CAS cluster using JBoss Cache to replicate tickets. After that, the only CAS crashes were caused by failures of JBoss Cache. Red Hat failed to diagnose or fix the problem. As we tried to diagnose the problem ourselves we discovered both bugs and design problems in the structure of Ticket objects and the use of the TicketRegistry solutions that contributed to the failure. We considered replacing JBoss Cache with Ehcache, but while that might improve reliability somewhat it would not solve the fundamental structural problems.

Having been burned by software so complicated that the configuration files were almost impossible to understand, Cushy was developed to accomplish the same thing in a way so simple it could not possibly fail.

The existing CAS TicketRegistry solutions must be configured to replicate tickets to the other nodes and to wait for this activity to complete, so that any node can validate a Service Ticket that was just generated a few milliseconds ago. Waiting for the replication to complete is what makes CAS vulnerable to a crash if the replication begins but never completes. Synchronous ticket replication is a standard service provided by JBoss Cache and Ehcache, but is it the right way to solve the Service Ticket validation problem? A few minutes spent crunching the math suggested there was a better way.

It is easier and more efficient to send the request to the node that already has the ticket and can process it rather than struggling to get the ticket to every other node in advance of the next request.

In the current TicketRegistry implementations, any request in a cluster to create a Service Ticket must replicate the service ticket to at least one other computer (the database server in JPA, one or more nodes using Ehcache or any other ticket replication mechanism) before the Service Ticket ID is returned to the browser. This ensures that the Service Ticket can be validated by any node to which the application's validation request is directed. After validation, there is a second network transaction to delete the ticket. So every ST involves two backend synchronous operations.

However, it has always been part of CAS that every ticketid has a suffix that, at least on paper, can contain the node name of the CAS server that created the ticket. Using this feature in practice requires some node configuration methodology. Once this is done, then any validate request (for example, any call to /cas/serviceValidate) contains in the query string part of the URL a ticket= parameter, and the end of the value of that parameter designates the node that created the ticket. Today you can program most modern network front end devices to extract this information from the request and route the validate request to the node that created the ticket and is guaranteed to have it in memory. If you cannot program your front end device, or if you cannot convince your network administrators to do the work for you, then CushyFrontEndFilter accomplishes the same thing by scanning requests as they arrive at a CAS server and forwarding requests like validation to the server that created the ticket. If you have two servers and requests are randomly assigned to them, then 50% of the time the request goes to the right server and there is no network transaction, and 50% of the time the request has to be forwarded by the Filter to the other server, which then validates the ST and deletes it, returning the response message. So with the Filter you expect, on average, one network transaction half the time instead of, with current JPA or Cache technology, two network transactions every time. When the number of nodes in the cluster is more than 2, the Filter works even better.

CushyFrontEndFilter works with Ehcache or CushyTicketRegistry. When added to Ehcache you can change the cache configuration so that the Service Ticket cache does not use synchronous replication, or even better you can turn off replication entirely for the Service Ticket cache because every 10 seconds a Service Ticket is either used and discarded or else times out, so it makes no sense to replicate them at all if the front end or filter routes requests properly.

However, once you come up with the idea of using front end routing to avoid synchronous ticket replication (which was the source of crashes in JBoss Cache at Yale), then some new, more radical changes to the TicketRegistry become possible. In addition to the various validate requests, you can route the /proxy request to the node that owns the Proxy Granting Ticket, and you can route new Service Ticket requests to the node that issued the Ticket Granting Ticket (based on the suffix of the CASTGC cookie). Now a basic principle of all the existing ticket registry designs is no longer necessary. CAS Ticket objects do not have to be stored in what appears to be a common shared pool. Tickets can be segregated into separate collections based on the identity of the node that created and "owns" the ticket.

"Cushy" stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale". This summarizes what it is and how it works.

For objects to be replicated from one node to another, libraries use the Java writeObject statement to "Serialize" the object to a stream of bytes that can be transmitted over the network and then restored in the receiving JVM. Ehcache and JBoss Cache use writeObject on individual tickets (although it turns out they also end up serializing copies of all the other objects the ticket points to, including the TGT when attempting to replicate a ST). However, writeObject can operate just as well on the entire TicketRegistry. Making a "checkpoint" copy of the entire collection of tickets to disk (at shutdown for example) and then restoring this collection (after a restart) is very simple to code. Since Java does all the work, it is guaranteed to behave correctly. It is a useful additional function. However, you can be more aggressive in the use of this approach, and that suggests the design of an entirely different type of TicketRegistry.

Start with the DefaultTicketRegistry source that CAS uses to hold tickets in memory on a single CAS standalone server. Then add the writeObject statement (surrounded by the code to open and close the file) to create a checkpoint copy of all the tickets, and a corresponding readObject and surrounding code to restore the tickets to memory. The first thought was to do the writeObject to a network socket, because that was what all the other TicketRegistry implementations were doing. Then it became clear that it was simpler, and more generally useful, and a safer design, if the data was first written to a local disk file. The disk file could then optionally be transmitted over the network in a completely independent operation. Going first to disk created code that was useful for both standalone and clustered CAS servers, and it guaranteed that the network operations were completely separated from the Ticket objects and therefore the basic CAS function.
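
The corresponding restore is equally small (a sketch, again assuming the CAS Ticket interface; the cast is unchecked because readObject returns Object):

    // One readObject call brings back the ArrayList written at checkpoint
    // time; only tickets that are still unexpired go back into the map.
    import java.io.FileInputStream;
    import java.io.ObjectInputStream;
    import java.util.ArrayList;
    import java.util.concurrent.ConcurrentHashMap;

    @SuppressWarnings("unchecked")
    void readCheckpoint(String path, ConcurrentHashMap<String, Ticket> cache) throws Exception {
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(path));
        try {
            ArrayList<Ticket> tickets = (ArrayList<Ticket>) in.readObject();
            cache.clear();
            for (Ticket t : tickets)
                if (!t.isExpired())
                    cache.put(t.getId(), t);
        } finally {
            in.close();
        }
    }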

The first benchmarks turned out to be even better than had been expected, and that justified further work on the system.

CushyTicketRegistry and the Standalone Server

For a single CAS server, the standard choice is the DefaultTicketRegistry class which keeps the tickets in an in-memory Java table keyed by the ticket id string. Suppose you change the name of the Java class in the Spring ticketRegistry.xml file from DefaultTicketRegistry to CushyTicketRegistry (and add a few required parameters described later). Cushy was based on the DefaultTicketRegistry source code, so everything works the same as it did before, until you have to restart CAS for any reason. Since the DefaultTicketRegistry only has an in memory table, all the ticket objects are lost when CAS restarts and users all have to login again. Cushy detects the shutdown and using a single Java writeObject statement it saves all the ticket objects in the Registry to a file on disk (called the "checkpoint" file). When CAS restarts, Cushy reloads all the tickets from that file into memory and restores all the CAS state from before the shutdown. No user even notices that CAS restarted unless they tried to access CAS during the restart.

The number of tickets CAS holds grows during the day and shrinks over night. At Yale there are fewer than 20,000 ticket objects in CAS memory, and Cushy can write all those tickets to disk in less than a second generating a file around 3 megabytes in size. Other numbers of tickets scale proportionately (you can run a JUnit test and generate your own numbers). This is such a small amount of overhead that Cushy can be proactive.

CAS is a very important application, but on modern hardware it is awfully small and cheap to run. Since it was first developed there have been at least 5 generations of new chip technology that now run what was never a big application to begin with.

So to take the next logical step, start with the previous ticketRegistry.xml configuration and duplicate the XML elements that currently call a function in the RegistryCleaner every few minutes. In the new copy of the XML elements, call the "timerDriven" function in the (Cushy)ticketRegistry bean every few minutes. Now Cushy will not wait for shutdown but will back up the ticket objects regularly just in case the CAS machine crashes without shutting down normally. When CAS restarts after a crash, it can load a fairly current copy of the ticket objects which will satisfy the 99.9% of the users who did not login in the last minutes before the crash.

The next step should be obvious. Can we turn "last few minutes" into "last few seconds"? You could create a full checkpoint of all the tickets every few seconds, but then the overhead becomes significant. So go back to ticketRegistry.xml and set the parameters to call the "timerDriven" function every 10 seconds, but set the "checkpointInterval" parameter on the CushyTicketRegistry object to only create a new checkpoint file every 300 seconds. Now Cushy creates the checkpoint file, and then the next 29 times it is called by the timer it generates an "incremental" file containing only the changes since the checkpoint was written. Incremental files are cumulative, so there is only one file, not 29 separate files. If CAS crashes and restarts, Cushy reads the last checkpoint, then applies the changes in the last incremental, and now it has all the tickets up to the last 10 seconds before the crash. That satisfies 99.99% of the users, and it is probably a good place to quit.

What about disaster recovery? The checkpoint and incremental files are ordinary sequential binary files on disk. When Cushy writes a new file it creates a temporary name, fills the file with new data, closes it, and then swaps the new for the old file, so other programs authorized to access the directory can safely open or copy the files while CAS is running. Feel free to write a shell script or a Perl or Python program to use SFTP or any other program or protocol to back up the data offsite or to the cloud.
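
That swap can be done atomically with the standard NIO file API (a sketch; Cushy itself may use a different mechanism, and ATOMIC_MOVE support depends on the filesystem):

    // Write to a temporary name, then rename over the old file so that a
    // reader never sees a half-written checkpoint.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    void swapIntoPlace(Path tempFile, Path checkpointFile) throws IOException {
        Files.move(tempFile, checkpointFile,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }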

Some people use JPATicketRegistry and store a copy of the tickets in a database to accomplish the same single server restart capability that Cushy provides. If you are happy with that solution, stick with it. Cushy doesn't require the database, it doesn't require JPA, and it may be easier to work with.

Before you configure a cluster, remember that today a server is typically a virtual machine that is not bound to any particular physical hardware. Ten years ago moving a service to a backup machine involved manual work that took time. Today there is VM infrastructure and automated monitoring and control tools. A failed server can be migrated and restarted automatically or with a few commands. If you can get the CAS server restarted fast enough that almost nobody notices, then you have solved the problem that clustering was originally designed to solve without adding a second running node.

You may still want a cluster.

CushyClusterConfiguration

If you use the JPATicketRegistry, then you configure CAS to know about the database in which tickets are stored. None of the nodes knows about the cluster as a whole. The "cluster" is simply one or more CAS servers all configured to backup tickets into the same database.

If you use Ehcache or one of the other object replication "cache" technologies, then there is typically an option to use an automatic node discovery mechanism based on multicast messages. That would be a good solution if you have only the one production CAS cluster, but it becomes harder to configure if you have separate Test and Development clusters that have to have their own multicast configuration.

It seems to be more reliable to configure each node to know the name and URL of all the other machines in the same cluster. However, a node specific configuration file on each machine is difficult to maintain and install. You do not want to change the CAS WAR file when you distribute it to each machine, and Production Services wants to churn out identical server VMs with minimal differences.

In the 1980's, before the internet, 500 universities worldwide were connected by BITNET. The technology required a specific local configuration file for each campus, but maintaining 500 different configurations was impossible. So they created a single global file that defined the entire network from no specific point of view, and a utility program that, given the identity of a campus somewhere in the network, could translate that global file to the configuration data that campus needed to install to participate in the network. CushyClusterConfiguration does the same thing for your global definition of many CAS clusters.

CushyClusterConfiguration (CCC) provides an alternative approach to cluster configuration, and while it was originally designed for CushyTicketRegistry it also works for Ehcache. Instead of defining the point of view of each individual machine, the administrator defines all of the CAS servers in all of the clusters in the organization. Production, Functional Test, Load Test, Integration Test, down to the developers desktop or laptop "Sandbox" machines.

CCC is a Spring Bean that is specified in the CAS Spring XML. It only has a function during initialization. It reads in the complete set of clusters, uses DNS (or the hosts file) to obtain information about each CAS machine referenced in the configuration, it uses Java to determine the IP addresses assigned to the current machine, and then it tries to match one of the configured machines to the current computer. When it finds a match, then that configuration defines this CAS, and the other machines in the same cluster definition can be used to manually configure Ehcache or CushyTicketRegistry.

CCC exports the information it has gathered and the decisions it has made by defining a number of properties that can be referenced using the "Spring EL" language in the configuration of properties and constructor arguments for other Beans. This obviously includes the TicketRegistry, but the ticketSuffix property can also be used to define a node specific value at the end of the unique ticketids generated by beans configured by the uniqueIdGenerators.xml file.

There is a separate page to explain the design and syntax of CCC.

Front End or CushyFrontEndFilter

Front End devices know many protocols and a few common server conventions. For everything else they expose a simple programming language. The Filter contains the same logic written in Java.

We begin by assuming that the CAS cluster has been configured by CushyClusterConfiguration or its equivalent, and that one part of configuring the cluster was to create a unique ticket suffix for every node and feed that value to the beans configured in the uniqueIdGenerators.xml file.

After login, all the other CAS requests operate on tickets. They generate Service Tickets and Proxy Granting Tickets, validate tickets, and so on. The first step is to find the ticket that is important to this request. There are only three places to find the ticketid that defines an operation:

  1. In the ticket= parameter at the end of the URL for validation requests.
  2. In the pgt= parameter for a proxy request.
  3. In the CASTGC Cookie for browser requests.

A validate request is identified by having a particular "servletPath" value ("/validate", "/serviceValidate", "/proxyValidate", "/samlValidate"). The Proxy request has a different path ("/proxy"). Service Ticket create requests come from a browser that has a CASTGC cookie. If none of the servletPath values match and there is no cookie, then this request is not related to a particular ticket and can be handled by any CAS server.

If you program this logic into the Front End, then each request goes directly to the right server without any additional overhead. With only the Filter, a request goes to some randomly chosen CAS server, which may have to forward the request to another server, forward back the response, and handle failure if the preferred server goes down. The sketch below shows the extraction step in Java.
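A minimal sketch of that selection logic, assuming (as the ticket suffix discussion above implies) that the owning node is identified by the suffix at the end of the ticket id; the class and helper names are invented for illustration and this is not the actual CushyFrontEndFilter source:

    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletRequest;

    public class TicketRoutingSketch {

        /** Return the ticket id that determines which node should handle this request, or null. */
        static String relevantTicketId(HttpServletRequest request) {
            String path = request.getServletPath();
            if ("/validate".equals(path) || "/serviceValidate".equals(path)
                    || "/proxyValidate".equals(path) || "/samlValidate".equals(path)) {
                return request.getParameter("ticket");   // 1. validation request
            }
            if ("/proxy".equals(path)) {
                return request.getParameter("pgt");      // 2. proxy request
            }
            Cookie[] cookies = request.getCookies();
            if (cookies != null) {
                for (Cookie c : cookies) {
                    if ("CASTGC".equals(c.getName())) {
                        return c.getValue();             // 3. browser request with a login
                    }
                }
            }
            return null;  // no ticket: any CAS server can handle this request
        }

        /** The node that owns a ticket is identified by the suffix on its id. */
        static String ownerSuffix(String ticketId) {
            return ticketId.substring(ticketId.lastIndexOf('-') + 1);
        }
    }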

There is a separate page to describe Front End programming for CAS.

CushyTicketRegistry and a CAS Cluster

Picking up where the Standalone Server discussion left off, the names of the checkpoint and incremental files are built from the unique node name of each server in the cluster, so they can all coexist in the same disk directory. The simplest Cushy communication option is "SharedDisk". When this is chosen, Cushy expects that the other nodes are writing their full backup and incremental files to the same disk directory it is using. If Cushy receives a request that the Front End should have sent to another node, then Cushy assumes some node or network failure has occurred, loads the other node's tickets into memory from its last checkpoint and incremental file in the shared directory, and then processes the request on behalf of the other node.

Of course you are free to implement SharedDisk with an actual file server or NAS, but technically Cushy doesn't know or care how the files got to the hard drive. So if you don't like real shared disk technology, you can write a shell script somewhere that wakes up every 10 seconds and copies the files between machines using SFTP or whatever file transfer mechanism you like. You could also put the 3 megabyte files on the Enterprise Service Bus if you prefer architecture to simplicity.

SharedDisk is not the preferred Cushy communication mechanism. Cushy is, after all, part of CAS, where the obvious example of communication between computers is the Service Ticket validation request. Issue an HTTPS GET to /cas/serviceValidate with a ServiceTicket and get back a bunch of XML that describes the user. So with Cushy, one node can issue an HTTPS GET to /cas/cluster/getCheckpoint on another node and get back the current checkpoint file for that CAS server.

Obviously you need security for this important data. CAS security is based on short term securely generated Login and Service Tickets. So every time CAS generates a new checkpoint file it also generates a new "dummyServiceTicketId" that controls access to that checkpoint file and all the incrementals generated until there is a new checkpoint. So the full request is "/cas/cluster/getCheckpoint?ticket=..."  where the dummyServiceTicketId is appended to the end.
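From the requesting side, the exchange needs nothing beyond the JDK. A sketch, with the URL shape taken from the description above and everything else (class name, buffer size, error handling) left as a plain illustration:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import javax.net.ssl.HttpsURLConnection;

    public class CheckpointFetchSketch {

        /** Fetch another node's current checkpoint file into the local work directory. */
        static void fetchCheckpoint(String nodeUrl, String dummyServiceTicketId, File target)
                throws IOException {
            // nodeUrl is the URL configured for the other node in the cluster definition.
            URL url = new URL(nodeUrl + "/cluster/getCheckpoint?ticket=" + dummyServiceTicketId);
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream();
                 OutputStream out = new FileOutputStream(target)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) > 0) {
                    out.write(buffer, 0, n);
                }
            }
        }
    }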

How do the other nodes get the dummyServiceTicketId securely? Here we borrow a trick from the CAS Proxy Callback. Each CAS node is a Web server with an SSL Certificate to prove its identity. So when a node generates a new checkpoint file, and a new dummyServiceTicketId, it issues an HTTPS GET to all the other configured CAS nodes using URL
/cas/cluster/notify?nodename=callernodename&ticket=(dummyServiceTicketId).

Thanks to https:, this request will not transmit the parameters unless the server first proves its identity with its SSL Certificate. Then the request is sent encrypted, so the dummyServiceTicketId is protected. Although this is a GET, there is no meaningful response body. It is essentially a "restful" Web Service request that sends data as parameters.

Notify does three things:

  1. It tells the other node there is a new checkpoint ready to pick up immediately
  2. It securely provides the other node with the dummyServiceTicketId needed to read files for the next few minutes.
  3. It is a general declaration that the node is up and healthy. When a node starts up it sends its first /cluster/notify to all nodes with the &reboot=yes parameter to announce that it is live again.

Notify is only done every few minutes when there is a new checkpoint. Incrementals are generated all the time, but they are not announced. Each server is free to poll the other servers periodically to fetch the most recent incremental with the /cas/cluster/getIncremental request (add the dummyServiceTicketId to prove you are authorized to read the data).

CAS has always been a high security application. The best way to avoid introducing a security problem is to model the design of each new feature on something CAS already does, and then just do it the same way.

Since these node to node communication calls are modeled on existing CAS Service Ticket validation and Proxy Callback requests, they are configured into CAS in the same place (in the Spring MVC configuration, details provided below).

Are You Prepared?

Everything that can go wrong will go wrong. We plan for hardware and software failure, network failure, and disaster recovery. To do this we need to know how things will fail and how they will recover from each type of problem.

JPA is pretty straightforward. CAS depends on a database. To plan for CAS availability, you have to plan for database availability. At this point you have not actually solved any problem, but you have redefined it from a CAS issue to a database issue. Of course there is now an additional box involved, and you now have to look at network failures between the CAS servers and the database. However, now the CAS programmers can dump the entire thing on the DBAs and maybe they will figure it out. Unfortunately, you are probably not their most important customer when it comes to planning recovery.

The other CAS clustering techniques (JBoss Cache, Ehcache, Memcached) are typically regarded as magic off the shelf software products that take care of all your problems automatically and you don't have to worry about them. Again you haven't actually solved the problem, but now you really have transformed it into something you will never understand and so you just have to cross your fingers and hope those guys know what they are doing.

Even if you do not understand Java programming, CushyTicketRegistry performs a sequence of steps described here that you can understand. It writes a file on disk, and from that point on everything is file transfer. You can use the built-in Web support or replace it with something else. Every type of node failure or network failure produces predictable behavior, and since the file transfer is retried periodically, every type of hardware recovery also produces predictable results. This is something you can understand and take into consideration when you plan out the scenarios.

Why another Clustering Mechanism?

You can use JPA, but CAS doesn't really have a database problem.

  • CAS tickets all timeout after a number of hours. They have no need for long term persistence.
  • There are no meaningful SQL operations in CAS. Nobody will generate reports based on tickets.
  • CAS has no transactional structure or need for a conventional commit operation.

JPA also weaves its own generated code into the methods exposed by the objects it manages. This causes the application (CAS) to fail in unpredictable and unavoidable ways if the database goes down or if network access to the database is interrupted.

There are a number of non-database central object server technologies available. There are no existing CAS TicketRegistry implementations for any of them, and the central server remains a problem.

JBoss Cache has proven unreliable, and it is terribly complex to configure with multicast addresses and complex network timeout and other parameters.

Ehcache appears to be the most commonly used CAS replication technology. It is fairly simple to configure, and it uses RMI calls to transmit tickets, a built-in Java technology that is about as simple as Cushy's HTTP. It can store tickets on local disk. It is the obvious alternative to CushyTicketRegistry and deserves special consideration.

Ehcache Compared to CushyTicketRegistry

CushyClusterConfiguration will configure either EhcacheTicketRegistry or CushyTicketRegistry, so neither is easier to configure than the other.

CushyFrontEndFilter works for both Ehcache and CushyTicketRegistry, so any benefits there can apply equally to both systems if you reconfigure Ehcache to exploit them.

With Front End support, every 10 seconds or so Ehcache replicates only the tickets that have changed in the last 10 seconds, while Cushy transmits a file with all of the ticket changes since the last full checkpoint. Then every few minutes Cushy generates a full checkpoint, which has no Ehcache equivalent. So Ehcache transmits a lot less data.

Ehcache uses RMI and does not seem to have any security, so it depends on the network Firewall and the good behavior of other computers in the machine room. Cushy encrypts data and verifies the identity of machines, so it cannot be attacked even from inside the Firewall.

Cushy generates regular files on disk that can be copied using any standard commands, scripts, or utilities. This provides new disaster recovery options.

Ehcache is designed to be a "cache". That is, it is designed to be a high speed, in-memory copy of data that has a persistent authoritative source on some server. That is why it has a lot of configuration for "LRU" and object eviction: it assumes that lost objects can be reloaded from persistent storage. You can use it as a replicated in-memory table, but the documentation makes clear that is not its original design. Cushy is specifically designed to be a CAS TicketRegistry.

Cushy models its design on two 40 year old concepts. A common strategy for backing disks up to tape was to do a full backup of all the files once a week, and then during the week to do an incremental backup of the files changed since the last backup. The term "checkpoint" derives from a disk file into which an application periodically saved all its important data so it could restore that data and pick up where it left off after a system crash. These strategies work because they are too simple to fail. More sophisticated algorithms may accomplish the same result with less processing and I/O, but the more complex the logic, the more vulnerable you become when a software, hardware, or network failure occurs in a way that the complex sophisticated software did not anticipate.

Ehcache is a large library of complex code designed to merge changes to shared data across multiple hosts. Cushy is a single source file of pure Java written to be easily understood.

Replicating the entire TicketRegistry instead of just replicating individual tickets is less efficient. The amount of overhead is predictable and you can verify that the extra overhead is trivial. However, remember this is simply the original Cushy 1.0 design which was written to prove a point and is aggressively "in your face" pushing the idea of "simplicity over efficiency". After we nail down all the loose ends, it is possible to add a bit of extra optimization to get arbitrarily close to Ehcache in terms of efficiency.

Ticket Chains (and Test Cases)

A TGT represents a logged on user. It is called a Ticket Granting Ticket because it is used to create Service and Proxy tickets. It has no parent and stands alone.

When a user requests it, CAS uses the TGT to create a Service Ticket. The ST points to the TGT that created it, so when the application validates the ST id string, CAS can follow the chain from the ST to the TGT to get the Netid and attributes to return to the application. Then the ST is discarded.

However, when a middleware application like a Portal supports the CAS Proxy protocol, the CAS Business Logic layer takes an ST (pointing to a TGT) and turns it into a second type of TGT (the Proxy Granting Ticket or PGT). The term "PGT" exists only in documents like this. Internally CAS just creates a second TGT that points to the login TGT.

If the Proxy application accesses a backend application, it calls the /proxy service passing the PGT ID and gets back a Service Ticket ID. That ST points to the PGT, which points to the TGT, from which CAS can find the Netid.

So when you are thinking about Ticket Registries, or when you are designing JUnit test cases, there are four basic arrangements to consider:

  1. a TGT
  2. a ST pointing to a TGT
  3. a PGT pointing to a TGT
  4. a ST pointing to a PGT pointing to a TGT

This becomes an outline for various cluster node failure tests. Whenever one ticket points to a parent, there is a scenario where the parent ticket was created on a node that failed and the new ticket has to be created on a backup server acting on behalf of that node. So you want to test the creation and validation of a Service Ticket on node B when the TGT was created on node A, or the creation of a PGT on node B when the TGT was created on node A, and so on. The sketch below shows the four chains in code.
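For reference, a sketch of the four arrangements in CAS 3.x terms (method signatures quoted from memory and abbreviated; verify them against the Ticket classes in your CAS release before pasting this into a test):

    import org.jasig.cas.authentication.Authentication;
    import org.jasig.cas.authentication.principal.Service;
    import org.jasig.cas.ticket.ExpirationPolicy;
    import org.jasig.cas.ticket.ServiceTicket;
    import org.jasig.cas.ticket.TicketGrantingTicket;
    import org.jasig.cas.ticket.TicketGrantingTicketImpl;

    public class TicketChainSketch {

        /** Build the four arrangements from the list above. */
        static void buildChains(Authentication auth, Service service, ExpirationPolicy policy) {
            // 1. a TGT (a logged in user)
            TicketGrantingTicket tgt = new TicketGrantingTicketImpl("TGT-1-x-nodeA", auth, policy);
            // 2. an ST pointing to the TGT
            ServiceTicket st = tgt.grantServiceTicket("ST-1-x-nodeA", service, policy, false);
            // 3. a PGT pointing to the TGT (internally just a second TGT)
            TicketGrantingTicket pgt = st.grantTicketGrantingTicket("TGT-2-x-nodeA", auth, policy);
            // 4. an ST pointing to the PGT pointing to the TGT
            ServiceTicket proxyTicket = pgt.grantServiceTicket("ST-2-x-nodeA", service, policy, false);
        }
    }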

What Cushy Does at Failure

While other TicketRegistry solutions combine tickets from all the nodes, a Cushy cluster operates as a group of standalone CAS servers. The Front End or the Filter routes requests to the server that can handle them. So when everything is running fine, the TicketRegistry that CAS uses is basically the same as the DefaultTicketRegistry module that works on standalone servers.

So the interesting things occur when one server goes down or when network connectivity is lost between the Front End and a node, or between one node and another.

If a node fails, or the Front End cannot get to it and thinks it has failed, then requests start to arrive at CAS nodes for tickets they do not own and did not create. File sharing or replication gives every node a copy of the most recent checkpoint and incremental file from that node, but normally the "Tickets on Request" strategy does not open or process the files until they are needed. So the first request restores all the tickets for the other node to memory, under the Secondary TicketRegistry object created at initialization to represent the failed node.

Since the rule is that the other node "owns" its own tickets, you cannot make any permanent changes to the tickets in the Secondary Registry. These tickets will be passed back as needed to the CAS Business Logic layer, which will modify them as part of its normal processing, thinking that the changes it makes are meaningful. In reality, when the other node comes back it will reload its tickets from the point of failure, and that will be the authoritative collection representing the state of those tickets. In practice this doesn't actually matter.

If CAS on this node creates a new Service Ticket or Proxy Granting Ticket related to a login TGT created originally by the other node, then the new ticket belongs to the node that created it, and that node's identifier is added to the end of the ticket ID. So the new ST is owned by, and is validated by, this node even though the login TGT used to create it comes from the Secondary Registry of the failed node.

Service Tickets are created and then in a few milliseconds they are deleted when the application validates them or they time out after a few seconds or minutes. They do not exist long enough to raise any issues.

Proxy Granting Tickets, however, can remain around for hours. So the one long term consequence of a failure is that the login TGT can be on one server, but a PGT can be on a different server that created it while the login server was temporarily unavailable. The PGT ends up with its own private copy of the TGT which is frozen in time at the moment the PGT was created. Remember, this is normal behavior for all existing TicketRegistry solutions and none of the other TicketRegistry options will ever "fix" this situation. At least Cushy is aware of the problem and with a few fixes to the Ticket classes Cushy 2.0 might be able to do better.

There is also an issue with Single Sign Out. If a user logs out during a failure of his login server, then a backup server processes the Single Log Out normally. Then when the login server is restored to operation, the login TGT is restored from the checkpoint file into memory. Of course, no browser now has a cookie pointing to that ticket, so it sits unused all day; in the evening it times out, a second Single Sign Out process is triggered, and all the applications that were previously told the user logged out are now contacted a second time with the same logout information. It is almost unimaginable that any application would be written so badly it would care about this, but it should be mentioned.

While the login server is down, new Service Tickets can be issued, but they cannot be meaningfully added to the "services" table in the TGT of the machine that is down. When that machine comes back up it resumes controlling the old TGT of the logged in user, and when the user logs off, the Single Sign Out processing will occur only for services that machine knows about, and will omit services to which the user connected while the server that owned the TGT was down. Cushy provides a "best effort" Single Sign Out experience, and Cushy 1.0 cannot do better than this.

There are a few types of network failure that work differently from node failure.

If one CAS node is unable to connect to another CAS node for a while, even though the other node is up, then it marks the other node as being "unhealthy" and waits patiently for the other node to send a /cluster/notify. The other node will send a Notify every time it generates a new Checkpoint, and when one of those Notify messages gets through then the two nodes will reestablish communication.

If the Front End is unable to get to a CAS Node, but the other server can get to it, then what happens next depends on whether the CushyFrontEndFilter is also installed. Having both the programmed Front End and also the Filter is a bit like suspenders and a belt, but if the Front End is doing its job then the Filter has nothing to do. However, in this particular case the Filter will see a request for a ticket owned by another node and will attempt to forward it to the node indicated in the request. If it succeeds then CAS has automatically routed traffic around the point of failure. However, remember that if the node actually goes down then there will be two connect timeout delays, one where the Front End determines the node is down and then a second where the Filter verifies that it is down.

Without the Filter then the current node receives a request for a ticket it does not own, loads tickets into its Secondary Registry for that node, and processes the request. What is different is that if the node is really up and the two nodes can connect, then this CAS node will continue to receive Notify requests and new checkpoint and incremental files from the other node even as it is also processing requests for that node sent to it by the Front End. Cushy is designed to handle this situation (because even in a normal failure the other node can come up just as you are in the middle of handling a request for it).

Configuration

In CAS the TicketRegistry is configured using the WEB-INF/spring-configuration/ticketRegistry.xml file.

In the standard file, a bean with id="ticketRegistry" is configured selecting the class name of one of the optional TicketRegistry implementations (JBoss Cache, Ehcache, ...). To use Cushy you configure the CushyTicketRegistry class and its particular parameters.

Then at the end there are a group of bean definitions that set up periodic timer driven operations using the Spring support for the Quartz timer library. Normally these beans set up the RegistryCleaner to wake up periodically and remove all the expired tickets from the Registry.

Cushy adds a new bean at the beginning. This is an optional bean for class CushyClusterConfiguration that uses some static configuration information and runtime Java logic to find the IP addresses and hostname of the current computer to select a specific cluster configuration and generate property values that can be passed on to the CushyTicketRegistry bean. If this class does not do what you want, you can alter it, replace it, or just generate static configuration for the CushyTicketRegistry bean.

Then add a second timer driven operation to the end of the file to call the "timerDriven" method of the CushyTicketRegistry object on a regular basis (say once every 10 seconds) to trigger writing the checkpoint and incremental files.

There is a separate page that describes CushyClusterConfiguration in detail.

 

You Can Configure Manually

Since CushyClusterConfiguration only generates strings and Property tables that are used by CushyTicketRegistry, if you prefer you can generate those strings and tables manually in the CAS configuration file for each server.

Other Parameters

Typically in the ticketRegistry.xml Spring configuration file you configure CushyClusterConfiguration as a bean with id="clusterConfiguration" first, and then configure the usual id="ticketRegistry" bean using CushyTicketRegistry. The clusterConfiguration bean exports some properties that are used (through Spring EL) to configure the Registry bean.

  <bean id="ticketRegistry" class="edu.yale.cas.ticket.registry.CushyTicketRegistry"
          p:serviceTicketIdGenerator-ref="serviceTicketUniqueIdGenerator"
          p:checkpointInterval="300"
          p:cacheDirectory=  "#{systemProperties['jboss.server.data.dir']}/cas"
          p:nodeName=        "#{clusterConfiguration.getNodeName()}"
          p:nodeNameToUrl=   "#{clusterConfiguration.getNodeNameToUrl()}"
          p:suffixToNodeName="#{clusterConfiguration.getSuffixToNodeName()}"  />

 The nodeName, nodeNameToUrl, and suffixToNodeName parameters link back to properties generated as a result of the logic in the CushyClusterConfiguration bean.

The cacheDirectory is a work directory on disk to which the CAS server has read/write privileges. The default is "/var/cache/cas", which is Unix syntax but can be created as a directory structure on Windows. In this example we use the Java system property for the JBoss /data subdirectory when running CAS on JBoss.

The checkpointInterval is the time in seconds between successive full checkpoints. Between checkpoints, incremental files will be generated.

CushyClusterConfiguration exposes an md5Suffix="yes" parameter which causes it to generate a ticketSuffix that is the MD5 hash of the computer's host name instead of using the nodename as a suffix. The F5 likes to refer to computers by their MD5 hash, and using that as the ticket suffix simplifies the F5 configuration even though it makes the tickets longer.

There are other "properties" that actually turn code options on or off. Internally they are static variable that only appear to be properties of the CushyTicketRegistry class so they can be added to the ticketRegistry.xml file. The alternative would be to make them static values in the source and require you to recompile the source to make a change.

  • p:sharedDisk="true" - disables HTTP communication for JUnit Tests and when the work directory is on a shared disk.
  • p:disableTicketsOnRequest="true" - disables an optimization that only reads tickets from a checkpoint or incremental file the first time the tickets are actually needed.
  • p:excludeSTFromFiles="true" - this is plausibly an option you should use. It prevents Service Tickets from being written to the checkpoint or incremental files. This makes incremental files smaller because it is then not necessary to keep the growing list of ST IDs for all the Service Tickets that were deleted probably before anyone ever really cared about them.
  • p:useThread="true" - use a thread to read the checkpoint file from another CAS node. If not set, the file is read inline, and this may slow down the processing of a new checkpoint across all the nodes.

How Often?

"Quartz" is the standard Java library for timer driven events. There are various ways to use Quartz, including annotations in modern containers, but JASIG CAS uses a Spring Bean interface to Quartz where parameters are specified in XML. All the standard JASIG TicketRegistry configurations have contained a Spring Bean configuration that drives the RegistryCleaner to run and delete expired tickets every so often. CushyTicketRegistry requires a second Quartz timer configured in the same file :

    <bean id="jobBackupRegistry" class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean"

        p:targetObject-ref="ticketRegistry" p:targetMethod="timerDriven" />

    <bean id="triggerBackupRegistry" class="org.springframework.scheduling.quartz.SimpleTriggerBean"
      p:jobDetail-ref="jobBackupRegistry" p:startDelay="60000" p:repeatInterval="15000" />

The first bean tells Spring to call method "timerDriven" in the object configured with Spring bean name "ticketRegistry". The second bean tells Spring that after the first minute (letting things start up), make the call indicated in the first bean every 15 seconds. Since this is standard Spring stuff, the interval is coded in milliseconds.

The time interval configured here is the time between incrementals. The checkpointInterval parameter on the ticketRegistry bean sets the time (in seconds) between full checkpoints:

p:checkpointInterval="300"

So with these parameters, Cushy writes an incremental every 15 seconds and a checkpoint every 5 minutes. Feel free to set these values as you choose. Shorter intervals mean more overhead, but the cost is already so low that longer intervals don't really save much.

See the sample ticketRegistry.xml file for the complete configuration context.

Special Rules

Cushy stores tickets in an in-memory table. It writes tickets to a disk file with a single writeObject Java statement. It transfers files from machine to machine using an HTTPS GET. So far, everything seems to be rather simple. Cushy started that way, but then it became clear that there were a small number of optimizations that really needed to be made even if they added a slight amount of complexity to the code.

Notify

Once every 5-15 minutes, a node generates a new full checkpoint file. It also generates a new dummy ServiceTicketId that acts as the password other nodes will present to request the files over HTTPS. It then does a "Notify" operation: it issues an HTTPS GET to the /cas/cluster/notify URL on every other CAS node in the cluster. This request is routed by Spring MVC to the CacheNotifyController class provided by the Cushy package. A node also does a Notify immediately after it reboots, to inform the other nodes that it is back up and to provide them with the password needed to communicate until the next checkpoint.

The Notify goes to every node in the cluster at its configured URL. The URL is assumed to be "https:" so the SSL Certificate in the other node verifies that it is the correct machine authorized to receive the data.

However, when a node receives what looks like a Notify it cannot verify its source. This is not a big problem, because the first order of business is to read the new checkpoint file from the node sending the Notify. To read that file it uses the URL configured for that node in the cluster definition, and since that URL is "https:" it will only work if the other node has a Certificate proving its identity. If the other node then accepts the secret dummy ServiceTicketId sent in the Notify, the loop has been closed. Both machines communicated over configured URLs. Both verified their identity with a Certificate. All data was encrypted with SSL. The ticket sent in the Notify was validated when the checkpoint file was returned correctly.

Because Notify is sent when CAS boots up, it is an indication that the node is "healthy" and it resets any flag indicating that the node is "sick". This does not, however, prevent the other nodes from reacting if they continue to receive requests or exceptions indicating a problem. When Cushy gets an indication of a problem it sets a flag, and it then continues to operate assuming the problem is still there until it gets a Notify from the node. After the Notify, Cushy does not assume that there is a continuing problem, but it will respond appropriately if one is detected.

In a SharedDisk situation (see below) there is no HTTP and therefore no /cluster/notify call. Instead, the timerDriven routine checks the Last Modified date on the other node's checkpoint file. When it changes, it performs a subset of the full processNotify operations to reset flags and mark the other server healthy, as sketched below.
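A minimal sketch of that check (illustrative, not the actual Cushy source; the field and method names are invented):

    import java.io.File;

    public class SharedDiskWatchSketch {

        private long lastSeenCheckpointTime = 0;

        /** Called from the timerDriven routine in SharedDisk mode. */
        void checkOtherNode(File otherNodeCheckpoint) {
            long modified = otherNodeCheckpoint.lastModified(); // 0 if the file does not exist
            if (modified > lastSeenCheckpointTime) {
                lastSeenCheckpointTime = modified;
                // Run the subset of processNotify work: reset the "sick" flag,
                // mark the other node healthy, note that newer tickets are on disk.
            }
        }
    }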

Tickets on Request

The simplest, and therefore the initial, logic for Cushy read a checkpoint or incremental file from another node, immediately "deserialized" it (turned the file into a set of objects), and updated the tickets in the secondary registry object associated with the other node. This is clean, and it generates log messages describing the contents of each file as it arrives, which reassures you that the file contains the right data.

However, during the 99.9% of the time when the nodes are running and the network is OK, this approach approximately doubles the amount of overhead to run Cushy. Turning the file back into objects is almost as expensive as creating the objects in the first place. Worse, every time you get a new checkpoint file you have to discard all the old objects and replace them with new objects, which means the old objects have to be garbage collected and destroyed.

This was one place where simplicity over efficiency seemed to go too far. The alternative was to fetch the files across the network, but not to open or read them until some sort of failure routed a request for a ticket that belonged to the other node. Then during normal periods the files would be continuously updated on disk, but they would never be opened until one of the objects they contained was needed.

When a node fails, a bunch of requests for that node may be forwarded by the Front End to a backup node almost at the same time. The first request has to restore all the tickets, but while that is going on the other requests should wait until the restore completes. In a real J2EE environment this sort of coordination is handled by the EJB layer, but CAS uses Spring and has no EJBs.

The obvious way to do this is with a Java "synchronized" operation, which acquires a lock while the tickets are being restored from disk to memory. Generally speaking this is not something you want to do; the rule is that you should never hold a lock while doing any type of I/O, and since we know this restore can take as long as a second to complete, it is not the sort of thing you normally want to do locked. However, the only operations queuing up for the lock are requests for tickets owned by the secondary (failed) node, and the readObject that restores all the tickets will end, successfully or with an I/O exception, and then those requests will be processed.
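The pattern looks roughly like this (an illustration of the synchronized lazy restore, not the actual Cushy source; real Cushy holds Ticket objects rather than plain Objects):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.util.Map;

    public class LazyRestoreSketch {

        private final File checkpointFile;
        private Map<String, Object> tickets;   // restored ticket objects, keyed by id
        private long restoredFromTime = 0;     // modified date of the file we last restored

        LazyRestoreSketch(File checkpointFile) {
            this.checkpointFile = checkpointFile;
        }

        /** First request for a failed node's ticket restores the file; later requests wait on the lock. */
        @SuppressWarnings("unchecked")
        synchronized Object getTicket(String id) throws IOException, ClassNotFoundException {
            long fileTime = checkpointFile.lastModified(); // 0 if the file does not exist
            if (fileTime > restoredFromTime) {
                // Restore under the lock: normally forbidden (I/O while locked), but the only
                // threads queuing here are requests for this failed node's tickets anyway.
                try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(checkpointFile))) {
                    tickets = (Map<String, Object>) in.readObject();
                }
                restoredFromTime = fileTime;
            }
            return (tickets == null) ? null : tickets.get(id);
        }
    }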

This optimization saves a tiny amount of CPU, but it is continuous across all the time the network is behaving normally. If you disable it, and there is a parameter to disable it on the ticketRegistry bean of the ticketRegistry.xml Spring configuration file, then each checkpoint file will be restored after a Notify is received (from the Notify request thread) and each incremental file will be restored after it is read by the Quartz thread that calls timerDriven, so requests never have to synchronize and wait. Of course, if the request proceeds after a file has been received but before it has been restored as new tickets, the request will be processed against the old set of tickets. That is the downside of impatience.

When using "Tickets on Request", there are two basic rules. First, you don't have complete control unless you are synchronized on the Secondary Registry object that corresponds to that node and set of files. Secondly, in order to work in both HTTPS and SharedDisk mode, the processing is coordinated by the modified date on the files. When a file is turned into object in memory, then the objects have the same "modified date" as the file that created or updated them. When the file modified date is later than the objects modified date, then the objects in memory are stale and the file should be restored at the next request.

Sanity check: In a real SharedDisk mode the timestamps on the files are set by the file system, either on the file server or on the local disk during HTTP processing (when the /cas/cluster/getCheckpoint or /cas/cluster/getIncremental operation completes). In either case they are set by the same clock. The typical 10 second interval between events (and even a much smaller interval) is much larger than the clock resolution. The important thing here is that we are always comparing one file timestamp with another file timestamp from the same source. This part of the code never uses a timestamp from the local System, so we don't have to worry if clocks are out of sync across systems.

However, there are two potential sources of lastModifiedDate for a file. One is a value saved in memory the last time we looked at the file. The other is to go to the disk directory and get the current value. Even if the directory is fast, going there is still I/O, and you don't want to do I/O while running synchronized (holding a lock); in other cases it just delays things a bit. When running in HTTP (not SharedDisk) mode the files don't get onto the disk unless they are read, and the last step of reading the files is to update the lastModified date in memory. In SharedDisk mode the timerDriven routine (every 10 seconds or so) checks the current lastModified date from the directory. So the question is (read the code to find out the answer): when we do a getTicket in SharedDisk mode, do we stop and get the current lastModified value for both files (a lot of delay and overhead at a critical moment), or do we take the tickets we have and let the timerDriven routine decide when it is time to load a fresher set of tickets?

Generally an incremental file, if it exists, should always be later than the checkpoint. If both files are later than the objects in memory, always restore the checkpoint first.

Now for a race condition that is currently declared to be unimportant. Assume that "Tickets on Request" is disabled, so tickets are being restored as soon as each file arrives. Assume that there are a large number of tickets, so restoring the checkpoint (which is done in one thread as a result of the Notify request) takes longer than the number of seconds before the next incremental is generated. The incremental is small, and it is read by the timerDriven thread independent of the Notify request. So if these two restores are not synchronized against each other, it is possible that this first incremental will be applied to the old objects in memory instead of the new objects still being restored from the checkpoint. Nothing really bad happens here. The new tickets in the incremental are certainly newer than the old objects, the deleted tickets in the incremental certainly deserve to be deleted, and if the first incremental is applied to the old set of tickets and doesn't update the objects created by the new checkpoint, then wait for the second incremental, which is cumulative and will correct the problem. So the issue is not worth adding synchronization to avoid.

SharedDisk

The SharedDisk parameter is typically specified in the ticketRegistry.xml Spring configuration file. It turns off the Cushy HTTP processing. There will be no Notify message, and therefore no HTTP fetching of the checkpoint or incremental file. There is no exchange of dummy ServiceTicketId for communication security because there is no communication. It is used in real SharedDisk situations and in Unit Test cases.

Since there is no Notify, the timerDriven code that generates checkpoint and incremental files also has to check the last modified timestamp on the checkpoint file of every other node. If the timestamp changes, that triggers the subset of Notify processing that does not involve HTTP or file transfers (such as resetting the flags that track node health).

Cold Start Quiet Period

When CAS starts up and finds no previous checkpoint file in its work directory, there are no tickets to restore. This is a Cold Start, and it may be associated with a change of CAS code from one release to another, with possible changes to the Ticket object definitions. A cold start has to happen at one time across all the servers in the cluster; you do not want one server running on old code while another server runs on the new code. To give the operators time to make the change, after a cold start CAS enters the Cold Start Quiet Period, which lasts for 10 minutes (built into the source). During this period it does not send or respond to HTTP requests from other nodes. That way the nodes cannot exchange mismatched object files.

Healthy

When CAS receives an HTTP GET I/O error attempting to contact or read data from another node, it marks that node as "unhealthy". It then waits for a Notify from the node, and then tries to read the new checkpoint file.

Without the "healthy" flag, when a node goes down all the other nodes would attempt every 10 seconds or so to read a new incremental file but the HTTP connect would time out. Adding a timeout every 10 seconds seems like a waste, and the Notify process will tell us soon enough when it is time to reconsider the health of the node.

Note that Healthy deals with a failure of this server to connect to a node, while Tickets on Request is triggered when the Front End cannot get to the node and sends us a request that belongs to the other node. If a node really goes down, both things happen at roughly the same time. Otherwise, it is possible for just one type of communication to fail while the other still works.

Usage Pattern

Users start logging into CAS at the start of the business day. The number of TGTs begins to grow.

Users seldom log out of CAS, so TGTs typically time out instead of being explicitly deleted.

Users abandon a TGT when they close the browser. They then get a new TGT and cookie when they open a new browser window.

Therefore, the number of TGTs can be much larger than the number of real CAS users. It is a count of browser windows and not of people or machines.

At Yale around 3 PM a typical set of statistics is:

Unexpired-TGTs: 13821
Unexpired-STs: 12
Expired TGTs: 30
Expired STs: 11

So you see that a Ticket Registry is overwhelmingly a place to keep logon TGTs (in this statistic TGTs and PGTs are combined).

Over night the TGTs from earlier in the day time out and the Registry Cleaner deletes them.

So generally the pattern is a slow growth of TGTs while people are using the network application, followed by a slow reduction of tickets while they are asleep, with a minimum probably reached each morning before 8 AM.

If you display CAS statistics periodically during the day you will see a regular pattern and a typical maximum number of tickets in use "late in the day".

Translated to Cushy, the cost of the full checkpoint and the size of the checkpoint file grow over time along with the number of active tickets, and then the file shrinks over night. During any period of intense login activity the incremental file may be unusually large. If you had a long time between checkpoints, then around the daily minimum (8 AM) you could get an incremental file bigger than the checkpoint.

CAS Ticket Objects Need to be Fixed

CAS has some bugs. They are very, very unlikely to occur, but they are there. Cushy can't fix them because they are in the Ticket classes themselves.

ConcurrentModificationException

First, the login TGT object has some collections. One collection gets a new entry every time a Service Ticket is created; it is used for Single Sign Off. In CAS 4, a new collection is used to handle multiple factors of authentication. If two requests arrive at the same time to generate two Service Tickets on the same TGT, then one ST is created and queued up by existing TicketRegistry implementations to be replicated to other nodes, while the second Service Ticket is being created and is adding a new entry to the Single Sign Off collection in the TGT.

CAS 3 was sloppy about this. CAS 4 adds "synchronized" statements to protect itself from everything except the ticket replication mechanism. Once the ST and TGT are queued up to be replicated, that can happen at any time, and if it happens while the second Service Ticket is modifying the TGT, then the third-party off-the-shelf replication system will throw a ConcurrentModificationException somewhere deep in the middle of its code. Will it recover properly?

Cushy cannot itself solve a problem in the Ticket classes, but it does allow you to safely add to the TicketGrantingTicketImpl class the method that fixes the problem:

    private synchronized void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
    }

Private Copy of the Login TGT

JPA handles the entire collection of tickets properly.

The other replication systems use writeObject on what they think is a single ticket object. Unfortunately, Service Tickets and Proxy Granting Tickets point to the login TGT, and when you writeObject (serialize) one of them, Java includes a copy of the TGT, which is sent over the network and received at the other node as a pair of ticket objects.
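The effect is easy to demonstrate with plain Java serialization; this self-contained example (not CAS code) stands in a Parent for the TGT and a Child for each ST:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class SerializedCopyDemo {

        static class Parent implements Serializable {}

        static class Child implements Serializable {
            final Parent parent;
            Child(Parent parent) { this.parent = parent; }
        }

        public static void main(String[] args) throws Exception {
            Parent tgt = new Parent();
            Child st1 = new Child(tgt);
            Child st2 = new Child(tgt);

            // Serialize each "ticket" separately, as a per-ticket replication system would.
            Child copy1 = roundTrip(st1);
            Child copy2 = roundTrip(st2);

            // On the receiving node each child arrives with its own private copy of the parent.
            System.out.println(copy1.parent == copy2.parent);  // prints false
        }

        static <T extends Serializable> T roundTrip(T obj) throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close(); // flush the serialized bytes
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }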

You can verify that none of the TicketRegistry implementations fix this problem, because CAS has made all the important fields of the Ticket object private with no exposed methods that allow any code to fix it.

In CAS 3 it is not a problem because the copy of the TGT works just as well as the real TGT during CAS processing, and Service Tickets are used or time out so quickly it doesn't matter. In CAS 4 this may become a problem because the TGT can change in important ways after it is created and the copy of the TGT connected to a replicated Proxy Granting Ticket becomes stale and outdated.

Cushy avoids this problem because the periodic checkpoint file captures all the tickets with all their relationships. Limited examples of this problem can occur around node failure, but for all the other TicketRegistry solutions (except JPA) this happens all the time to all the tickets during normal processing.

JUnit Testing

Cushy includes a JUnit test that runs all the same cases that the DefaultTicketRegistry JUnit test runs.

It is not possible to configure enough of a Java Servlet Web server in JUnit to test the HTTP Notify and file transfer; you have to test that on a real server. The JUnit tests therefore run in SharedDisk mode, where two TicketRegistry objects representing two different nodes in the cluster both write and read files from the same disk directory.

The trick here is to create two Primary CushyTicketRegistry instances with two compatible but opposite configurations. Typically one Primary object believes that it is node "casvm01" and that the cluster contains a second node named "casvm02", while the other Primary object believes that it is node "casvm02" in a cluster with "casvm01".

There are two test classes with entirely different strategies.

CushyTicketRegistryTest.java tests the TicketRegistry interface and the Cushy functions of checkpoint, restore, writeIncremental, and readIncremental. You can create a single ticket or 100,000 TGTs. This verifies that the tickets are handled correctly, but it does not test CAS Business Layer processing. The test initialization creates a new empty TicketRegistry for each test, so it is possible to test that a sequence of operations produces an expected outcome.

...

127.0.0.1   casvm01 casvm02

Without this entry the two CushyClusterConfiguration beans cannot be tricked into regarding the one machine as if it were two nodes.

...

  1. Create credentials on casvm02.
  2. Create a TGT with the credentials on casvm02.
  3. Simulate a failure of casvm02; from now on everything runs on casvm01.
  4. Create an ST using the TGT ID of the casvm02 TGT.
  5. Use the ST to create a PGT.
  6. Create a new ST using the PGT just created.
  7. Validate the ST. Make sure that the netid that comes back matches the credentials supplied to casvm02.

...