Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
  • Recover tickets after reboot without JPA, or a separate server, or a cluster (works on a standalone server)
  • Recover tickets after a crash, except for the last few seconds of activity that did not get to disk.
  • No dependency on any large external librarylibraries. Pure Java using only the standard Java SE runtime and some Apache commons stuff.
  • All source in one class. A Java programmer can read it and understand it.
  • Can also be used to cluster CAS servers
  • Cannot crash CAS ever, no matter what is wrong with the network or other servers.
  • A completely different and simpler approach to the TicketRegistry. Easier to work with and extend.
  • Probably uses more CPU and network I/O than other TicketRegistry solutions, but it has a constant predictable overhead you can verify is trivial.

...

Four years ago Yale implemented a "High Availability" CAS cluster using JBoss Cache to replicate tickets. After that, the only CAS crashes were caused by failures of JBoss Cache. Red Hat failed to diagnose or fix the problem. As we tried to diagnose the problem ourselves we discovered both bugs and design problems in the structure of Ticket objects and the use of the TicketRegistry solutions that contributed to the failure. We considered replacing JBoss Cache with Ehcache, but there is a more fundamental problem here. It should not be possible for any failure of the data replication mechanism to crash all of the CAS servers at once. Another choice of cache might be more reliable, but it would suffer from the same fundamental structural problemwhile that might improve reliability somewhat it would not solve the fundamental structural problems.

Having been burned by software so complicated that the configuration files were almost impossible to understand, Cushy was developed to accomplish the same thing in a way so simple it could not possibly fail.

The existing CAS TicketRegistry solutions must be configured to replicate tickets to the other nodes and to wait for this activity to complete, so that any node can validate a Service Ticket that was just generated a few milliseconds ago. Waiting for the replication to complete is what makes CAS vulnerable , but it seemed like an obvious solution because Ehcache and JBoss Cache promised that it would work. Once it didn't work, it was obvious that someone would at least ask the question whether there was another way to do this. If there was, then the entire TicketRegistry strategy could be reconsideredto a crash if the replication begins but never completes. Synchronous ticket replication is a standard service provided by JBoss Cache and Ehcache, but is it the right way to solve the Service Ticket validation problem? A few minutes spent crunching the math suggested there was a better way.

It is easier and more efficient to send the request to the node that already has the ticket and can process it rather than struggling to get the ticket to every other node in advance of the next request.

It turns out that most of the modern network Front End devices that distribute requests to members of a cluster are smart enough today to program a relatively simple amount of CAS protocol and locate the ticketid for a request. If you activate the CAS feature that puts a node identifier on the end of the ticketid, then requests can be routed to the node that created the ticket. If you cannot get your network administrator to program your Front End device, then CushyFrontEndFilter does the same thing not as efficiently as it could be done in the network box, but more efficiently than the processing of requests using "naked" Ehcache without the Filter.

With programming in the Front End or the Filter, CAS nodes can "own" the tickets they create. Those tickets have to be replicated to other nodes so there is recovery when a node crashes, but that replication can be lazy and happen over seconds instead of milliseconds, and no request has to wait for the replication to complete.

That is a better way to run Ehcache or any of the traditional TicketRegistry solutions, but it opens up the possibility of doing something entirely different. Of periodically replicating the TicketRegistry instead of just the individual tickets.

 

"Cushy" stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale". This summarizes what it is and how it works.

For objects to be replicated from one node to another, programs use the Java writeObject statement to "Serialize" the object to a stream of bytes that can be transmitted over the network and then restored in the receiving JVM. Ehcache and the other ticket replication systems operate on individual tickets. However, writeObject can operate just as well on the entire contents of the TicketRegistry. This is very simple to code, it is guaranteed to work, but it might not be efficient enough to use. Still, once you have the idea the code starts to write itself.

Start with the DefaultTicketRegistry source that CAS uses to hold tickets in memory on a single CAS standalone server. Then add the writeObject statement (surrounded by the code to open and close the file) to create a checkpoint copy of all the tickets, and a corresponding readObject and surrounding code to restore the tickets to memory. The first thought was to do the writeObject to a network socket, because that was what all the other TicketRegistry implementations were doing. Then it became clear that it was simpler, and more generally useful, and a safer design, if the data was first written to a local disk file. The disk file could then optionally be transmitted over the network in a completely independent operation. Going first to disk created code that was useful for both standalone and clustered CAS servers, and it guaranteed that the network operations were completely separated from the Ticket objects and therefore the basic CAS function.

The first benchmarks turned out to be even better than had been expected, and that justified further work on the system.

CushyTicketRegistry and the Standalone Server

For a single CAS server, the standard choice is the DefaultTicketRegistry class which keeps the tickets in an in-memory Java table keyed by the ticket id string. Suppose you change the name of the Java class in the Spring ticketRegistry.xml file from DefaultTicketRegistry to CushyTicketRegistry (and add a few required parameters described later). Cushy was based on the DefaultTicketRegistry source code, so everything works the same as it did before, until you have to restart CAS for any reason. Since the DefaultTicketRegistry only has an in memory table, all the ticket objects are lost when CAS restarts and users all have to login again. Cushy detects the shutdown and using a single Java writeObject statement it saves all the ticket objects in the Registry to a file on disk (called the "checkpoint" file). When CAS restarts, Cushy reloads all the tickets from that file into memory and restores all the CAS state from before the shutdown. No user even notices that CAS restarted unless they tried to access CAS during the restart.

The number of tickets CAS holds grows during the day and shrinks over night. At Yale there are fewer than 20,000 ticket objects in CAS memory, and Cushy can write all those tickets to disk in less than a second generating a file around 3 megabytes in size. Other numbers of tickets scale proportionately (you can run a JUnit test and generate your own numbers). This is such a small amount of overhead that Cushy can be proactive.

So to take the next logical step, start with the previous ticketRegistry.xml configuration and duplicate the XML elements that currently call a function in the RegistryCleaner every few minutes. In the new copy of the XML elements, call the "timerDriven" function in the (Cushy)ticketRegistry bean every few minutes. Now Cushy will not wait for shutdown but will back up the ticket objects regularly just in case the CAS machine crashes without shutting down normally. When CAS restarts after a crash, it can load a fairly current copy of the ticket objects which will satisfy the 99.9% of the users who did not login in the last minutes before the crash.

The next step should be obvious. Can we turn "last few minutes" into "last few seconds". You could create a full checkpoint of all the tickets every few seconds, but now the overhead becomes significant. So go back to ticketRegistry.xml and set the parameters to call the "timerDriven" function every 10 seconds, but set the "checkpointInterval" parameter on the CushyTicketRegistry object to only create a new checkpoint file every 300 seconds. Now Cushy creates the checkpoint file, and then the next 29 times it is called by the timer it generates an "incremental" file containing only the changes since the checkpoint was written. Incremental files are cumulative, so there is only one file, not 29 separate files. If CAS crashes and restarts, Cushy reads the last checkpoint, then applies the changes in the last incremental, and now it has all the tickets up to the last 10 seconds before the crash. That satisfies 99.99% of the users and it is probably a good place to quit.

What about disaster recovery? The checkpoint and incremental files are ordinary sequential binary files on disk. When Cushy writes a new file it creates a temporary name, fills the file with new data, closes it, and then swaps the new for the old file, so other programs authorized to access the directory can safely open or copy the files while CAS is running. Feel free to write a shell script or Pearl or Python program to use SFTP or any other program or protocol to back up the data offsite or to the cloud.

Some people use JPATicketRegistry and store a copy of the tickets in a database to accomplish the same single server restart capability that Cushy provides. If you are happy with that solution, stick with it. Cushy doesn't require the database, it doesn't require JPA, and it may be easier to work with.

Before you configure a cluster, remember that today a server is typically a virtual machine In the current TicketRegistry implementations, any request in a cluster to create a Service Ticket must replicate the service ticket to at least one other computer (the database server in JPA, one or more nodes using Ehcache or any other ticket replication mechanism) before the Service Ticket ID is returned to the browser. This ensures that the Service Ticket can be validated by any node to which the application's validation request is directed. After validation, there is a second network transaction to delete the ticket. So every ST involves two backend synchronous operations.

However, it has always been part of CAS that every ticketid has a suffix that, at least on paper, can contain the node name of the CAS server that created the ticket. Using this feature in practice requires some node configuration methodology. Once this is done, then any validate request (for example, any call to /cas/serviceValidate contains in the query string part of the URL a ticket= parameter, and the end of the value of that parameter designates the node that created the ticket. Today you can program most modern network front end devices to extract this information from the request and route the validate request to the node that created the ticket and is guaranteed to have it in memory. If you cannot program your front end device, or if you cannot convince your network administrators to do the work for you, then CushyFrontEndFilter accomplishes the same thing by scanning requests as they arrive at a CAS server and forwarding requests like validation to the server that created the ticket. If you have two servers and requests are randomly assigned to them, then 50% of the time the request goes to the right server and there is no network transaction, and 50% of the time the request has to be forwarded by the Filter to the other server, which then validates the ST and deletes it returning the response message. So with the Filter you expect, on average, one network transaction half the time instead of, with current JPA or Cache technology, two network transactions every time. When the number of nodes in the cluster is more than 2, the Filter works even better.

CushyFrontEndFilter works with Ehcache or CushyTicketRegistry. When added to Ehcache you can change the cache configuration so that the Service Ticket cache does not use synchronous replication, or even better you can turn off replication entirely for the Service Ticket cache because every 10 seconds a Service Ticket is either used and discarded or else times out, so it makes no sense to replicate them at all if the front end or filter routes requests properly.

However, once you come up with the idea of using front end routing to avoid the synchronous ticket replication (which was the source of crashes in JBoss Cache at Yale), then some new more radical changes to TicketRegistry become possible. In addition to the various validate request, you can route the /proxy request to the node that owns the Proxy Granting Ticket, and you can route new Service Ticket requests to the node that issued the Ticket Granting Ticket (based on the suffix of the CASTGC cookie). Now a basic principle of all the existing ticket registry designs is no longer necessary. CAS Ticket objects do not have to be stored in what appears to be a common shared pool. Tickets can be segregated into separate collections based on the identity of the node that created and "owns" the ticket.

"Cushy" stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale". This summarizes what it is and how it works.

For objects to be replicated from one node to another, libraries use the Java writeObject statement to "Serialize" the object to a stream of bytes that can be transmitted over the network and then restored in the receiving JVM. Ehcache and JBoss Cache use writeObject on individual tickets (although it turns out they also end up serializing copies of all the other objects the ticket points to, including the TGT when attempting to replicate a ST). However, writeObject can operate just as well on the entire TicketRegistry. Making a "checkpoint" copy of the entire collection of tickets to disk (at shutdown for example) and then restoring this collection (after a restart) is very simple to code. Since Java does all the work, it is guaranteed to behave correctly. It is a useful additional function. However, you can be more aggressive in the use of this approach, and that suggests the design of an entirely different type of TicketRegistry.

Start with the DefaultTicketRegistry source that CAS uses to hold tickets in memory on a single CAS standalone server. Then add the writeObject statement (surrounded by the code to open and close the file) to create a checkpoint copy of all the tickets, and a corresponding readObject and surrounding code to restore the tickets to memory. The first thought was to do the writeObject to a network socket, because that was what all the other TicketRegistry implementations were doing. Then it became clear that it was simpler, and more generally useful, and a safer design, if the data was first written to a local disk file. The disk file could then optionally be transmitted over the network in a completely independent operation. Going first to disk created code that was useful for both standalone and clustered CAS servers, and it guaranteed that the network operations were completely separated from the Ticket objects and therefore the basic CAS function.

The first benchmarks turned out to be even better than had been expected, and that justified further work on the system.

CushyTicketRegistry and the Standalone Server

For a single CAS server, the standard choice is the DefaultTicketRegistry class which keeps the tickets in an in-memory Java table keyed by the ticket id string. Suppose you change the name of the Java class in the Spring ticketRegistry.xml file from DefaultTicketRegistry to CushyTicketRegistry (and add a few required parameters described later). Cushy was based on the DefaultTicketRegistry source code, so everything works the same as it did before, until you have to restart CAS for any reason. Since the DefaultTicketRegistry only has an in memory table, all the ticket objects are lost when CAS restarts and users all have to login again. Cushy detects the shutdown and using a single Java writeObject statement it saves all the ticket objects in the Registry to a file on disk (called the "checkpoint" file). When CAS restarts, Cushy reloads all the tickets from that file into memory and restores all the CAS state from before the shutdown. No user even notices that CAS restarted unless they tried to access CAS during the restart.

The number of tickets CAS holds grows during the day and shrinks over night. At Yale there are fewer than 20,000 ticket objects in CAS memory, and Cushy can write all those tickets to disk in less than a second generating a file around 3 megabytes in size. Other numbers of tickets scale proportionately (you can run a JUnit test and generate your own numbers). This is such a small amount of overhead that Cushy can be proactive.

CAS is a very important application, but on modern hardware it is awfully small and cheap to run. Since it was first developed there have been at least 5 generations of new chip technology that now run what was never a big application to begin with.

So to take the next logical step, start with the previous ticketRegistry.xml configuration and duplicate the XML elements that currently call a function in the RegistryCleaner every few minutes. In the new copy of the XML elements, call the "timerDriven" function in the (Cushy)ticketRegistry bean every few minutes. Now Cushy will not wait for shutdown but will back up the ticket objects regularly just in case the CAS machine crashes without shutting down normally. When CAS restarts after a crash, it can load a fairly current copy of the ticket objects which will satisfy the 99.9% of the users who did not login in the last minutes before the crash.

The next step should be obvious. Can we turn "last few minutes" into "last few seconds". You could create a full checkpoint of all the tickets every few seconds, but now the overhead becomes significant. So go back to ticketRegistry.xml and set the parameters to call the "timerDriven" function every 10 seconds, but set the "checkpointInterval" parameter on the CushyTicketRegistry object to only create a new checkpoint file every 300 seconds. Now Cushy creates the checkpoint file, and then the next 29 times it is called by the timer it generates an "incremental" file containing only the changes since the checkpoint was written. Incremental files are cumulative, so there is only one file, not 29 separate files. If CAS crashes and restarts, Cushy reads the last checkpoint, then applies the changes in the last incremental, and now it has all the tickets up to the last 10 seconds before the crash. That satisfies 99.99% of the users and it is probably a good place to quit.

What about disaster recovery? The checkpoint and incremental files are ordinary sequential binary files on disk. When Cushy writes a new file it creates a temporary name, fills the file with new data, closes it, and then swaps the new for the old file, so other programs authorized to access the directory can safely open or copy the files while CAS is running. Feel free to write a shell script or Pearl or Python program to use SFTP or any other program or protocol to back up the data offsite or to the cloud.

Some people use JPATicketRegistry and store a copy of the tickets in a database to accomplish the same single server restart capability that Cushy provides. If you are happy with that solution, stick with it. Cushy doesn't require the database, it doesn't require JPA, and it may be easier to work with.

Before you configure a cluster, remember that today a server is typically a virtual machine that is not bound to any particular physical hardware. Ten years ago moving a service to a backup machine involved manual work that took time. Today there is VM infrastructure and automated monitoring and control tools. A failed server can be migrated and restarted automatically or with a few commands. If you can get the CAS server restarted fast enough that almost nobody notices, then you have solved the problem that clustering was originally designed to solve without adding a second running node.

...

If you program this into the Front End, then the request goes directly to the right server without any additional overhead. With only the Filter, a request goes to some randomly chosen CAS Server which may have to forward the request to another server, forward back the response, and handle failure if the preferred server goes downgoes down.

There is a separate page to describe Front End programming for CAS.

CushyTicketRegistry and a CAS Cluster

...

Notify is only done every few minutes when there is a new checkpoint. Incrementals are generated all the time, but they are not announced. Each server is free to poll the other servers periodically to fetch the most recent incremental with the /cas/cluster/getIncremental request (add the dummyServiceTicketId to prove you are authorized to read the data).

CAS is a high security application, but it always has been. The best way to avoid introducing a security problem is to model the design of each new feature on something CAS already does, and then just do it the same way.

Since these node to node communication calls are modeled on existing CAS Service Ticket validation and Proxy Callback requests, they are configured into CAS in the same place (in the Spring MVC configuration, details provided below).Note: Yes, this sort of thing can be done with GSSAPI, but after looking into configuring Certificates or adding Kerberos, it made sense to keep it simple and stick with the solutions that CAS was already using to solve the same sort of problems in other contextsin the Spring MVC configuration, details provided below).

Are You Prepared?

Everything that can go wrong will go wrong. We plan for hardware and software failure, network failure, and disaster recovery. To do this we need to know how things will fail and how they will recover from each type of problem.

...

Replicating the entire TicketRegistry instead of just replicating individual tickets is less efficient. The amount of overhead is predictable and you can verify that the extra overhead is trivial. However, remember this is simply the original Cushy 1.0 design which was written to prove a point and is aggressively "in your face" pushing the idea of "simplicity over efficiency". After we nail down all the loose ends, it is possible to add a bit of extra optimization to get arbitrarily close to Ehcache in terms of efficiency.

Basic Principles

...

it is possible to add a bit of extra optimization to get arbitrarily close to Ehcache in terms of efficiency.

Ticket Chains (and Test Cases)

...

Proxy Granting Tickets, however, can remain around for hours. So the one long term consequence of a failure is that the login TGT can be on one server, but a PGT can be on a different server that created it while the login server was temporarily unavailable. This requires some thought, but you should quickly realize that everything will work correctly today. In future CAS releases there will be an issue if a user adds additional credentials (factors of authentication) to an existing login after a PGT is created. Without the failure, the PGT sees the new credentials immediately. With current Cushy logic, the PGT on the backup server is bound to a point in time snapshot of the original TGT and will not see the additional credentials. Remember, this only occurs after a CAS failure. It only affects the users who got the Proxy ticket during the failure. It can be "corrected" if the end user logs out and then logs back into the middleware server.Cushy 2.0 will consider addressing this problem automaticallybe on one server, but a PGT can be on a different server that created it while the login server was temporarily unavailable. The PGT ends up with its own private copy of the TGT which is frozen in time at the moment the PGT was created. Remember, this is normal behavior for all existing TicketRegistry solutions and none of the other TicketRegistry options will ever "fix" this situation. At least Cushy is aware of the problem and with a few fixes to the Ticket classes Cushy 2.0 might be able to do better.

There is also an issue with Single Sign Out. If a user logs out during a failure of his login server, then a backup server processes the Single Log Out normally. Then when the login server is restored to operation, the Login TGT is restored from the checkpoint file into memory. Of course, no browser now has a Cookie pointing to that ticket, so it sits unused all day and then in the evening it times out and a second Single Sign Out process is triggered and all the applications that previously were told the user logged out are not contacted a second time with the same logout information. It is almost unimaginable that any application would be written so badly it would care about this, but it should be mentioned.

While the login server is down, new Service Tickets can be issued, but they cannot be meaningfully added to the "services" table in the TGT of the machine that is down.When that machine comes back up it resumes controlling the old TGT of the logged in user, and when the user logs off the Single Sign Out processing will occur only for servers that that machine knows about, and will omit services to which the user connected while the server that owned the TGT was down. Cushy provides a "best effort" Single Sign Out experience, and Cushy 1.0 cannot do better than this.

Cushy CAS Cluster

In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session objects (JBoss "clustering"). Nor does it depend on any other type of communication between machines. In fact, a Cushy CAS cluster could be created from a CAS running under Tomcat on Windows and one running under JBoss on Linux.

To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It holds the certificate, terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.

Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8443/cas". 

Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (the F5 prefers the ticket suffix to the an MD5 hash of the IP address of the VMThere are a few types of network failure that work differently from node failure.

If one CAS node is unable to connect to another CAS node for a while, even though the other node is up, then it marks the other node as being "unhealthy" and waits patiently for the other node to send a /cluster/notify. The other node will send a Notify every time it generates a new Checkpoint, and when one of those Notify messages gets through then the two nodes will reestablish communication.

If the Front End is unable to get to a CAS Node, but the other server can get to it, then what happens next depends on whether the CushyFrontEndFilter is also installed. Having both the programmed Front End and also the Filter is a bit like suspenders and a belt, but if the Front End is doing its job then the Filter has nothing to do. However, in this particular case the Filter will see a request for a ticket owned by another node and will attempt to forward it to the node indicated in the request. If it succeeds then CAS has automatically routed traffic around the point of failure. However, remember that if the node actually goes down then there will be two connect timeout delays, one where the Front End determines the node is down and then a second where the Filter verifies that it is down.

Without the Filter then the current node receives a request for a ticket it does not own, loads tickets into its Secondary Registry for that node, and processes the request. What is different is that if the node is really up and the two nodes can connect, then this CAS node will continue to receive Notify requests and new checkpoint and incremental files from the other node even as it is also processing requests for that node sent to it by the Front End. Cushy is designed to handle this situation (because even in a normal failure the other node can come up just as you are in the middle of handling a request for it).

Configuration

In CAS the TicketRegisty is configured using the WEB-INF/spring-configuration/ticketRegistry.xml file.

...