...
Incrementals are trivial (0.1 to 0.2 seconds).
...
Configuration
In JASIG CAS, the administrator selects one of several optional TicketRegistry implementations and configures it using a Spring Bean XML file located in WEB-INF/spring-configuration/ticketRegistry.xml. With CushyTicketRegistry this file creates the first, "Primary" object instance, which manages the tickets created and owned by the local node. That object examines the configuration and creates an additional "Secondary" object instance for every other node configured in the cluster.
The Cluster
Cluster configuration requirements became complex enough that they were moved into their own CushyClusterConfiguration class. This Bean is defined in front of the CushyTicketRegistry in the Spring ticketRegistry.xml file.
Why is this complicated? We prefer a single "cas.war" artifact that works everywhere: in standalone or clustered environments, in a desktop sandbox with or without virtual machines, and on the official DEV (development), TEST, and PROD (production) servers. Rebuilding the WAR file for each environment is undesirable because the artifact should not change between Test and Production. The original idea was to configure things at the container level (JBoss), but Yale Production Services did not want to be responsible for managing all that configuration.
So CushyClusterConfiguration adds Java logic instead of just a static cluster configuration file. During initialization on the target machine it determines all the IP addresses assigned to the machine and the machine's primary HOSTNAME. This allows two strategies.
First, you can configure all your clusters (sandbox, dev, test, prod, ...). At runtime CushyClusterConfiguration determines the IP addresses of the current machine and scans each cluster definition provided. It cannot use a cluster that does not contain the current machine, so it stops at the first cluster that contains a URL referencing an IP address on the current server.
If none of the configured clusters contains the current machine, or if no configuration is provided, then Cushy falls back to the HOSTNAME and some Java code. That code was written for the Yale naming conventions and can describe similar environments, but if your cluster uses other machine naming conventions you may want to modify or replace the Java at the end of this bean.
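For illustration only, here is a rough Java sketch of that first strategy. The class and method names are invented for this example, and the real CushyClusterConfiguration logic is more involved:

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.URL;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClusterPickerSketch {
    // Choose the first configured cluster that contains a URL whose host
    // resolves to an IP address assigned to the current machine.
    // Each cluster definition is just a list of node URLs in this sketch.
    static List<String> pickCluster(List<List<String>> clusters) throws Exception {
        Set<String> localAddresses = new HashSet<String>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                localAddresses.add(addr.getHostAddress());
            }
        }
        for (List<String> cluster : clusters) {
            for (String nodeUrl : cluster) {
                String host = new URL(nodeUrl).getHost();
                for (InetAddress resolved : InetAddress.getAllByName(host)) {
                    if (localAddresses.contains(resolved.getHostAddress())) {
                        return cluster; // first matching cluster wins
                    }
                }
            }
        }
        return null; // no match: fall back to the HOSTNAME logic
    }
}

Because the first match wins, the order in which the cluster definitions are listed matters.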
...
<bean id="ticketRegistry" class="edu.yale.cas.ticket.registry.CushyTicketRegistry"
p:serviceTicketIdGenerator-ref="serviceTicketUniqueIdGenerator"
p:checkpointInterval="300"
p:cacheDirectory= "#{systemProperties['jboss.server.data.dir']}/cas"
p:nodeName= "#{clusterConfiguration.getNodeName()}"
p:nodeNameToUrl= "#{clusterConfiguration.getNodeNameToUrl()}"
p:suffixToNodeName="#{clusterConfiguration.getSuffixToNodeName()}" />
The nodeName, nodeNameToUrl, and suffixToNodeName parameters link back to properties generated as a result of the logic in the CushyClusterConfiguration bean.
The cacheDirectory is a work directory on disk to which CAS has read/write access. The default is "/var/cache/cas", which is Unix syntax but can be created as a directory structure on Windows. In this example we use the Java system property for the JBoss /data subdirectory when running CAS on JBoss.
The checkpointInterval is the time in seconds between successive full checkpoints. Between checkpoints, incremental files will be generated.
CushyClusterConfiguration exposes an md5Suffix="yes" parameter which causes it to generate a ticketSuffix that is the MD5 hash of the computer's hostname instead of using the node name as the suffix. The F5 likes to refer to computers by their MD5 hash, and using that hash as the ticket suffix simplifies the F5 configuration even though it makes the tickets longer.
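As a minimal sketch, assuming the suffix is simply the MD5 digest of the hostname rendered as hex (the actual CushyClusterConfiguration may derive or format it differently):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class TicketSuffixSketch {
    // Render the MD5 hash of a hostname as 32 hex characters.
    static String md5Suffix(String hostname) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(hostname.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString(); // used as the trailing suffix of each ticket ID
    }
}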
How Often?
"Quartz" is the standard Java library for timer driven events. There are various ways to use Quartz, including annotations in modern containers, but JASIG CAS uses a Spring Bean interface to Quartz where parameters are specified in XML. All the standard JASIG TicketRegistry configurations have contained a Spring Bean configuration that drives the RegistryCleaner to run and delete expired tickets every so often. CushyTicketRegistry requires a second Quartz timer configured in the same file to call a method that replicates tickets. The interval configured in the Quartz part of the XML sets a base timer that determines the frequency of the incremental updates (typically every 5-15 seconds). A second parameter to the CushyTicketRegistry class sets a much longer period between full checkpoints of all the tickets in the registry (typically every 5-10 minutes).
A full checkpoint contains all the tickets. If the cache contains 20,000 tickets, a checkpoint takes about a second, generates a 3.2 megabyte file, and then has to be copied across the network to the other nodes. An incremental file contains only the tickets that were added or deleted since the last full checkpoint. It typically takes a tenth of a second and uses very little disk space or network. However, after a number of incrementals it is a good idea to take a fresh checkpoint just to clean things up. You set the parameters to optimize your CAS environment, although either operation has so little overhead that it should not be a big deal.
Based on the usage pattern, at 8:00 AM the ticket registry is mostly empty and full checkpoints take almost no time. Late in the afternoon the registry reaches its maximum size, and the difference between incrementals and full checkpoints is at its greatest.
Although CAS uses the term "incremental", the actual algorithm is a differential between the current cache and the last full checkpoint, so between full checkpoints the incremental file grows as it accumulates all the changes. It also includes a list of all the Service Ticket IDs that were deleted (just to be absolutely sure things are correct). If you made the period between full checkpoints unusually long, the incremental file could become larger than the checkpoint itself, and since it is transferred so frequently this would hurt performance far more than simply setting a reasonable checkpoint period.
Nodes notify each other of a full checkpoint. Incrementals occur so frequently that it would be inefficient to send messages around; instead, a node picks up the incrementals from the other nodes each time it generates its own.
CushyTicketRegistry (the code)
CushyTicketRegistry is a medium-sized Java class that does all the work. It began with the standard JASIG DefaultTicketRegistry code, which stores the tickets in memory (in a ConcurrentHashMap). On top of that base it adds code to serialize tickets to disk and to transfer the disk files between nodes using HTTP.
Unlike the JASIG TicketRegistry implementations, CushyTicketRegistry does not create a single big cache of tickets lumped together from all the nodes. Each node "owns" the tickets it creates, and the TicketRegistry on each node is transferred over the network to the other nodes. Therefore, on each node there is one instance of CushyTicketRegistry for the locally created tickets and other instances of the class for the tickets owned by the other nodes.
This is a custom solution designed for the specific CAS requirements. It is not a general object caching mechanism; it is really a strategy for the use of standard Java collections, serialization, and network I/O in a relatively small amount of code. Because the code is so small, it was convenient to put everything in a single class source file.
The Spring XML configuration creates what is called the Primary instance of the CushyTicketRegistry class. This object is the TicketRegistry as far as the rest of CAS is concerned; it implements the TicketRegistry interface. From the properties that Spring obtains from the CushyClusterConfiguration, the Primary object determines the other nodes in the cluster and creates an additional Secondary instance of CushyTicketRegistry for each of them.
Tickets created by CAS on this node are stored in the Primary object, which periodically checkpoints them to disk and, more frequently, writes an incremental-changes file to disk. It then notifies the other nodes when it has a new checkpoint to pick up. The Secondary objects keep a read-only copy of the tickets from the other nodes in memory in case one of those nodes fails.
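In outline, the arrangement looks something like the following sketch; the field names and method are simplified assumptions, not the actual class:

import java.util.HashMap;
import java.util.Map;

public class RegistrySketch {
    String nodeName; // this node, e.g. "CASVM1"
    // One read-only shadow registry per remote node, keyed by node name.
    Map<String, RegistrySketch> secondaries = new HashMap<String, RegistrySketch>();

    // Called on the Primary instance with the map produced by CushyClusterConfiguration.
    void createSecondaries(Map<String, String> nodeNameToUrl) {
        for (Map.Entry<String, String> node : nodeNameToUrl.entrySet()) {
            if (!node.getKey().equals(nodeName)) {
                secondaries.put(node.getKey(), new RegistrySketch());
            }
        }
    }
}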
Methods and Fields
In addition to the ConcurrentHashMap named "cache" that CushyTicketRegistry borrowed from the JASIG DefaultTicketRegistry code to index all the tickets by their ID string, CushyTicketRegistry adds two collections:
...
These two collections are maintained by the implementations of the addTicket and deleteTicket methods of the TicketRegistry interface.
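A minimal sketch of what those implementations might look like, assuming the collections are named addedTickets and deletedTickets as the method descriptions below suggest (the real class may keep whole tickets for adds and only ID strings for deletes):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TrackingRegistrySketch {
    // Main index of live tickets by ID, as in DefaultTicketRegistry.
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<String, Object>();
    // Changes accumulated since the last full checkpoint.
    private final Set<String> addedTickets = ConcurrentHashMap.newKeySet();
    private final Set<String> deletedTickets = ConcurrentHashMap.newKeySet();

    public void addTicket(String id, Object ticket) {
        cache.put(id, ticket);
        addedTickets.add(id);
        deletedTickets.remove(id); // a re-added ticket is no longer "deleted"
    }

    public boolean deleteTicket(String id) {
        addedTickets.remove(id);
        deletedTickets.add(id);
        return cache.remove(id) != null;
    }
}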
This class has three constructors.
- The constructor without arguments is used by the Spring XML configuration of the class and generates the Primary object that holds the local tickets created by CAS on this node. There is limited initialization that can be done in the constructor, so most of the work is in the afterPropertiesSet() method called by Spring when it completes its XML configuration of the object.
- The constructor with nodename and url parameters is used by the Primary object to create Secondary objects for other nodes in the cluster configuration.
- The constructor with a longer list of arguments is used by unit tests.
The following significant methods are added to the CushyTicketRegistry class:
- checkpoint() - Called from the periodic Quartz thread. Serializes all tickets in the registry to the nodename file in the work directory on disk. It makes a point-in-time, thread-safe copy of references to all the current tickets in "cache" and clears the added- and deleted-ticket collections. It then builds an ArrayList of the non-expired tickets and serializes that list (and therefore all the non-expired tickets) to /var/cache/cas/CASVM1. Finally, it generates a Service Ticket ID that will act as a password until the next checkpoint call and notifies the other nodes, in this example by calling the /cas/cache/notify service of CASVM2, passing the password ticketid. (A serialization sketch follows this list.)
- restore() - Empties the current cache, de-serializes the /var/cache/cas/nodename file to a list of tickets, and adds all the unexpired tickets in the list to rebuild the cache. Typically this happens only once, on the Primary object at CAS startup, when the previous checkpoint of the local cache is reloaded from disk to restore this node to the state it was in at the last shutdown. Secondary caches (of CASVM2 in this example), however, are loaded all the time in response to a /cas/cache/notify call from CASVM2 saying that it has taken a new checkpoint.
- writeIncremental() - Called by the Quartz thread between checkpoints. Serializes point-in-time, thread-safe copies of the addedTickets and deletedTickets collections to create the nodename-incremental file in the work directory.
- readIncremental() - De-serializes the two collections from the nodename-incremental file in the work directory, applies one collection to add tickets to the current cache, then applies the second collection to delete tickets. After the update, the cache contains all the non-expired tickets from the other node as of the point when the incremental file was created.
- getRemoteCache(), readRemoteCache() - Generate an https: request to read the nodename or nodename-incremental file from another node and store it in the work directory.
- notifyNodes() - Calls the /cas/cluster/notify restful service on each other node after a call to checkpoint() generates a full backup. Passes the generated dummy Service Ticket ID to the node, where it acts as a password in any subsequent getRemoteCache() call.
- processNotify() - Called from the Spring MVC layer when the message from another node's notifyNodes() call arrives.
- timerDriven() - Called from Quartz every so often (say every 10 seconds) to generate incrementals and, periodically, a full checkpoint. It also reads the current incremental from each of the other nodes.
- destroy() - Called by Java when CAS is shutting down. Writes a final checkpoint file that can be used after restart to reload all the tickets to their status at shutdown.
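To make the checkpoint and restore cycle concrete, here is a minimal serialization sketch along the lines the list above describes. The file handling and the use of a plain Serializable list are placeholders, not the actual Cushy code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class CheckpointSketch {
    // Serialize a point-in-time list of non-expired tickets, e.g. to /var/cache/cas/CASVM1.
    static void checkpoint(ArrayList<Serializable> tickets, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(tickets);
        }
    }

    // Read a checkpoint file back into a list of tickets, e.g. at CAS startup.
    @SuppressWarnings("unchecked")
    static ArrayList<Serializable> restore(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ArrayList<Serializable>) in.readObject();
        }
    }
}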
Unlike conventional JASIG cache mechanisms, CushyTicketRegistry does not combine tickets from all the nodes. It maintains shadow copies of the individual ticket caches from the other nodes. If a node goes down, the F5 starts routing requests for that node to the other nodes that are still up. The other nodes can recognize that these requests are "foreign" (for tickets issued by another node, and therefore found in the shadow copy of that node's tickets) and can handle such requests temporarily until the failed node is brought back up.
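A sketch of how a node might recognize a "foreign" ticket from its suffix, using the suffixToNodeName map seen in the configuration above. The lookup shown is an assumed mechanism for illustration, not the actual routing code:

import java.util.Map;

public class ForeignTicketSketch {
    // Ticket IDs carry a node-identifying suffix, e.g. "ST-42-xyz-CASVM2".
    // Returns the owning node's name, or null if the suffix is unknown.
    static String owningNode(String ticketId, Map<String, String> suffixToNodeName) {
        for (Map.Entry<String, String> entry : suffixToNodeName.entrySet()) {
            if (ticketId.endsWith(entry.getKey())) {
                return entry.getValue(); // may be this node or a foreign node
            }
        }
        return null;
    }
}

A request whose ticket maps to another node can then be served from that node's Secondary (shadow) registry.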
...