...

Each CAS server in the cluster has a shadow object representing the TicketRegistry of each of the other nodes. In normal operation, that object contains no ticket objects. There is no need to extract objects from the other node's files until a failure occurs and a request for one of those tickets arrives. Then Cushy restores the tickets from the file into memory (Just In Time) and processes requests on behalf of the failed node.

...

  • p:sharedDisk="true" - disables HTTP communication for JUnit Tests and when the work directory is on a shared disk.
  • p:disableTicketsOnRequest="true" - disables an optimization that reads tickets from a checkpoint or incremental file only the first time they are actually needed. The only reason to use this parameter is during testing, so that the number of tickets read from the file appears in the log immediately after the file is generated.
  • p:excludeSTFromFiles="true" - this is plausibly an option you should use. It prevents Service Tickets from being written to the checkpoint or incremental files. This makes incremental files smaller, because it is then not necessary to keep the growing list of IDs for all the Service Tickets that were deleted, probably before anyone ever really cared about them.
  • p:useThread="true" - use a thread to read the checkpoint file from another CAS node. If not set, the file is read inline, which may slow down the processing of a new checkpoint across all the nodes.
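As a sketch, these properties would appear on the registry bean in ticketRegistry.xml roughly as follows. The bean id, class name, and overall layout here are illustrative assumptions; only the p: properties themselves come from the list above.

```xml
<!-- Hypothetical bean definition: id and class are illustrative,
     only the p: properties are documented above. -->
<bean id="ticketRegistry"
      class="edu.yale.its.tp.cas.ticket.registry.CushyTicketRegistry"
      p:sharedDisk="false"
      p:excludeSTFromFiles="true"
      p:useThread="true" />
```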

...

In a SharedDisk situation (see below) there is no HTTP and therefore no /cluster/notify call. Instead, the timerDriven routine checks the Last Modified date on the other node's checkpoint file. When it changes, it performs a subset of the full processNotify operations to reset flags and mark the other server healthy.
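The SharedDisk check can be sketched as follows. This is an illustrative sketch, not Cushy's actual code: the class and method names are hypothetical, and the source of the timestamp is abstracted behind a supplier so the logic is self-contained.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the SharedDisk check described above: a changed
// Last Modified date on the other node's checkpoint file stands in for
// the missing /cluster/notify call.
public class SharedDiskMonitor {
    private long lastSeenModified = 0;
    private final LongSupplier modifiedClock; // e.g. () -> checkpointFile.lastModified()

    public SharedDiskMonitor(LongSupplier modifiedClock) {
        this.modifiedClock = modifiedClock;
    }

    /** Called from the timerDriven routine. Returns true when the other
     *  node has written a new checkpoint, i.e. it should be marked healthy
     *  and its stale in-memory objects flagged for replacement. */
    public boolean newCheckpointArrived() {
        long modified = modifiedClock.getAsLong(); // 0 if the file is missing
        if (modified > lastSeenModified) {
            lastSeenModified = modified;
            return true;
        }
        return false;
    }
}
```

Polling the file's date this way is cheap enough to run on every timer tick, which is why no extra communication channel is needed in SharedDisk mode.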

Just In Time Deserialization

In the first cut, each checkpoint and incremental file was turned into objects immediately after it was read. Then it became clear that Cushy was using lots of processing time to create objects that were discarded a few minutes later without ever being used, which wasted memory and left junk for Garbage Collection to clean up. So although it adds a bit of extra complexity, Cushy now waits for a request for a ticket in the file before opening the file and extracting the objects.

If all the nodes are up and the network is operating normally, all requests will be routed to the node that created the ticket and already holds it in memory. During this time the files get transferred across the network, but they just sit on disk.

Cushy detects a failure and loads the objects into memory when a request arrives for a ticket it does not own. How does it know that the cluster has returned to normal? Ultimately, that is a decision made by the Front End. However, when Cushy gets a Notify from the other node (a /cas/cluster/notify?nodename=...) it knows that a new checkpoint file for that node has been created and has a pretty strong indication that the node thinks it is OK.

So the arrival of a Notify clears the "objects have been deserialized into memory" flag after the new checkpoint file has been fetched to disk. At that point there are objects in memory, but they are stale. There are newer, better objects in the file, but they have not been extracted yet. If the network is back to normal they never will be. The next time the node gets a request for a ticket owned by that node, it is time for Just In Time Deserialization again and a new set of tickets will be loaded into memory.

If you are worried about memory use and Garbage Collection, it should not be a problem:

  • While the network is running normally, files are being transferred but there are no objects in memory and nothing to clean up.
  • While a node is down, there are objects loaded into memory, but the dead node is not generating new files, so these objects, once loaded, stay in memory and are not replaced.

...

Tickets on Request

The simplest and therefore the initial logic for Cushy read a checkpoint or incremental file from another node and immediately "deserialized" it (turned the file into a set of objects) and updated the tickets in the secondary registry object associated with the other node. This is clean and it generates log messages describing the contents of each file as it arrives, which reassures you that the file contains the right data.

However, during the 99.9% of the time when the nodes are running and the network is OK, this approach approximately doubles the amount of overhead to run Cushy. Turning the file back into objects is almost as expensive as creating the objects in the first place. Worse, every time you get a new checkpoint file you have to discard all the old objects and replace them with new objects, which means the old objects have to be garbage collected and destroyed.

This was one place where simplicity over efficiency seemed to go too far. The alternative was to fetch the files across the network, but not to open or read them until some sort of failure routed a request for a ticket that belonged to the other node. Then during normal periods the files would be continuously updated on disk, but they would never be opened until one of the objects they contained was needed.

When a node fails, a bunch of requests for that node may be forwarded by the Front End to a backup node almost at the same time. The first request has to restore all the tickets, and while that is going on the other requests should wait until the restore completes. In a real J2EE environment this sort of coordination is handled by the EJB layer, but CAS uses Spring and has no EJBs.

The obvious way to do this is with a Java "synchronized" block, which acquires a lock while the tickets are being restored from disk to memory. Normally this is not something you want to do: the general rule is that you should never hold a lock while doing any type of I/O, and we know this restore can take as long as a second to complete. However, the only operations queuing up for the lock are requests for tickets owned by the secondary (failed) node, and the readObject that restores all the tickets will end, successfully or with an I/O exception, after which those requests will be processed.
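A minimal sketch of this coordination, assuming hypothetical names (SecondaryRegistry, restoreFromFile, and the ticket IDs are illustrative, not Cushy's real identifiers):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of the synchronized Just In Time restore described above.
// Class, method, and ticket names are hypothetical.
public class SecondaryRegistry {
    private final Map<String, Object> tickets = new ConcurrentHashMap<>();
    private boolean restored = false;

    // Requests for tickets owned by the failed node funnel through here.
    // The first caller holds the lock for the duration of the restore
    // (possibly a second of disk I/O); later callers block on the monitor
    // and then find the tickets already in memory.
    public synchronized Object getTicket(String id) {
        if (!restored) {
            restoreFromFile();   // deliberately done while holding the lock
            restored = true;
        }
        return tickets.get(id);
    }

    private void restoreFromFile() {
        // stand-in for readObject() deserialization of the checkpoint file
        tickets.put("TGT-1-example", new Object());
    }
}
```

Because the whole method is synchronized, the "restore once, then serve everyone" behavior falls out of the Java monitor with no extra bookkeeping.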

This optimization saves a tiny amount of CPU, but it is continuous across all the time the network is behaving normally. If you disable it, and there is a parameter to disable it on the ticketRegistry bean of the ticketRegistry.xml Spring configuration file, then each checkpoint file will be restored after a Notify is received (from the Notify request thread) and each incremental file will be restored after it is read by the Quartz thread that calls timerDriven, so requests never have to synchronize and wait. Of course, if the request proceeds after a file has been received but before it has been restored as new tickets, the request will be processed against the old set of tickets. That is the downside of impatience.

When using "Tickets on Request", there are two basic rules. First, you do not have complete control unless you are synchronized on the Secondary Registry object that corresponds to that node and set of files. Second, in order to work in both HTTPS and SharedDisk mode, the processing is coordinated by the modified date on the files. When a file is turned into objects in memory, the objects are given the same "modified date" as the file that created or updated them. When the file's modified date is later than the objects' modified date, the objects in memory are stale and the file should be restored at the next request.

Generally an incremental file, if it exists, should always be later than a checkpoint. If both files are later than the objects in memory, always restore the checkpoint first.
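The modified-date ordering rule can be expressed as a small pure function. This is a sketch under the stated assumptions; the class and method names are illustrative, not Cushy's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "restore checkpoint first" rule above,
// driven entirely by last-modified timestamps.
public class RestorePlanner {
    /** Decide which files to restore, in order, given their modified dates
     *  and the modified date carried by the objects already in memory. */
    public static List<String> plan(long checkpointTime, long incrementalTime,
                                    long objectsTime) {
        List<String> actions = new ArrayList<>();
        if (checkpointTime > objectsTime) {
            actions.add("checkpoint");   // full set of tickets, always first
        }
        if (incrementalTime > objectsTime && incrementalTime >= checkpointTime) {
            actions.add("incremental");  // adds and deletes since the checkpoint
        }
        return actions;
    }
}
```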

Now for a race condition that is currently declared to be unimportant. Assume that "Tickets on Request" is disabled, so tickets are restored as soon as the file arrives. Assume that there are so many tickets that restoring the checkpoint (which is done in one thread as a result of the Notify request) takes longer than the number of seconds before the next incremental is generated. The incremental is small, and it is read by the timerDriven thread independently of the Notify request. So it is possible, if these two restores are not synchronized against each other, that this first incremental will be applied to the old objects in memory instead of the new objects still being restored from the checkpoint. Nothing really bad happens here. The New Tickets in the incremental are certainly newer than the old objects, the Deleted Tickets in the incremental certainly deserve to be deleted, and if the first incremental is applied to the old set of tickets and does not update the objects created by the new checkpoint, then the second incremental, which is cumulative, will correct the problem. So the issue is not worth adding synchronization to avoid.

SharedDisk

The SharedDisk parameter is typically specified in the ticketRegistry.xml Spring configuration file. It turns off the Cushy HTTP processing. There will be no Notify message, and therefore no HTTP fetching of the checkpoint or incremental file. There is no exchange of a dummy ServiceTicketId for communication security because there is no communication. It is used in real SharedDisk situations and in Unit Test cases.

...

Note that Healthy deals with a failure of this server to connect to a node, while Tickets on Request is triggered when the Front End cannot get to the node and sends us a request that belongs to the other node. If a node really goes down, both things happen at roughly the same time. Otherwise, it is possible for just one type of communication to fail while the other still works.

...