  • Recover tickets after reboot without JPA, or a separate server, or a cluster (works on a standalone server)
  • Recover tickets after a crash, except for the last few seconds of activity that did not get to disk.
  • No dependency on any large external libraries. Pure Java using only the standard Java SE runtime and some Apache commons stuff.
  • All source in one class. A Java programmer can read it and understand it.
  • Can also be used to cluster CAS servers
  • Cannot crash CAS ever, no matter what is wrong with the network or other servers.
  • A completely different and simpler approach to the TicketRegistry. Easier to work with and extend.
  • Probably uses more CPU and network I/O than other TicketRegistry solutions, but it has a constant predictable overhead you can verify is trivial.

...

Four years ago Yale implemented a "High Availability" CAS cluster using JBoss Cache to replicate tickets. After that, the only CAS crashes were caused by failures of JBoss Cache. Red Hat failed to diagnose or fix the problem. As we tried to diagnose the problem ourselves, we discovered both bugs and design problems in the structure of Ticket objects and the use of the TicketRegistry solutions that contributed to the failure. We considered replacing JBoss Cache with Ehcache, but there is a more fundamental problem here. It should not be possible for any failure of the data replication mechanism to crash all of the CAS servers at once. Another choice of cache might be more reliable, but it would suffer from the same fundamental structural problem.

Having been burned by software so complicated that the configuration files were almost impossible to understand, we developed Cushy to accomplish the same thing in a way so simple it could not possibly fail.

The existing CAS TicketRegistry solutions must be configured to replicate tickets to the other nodes and to wait for this activity to complete, so that any node can validate a Service Ticket that was generated a few milliseconds ago. Waiting for the replication to complete is what makes CAS vulnerable to a crash if the replication begins but never completes. It seemed like an obvious solution because Ehcache and JBoss Cache promised that it would work. Once it didn't work, it was natural to ask whether there was another way to do this. If there was, then the entire TicketRegistry strategy could be reconsidered. Synchronous ticket replication is a standard service provided by JBoss Cache and Ehcache, but is it the right way to solve the Service Ticket validation problem? A few minutes spent crunching the math suggested there was a better way.

It is easier and more efficient to send the request to the node that already has the ticket and can process it rather than struggling to get the ticket to every other node in advance of the next request.
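
The routing idea can be sketched in a few lines of Java. This is only an illustration, not the Cushy implementation. It assumes that each node appends its own name to the ID of every ticket it issues (for example "ST-42-aBcDeF-casnode2"), an assumption the text above does not state. The class name TicketRouter, the node names, and the URLs are all hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: decide which CAS node should process a ticket request.
 * ASSUMPTION: the issuing node's name is the last '-'-separated part of the
 * ticket ID, e.g. "ST-42-aBcDeF-casnode2".
 */
final class TicketRouter {
    private final String localNode;
    private final Map<String, String> nodeUrls; // node name -> base URL (illustrative)

    TicketRouter(String localNode, Map<String, String> nodeUrls) {
        this.localNode = localNode;
        this.nodeUrls = Map.copyOf(nodeUrls);
    }

    /** Returns the node-name suffix after the last '-', or empty if there is none. */
    static Optional<String> ownerOf(String ticketId) {
        if (ticketId == null) {
            return Optional.empty();
        }
        int dash = ticketId.lastIndexOf('-');
        if (dash < 0 || dash == ticketId.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(ticketId.substring(dash + 1));
    }

    /**
     * Empty result: process the request on this node (we own the ticket, or the
     * owner is unknown). Otherwise: the base URL of the node to forward to.
     */
    Optional<String> forwardTarget(String ticketId) {
        return ownerOf(ticketId)
                .filter(owner -> !owner.equals(localNode))
                .map(nodeUrls::get); // an unknown owner maps to empty
    }
}
```

A node built as new TicketRouter("casnode1", urls) would process "ST-42-aBcDeF-casnode1" itself and forward "ST-42-aBcDeF-casnode2" to whatever URL is registered for casnode2. No ticket has to reach the other nodes before the validation request arrives.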

...