...

Four years ago Yale implemented a "High Availability" CAS cluster using JBoss Cache to replicate tickets. After that, the only CAS crashes were caused by failures of the ticket replication mechanism. We were disappointed that a mechanism nominally designed to improve availability should itself be a source of failure. We considered replacing JBoss Cache with an alternate library performing essentially the same service, but it was not clear that any other package would solve all the problems, and there is a more fundamental issue here. It should be a structural feature of the system that problems in the ticket replication mechanism cannot crash CAS. Replacing one magic black box of code with another and hoping the second one works better misses the point.

General object replication systems are necessary for shopping cart applications that handle thousands of concurrent users spread across a number of machines. That is more than CAS really needs. CAS has a relatively light load that could probably be handled by a single server, but it needs to be available all the time, even during disaster recovery when there may be unexpected network communication problems. E-Commerce applications, on the other hand, do not have to worry about running when the network is generally sick or when there is no database in which to record the transactions, while CAS is a critical infrastructure component that has to be up if it is at all possible for it to be running. It also turns out that CAS tickets violate some of the restrictions that general object replication systems place on application objects. We developed CushyTicketRegistry to be a new option specifically designed to support CAS in the modern network environment. It adds new features to make a single CAS server more available without configuring a cluster, and if you decide to add additional servers it provides a much simpler, and also more reliable, approach to ticket replication.

CushyTicketRegistry cannot ever crash CAS. No matter what goes wrong with the network, it will keep periodically retrying until it can connect, and in the meantime CAS runs normally. This is possible because it is designed for the network infrastructure of today rather than the one commonly deployed a decade ago. It solves just the CAS problem, so it is a single medium-sized Java source file that someone can read and understand, instead of a complex black box of code designed to solve a much larger general problem.
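
To make that design principle concrete, the sketch below shows the general pattern of a background replication task that logs failures and retries on the next cycle instead of letting any exception reach CAS. The class and method names are hypothetical illustrations of the idea, not the actual Cushy source:

    // Hypothetical sketch: replication failures are logged and retried, never propagated.
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class CushyReplicator {

        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        public void start() {
            // Runs in the background; CAS request processing never waits on it.
            timer.scheduleWithFixedDelay(this::replicateOnce, 10, 10, TimeUnit.SECONDS);
        }

        private void replicateOnce() {
            try {
                // Checkpoint tickets to disk and send the file to the other nodes.
                // If the network is down this throws, we log, and the next cycle retries.
                sendCheckpointToPeers();
            } catch (Exception e) {
                // Never let a replication failure escape into CAS itself.
                System.err.println("Cushy replication failed, will retry: " + e);
            }
        }

        private void sendCheckpointToPeers() throws Exception {
            // Placeholder for the HTTPS file transfer described in the text.
        }
    }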

Cushy is a cute name that roughly stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale". This summarizes what it is and how it works.

The TicketRegistry is the component of CAS that stores the ticket objects, the objects CAS uses to remember what it has done. There are at least five different versions of the TicketRegistry to choose from (Default, JPA, JBoss, Ehcache, Memcached), and Cushy simply adds one additional choice. While traditional applications are assembled at build time, CAS uses the Spring Framework to essentially create logical sockets for components that are plugged in at application startup time. This is driven by XML configuration files; the ticketRegistry.xml file configures whichever registry option you choose. For a simple standalone CAS server, the standard choice is the DefaultTicketRegistry class, which keeps the tickets in an in-memory Java table keyed by the ticket id string.
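
As an illustration of that wiring, the sketch below shows how ticketRegistry.xml selects the registry implementation. The DefaultTicketRegistry class name is the standard CAS one; the Cushy package and property names are placeholders, since the real parameters are described later:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- ticketRegistry.xml (illustrative sketch, not the exact Yale file) -->
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd">

        <!-- Standard single-server choice: tickets live in an in-memory map. -->
        <bean id="ticketRegistry"
              class="org.jasig.cas.ticket.registry.DefaultTicketRegistry" />

        <!-- To use Cushy instead, replace the bean above with the Cushy class
             (the package and property names here are placeholders, described later):
        <bean id="ticketRegistry"
              class="edu.yale.cas.ticket.registry.CushyTicketRegistry">
            <property name="checkpointInterval" value="300" />
        </bean>
        -->

    </beans>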

The Standalone Server

Suppose you simply change the class name from DefaultTicketRegistry to CushyTicketRegistry (and add a few required parameters described later). Cushy was based on the DefaultTicketRegistry code, so while CAS runs everything works exactly the same. The difference appears when you shut CAS down, typically because you have to restart it. Since the DefaultTicketRegistry only has an in-memory table, all the tickets are lost when the application restarts. Cushy detects the shutdown and saves all the ticket objects to a file on disk, using a single Java writeObject statement on the entire collection. Unless that file is deleted while CAS is down, when CAS restarts Cushy reloads all the tickets from that file into memory and restores all the CAS state from before the shutdown. Users do not have to log in again, and no one even notices that CAS restarted unless they tried to access it while it was down.
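
The checkpoint itself is ordinary Java serialization: one writeObject call on shutdown and one readObject call on restart. The sketch below illustrates that idea with hypothetical class and file names; it is not the actual Cushy code:

    // Hypothetical sketch of the shutdown/restart checkpoint described above.
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class TicketCheckpoint {

        // Stand-in for the in-memory table of tickets keyed by the ticket id string.
        private final Map<String, Serializable> tickets = new ConcurrentHashMap<>();

        // On shutdown: one writeObject call captures the whole collection.
        public void saveOnShutdown(String fileName) throws IOException {
            try (ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream(fileName))) {
                out.writeObject(new ConcurrentHashMap<>(tickets));
            }
        }

        // On restart: one readObject call restores every ticket, so users
        // logged in before the restart are still logged in afterwards.
        @SuppressWarnings("unchecked")
        public void restoreOnStartup(String fileName) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(fileName))) {
                tickets.putAll((Map<String, Serializable>) in.readObject());
            }
        }
    }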

...