...

JPA makes CAS dependent on a database, even though CAS never uses the database for queries or reports. Any database will do, but whichever one you choose becomes a single point of failure. At Yale, CAS is nearly the first thing that has to come up during disaster recovery, but if it uses JPA then the database has to come up first (or you need a special standalone CAS configuration reserved for disaster recovery). If you already have a 24x7x365 database managed by professionals who can guarantee its availability, this is a good solution. If not, then the database becomes an insurmountable prerequisite for bringing up an application like CAS.
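To make the dependency concrete: a JPA-backed registry turns every ticket operation into a synchronous database transaction, which is why every CAS node stops issuing tickets the moment the database is unreachable. The sketch below is not the actual CAS JpaTicketRegistry; the entity and registry classes are simplified placeholders, though the EntityManager calls themselves are standard JPA.

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceException;

// Simplified stand-in for a CAS ticket; the real entity also carries
// the expiration policy, authentication, and parent-ticket references.
@Entity
class TicketRow {
    @Id
    private String id;
    private long creationTime;

    protected TicketRow() { }

    TicketRow(String id) {
        this.id = id;
        this.creationTime = System.currentTimeMillis();
    }
}

// Illustrative JPA-style registry: every add, fetch, and delete is a
// database round trip, so an unreachable database stops ticket work
// on every CAS node at once.
class JpaStyleTicketRegistry {
    private final EntityManager entityManager;

    JpaStyleTicketRegistry(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    void addTicket(TicketRow ticket) {
        try {
            entityManager.getTransaction().begin();
            entityManager.persist(ticket);           // synchronous write to the database
            entityManager.getTransaction().commit();
        } catch (PersistenceException e) {
            // If the database is down, the login that triggered this
            // write fails; there is no local fallback.
            entityManager.getTransaction().rollback();
            throw e;
        }
    }

    TicketRow getTicket(String id) {
        return entityManager.find(TicketRow.class, id); // reads also require the database
    }
}
```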

Cushy can be configured to use a shared disk for its files. You might think that introduces the same single point of failure as a shared database, but there is a difference. When the database goes away, JPA stops working, and because JPA is wired into the essential code of CAS, all the CAS servers stop at the same time. If the Cushy shared disk goes away for a while, the periodic update of the checkpoint file stops, but the CAS node continues to operate normally, as the sketch below illustrates. Cushy has a problem only if the shared disk fails and then, while it is still down, one or more CAS servers also fail. Besides, it is somewhat easier to find a 24x7 NAS solution than to configure a 24x7 database.
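The contrast is that the registry lives in the CAS server's memory and the shared disk only receives a periodic checkpoint. This is a rough sketch of that pattern, not Cushy's actual code; the class, map, and file names are invented for illustration.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative in-memory registry with a periodic checkpoint to shared disk.
// The node keeps answering requests out of the map; only the background
// write depends on the NAS, so a disk outage delays checkpoints without
// stopping the node.
class CheckpointingRegistry {
    private final Map<String, Serializable> tickets = new ConcurrentHashMap<>();
    private final Path checkpointFile;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    CheckpointingRegistry(String sharedDiskPath, long intervalSeconds) {
        this.checkpointFile = Paths.get(sharedDiskPath, "cas-node1-checkpoint.ser");
        scheduler.scheduleAtFixedRate(this::writeCheckpoint,
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    void addTicket(String id, Serializable ticket) {
        tickets.put(id, ticket);                  // in-memory only; no remote call
    }

    Serializable getTicket(String id) {
        return tickets.get(id);                   // served from memory even if the NAS is down
    }

    private void writeCheckpoint() {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(checkpointFile))) {
            out.writeObject(new ConcurrentHashMap<>(tickets)); // snapshot of all tickets
        } catch (IOException e) {
            // Shared disk unavailable: skip this checkpoint and try again
            // on the next cycle; the node itself keeps running.
        }
    }
}
```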

The various cache (in-memory object replication) solutions should also work. Unfortunately, some require massively complex configuration: multicast network addresses, plus timeout values that determine when a node is considered failed. They also tend to be better at detecting a node that is dead than at dealing with a node that is sick, one that accepts a message but never gets around to processing it and responding. They operate entirely in memory, so at least one node has to stay up while the others reboot or the content of the cache is lost. And while node failure is well defined, the status of the replicated objects is ambiguous if a link failure splits the network into two segments that operate independently for a while and are then reconnected.

...