...
Cushy automatically solves this problem every time it takes a full checkpoint. Each of the other nodes obtains a fresh, exact copy of all the tickets on the checkpointing node, chained together exactly as they are on that node and carrying the very latest information.
When Cushy generates an incremental file between full checkpoints, all the Tickets added since the last checkpoint are individually serialized, producing the same result as the caching solutions. With Cushy, however, a full checkpoint comes along every 5 minutes and cleans it all up.
The reason CAS can tolerate this sloppy serialization is that it doesn't affect the Business Logic. Suppose a ST is serialized on one node and is sent to another node where it is validated. Validation follows the chain from the ST to the TGT and then gets the Netid (and maybe the attributes). The result is the same whether you obtain the Netid from the "real" TGT or from a copy of the real TGT made a few seconds ago. Once the ST is validated it is deleted, and that also discards all the other objects chained off the ST by the caching mechanism. If it isn't validated, then the ST times out and is deleted anyway.
Suppose you have a PGT that points to a TGT, and the PGT is serialized and copied to another node. If the TGT is changed after the copy is made (which cannot happen today, but might be something CAS does in a future release with multifactor support), then the copy of the PGT points to the old copy of the TGT with the old information, while the original PGT points to the original TGT with the new data. This problem would have to be solved before you introduce any new CAS feature that meaningfully changes the TGT.
Cushy solves this currently non-existent problem every time it does a full checkpoint. Between checkpoints, and only for the tickets added since the last checkpoint, Cushy creates copies of TGTs from the individually serialized STs and PGTs, just like the caching systems do; but it creates far fewer of them, and they last only a few minutes.
Now for the real problem that CAS has not solved.
When you serialize a collection, Java internally obtains an "iterator" and steps one by one through the objects in the collection. An iterator knows how to find the next or previous object in the collection. However, the iterator can break if, while serialization is processing one element, another thread adds a new element to the collection "between" the object currently being processed and the object the iterator expects to be next. When this happens, serialization stops and throws an exception (typically a ConcurrentModificationException).
So if you are going to use a serialization-based replication mechanism (like Ehcache, JBoss Cache, or Memcached), it is a really, really bad idea to have a non-threadsafe collection in your tickets, such as the services table in the TGT used for Single SignOut. Collisions don't happen all that often, but a very common user behavior makes them much more likely.
Someone presses the browser's "Open All In Tabs" button and creates several tabs simultaneously. Two of the tabs reference CAS-aware applications that redirect the browser to CAS. The user is already logged on, so each tab only needs a Service Ticket. The problem is that both Service Tickets point to the same TGT and both register in its services table for Single SignOut, so serialization of the first ticket can begin just as the second one is adding its entry to that table.
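To make the collision concrete, here is a minimal, self-contained sketch. It is not CAS code: the class and field names are invented, and a List stands in for the (Map-based) services table, since the hazard is the same for any non-threadsafe collection reachable from a ticket being serialized. One thread keeps adding entries, the way new Service Tickets register for Single SignOut, while another thread serializes the ticket the way a replication mechanism would. The exact failure mode depends on the collection and JDK, but a run will usually end in a ConcurrentModificationException.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustrative only: "FakeTicket" and its "services" field are invented names.
public class SerializationRaceDemo {

    static class FakeTicket implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> services = new ArrayList<>();
    }

    public static void main(String[] args) throws Exception {
        final FakeTicket ticket = new FakeTicket();

        // Writer thread: keeps registering "services", the way each new
        // Service Ticket adds an entry to the TGT's services table.
        Thread writer = new Thread(() -> {
            long n = 0;
            while (!Thread.currentThread().isInterrupted()) {
                ticket.services.add("https://app.example.edu/service/" + n++);
            }
        });
        writer.setDaemon(true);
        writer.start();

        // Replication side (here, the main thread): serializes the ticket
        // repeatedly, as a serialization-based replication mechanism would.
        for (int pass = 1; pass <= 1000; pass++) {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new ByteArrayOutputStream())) {
                out.writeObject(ticket);
            } catch (ConcurrentModificationException e) {
                System.out.println("Serialization failed on pass " + pass + ": " + e);
                return;
            }
        }
        System.out.println("No collision in 1000 passes (timing dependent).");
    }
}
```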
Yale does not use Single SignOut, so we simply disabled the services table. If you want to solve this problem, at least Cushy gives you access to all the code, so you can come up with a solution if you understand Java threading.

Cushy has one opportunity to get into trouble. It occurs when a logged-in user goes to the Portal, the Portal obtains a PGT, CAS writes an incremental record containing the PGT, and then the CAS server crashes hard without a clean shutdown. When CAS reboots, the only copy of the PGT is in the last incremental file, which means the copy of that ticket in the registry that owns it was restored from a "single ticket serialization" with a private copy of the TGT. Fixing this would require changes to the TicketGrantingTicketImpl and AbstractTicket classes of cas-server-core, because they provide no way to correct a bad Granting Ticket reference set by serialization (the field is private and there is no setter, so the only time the field can be set is when the object is created).
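The shape of that limitation looks roughly like the following simplified sketch. This is not the actual cas-server-core source and the class names are invented; it only illustrates the pattern described above: a private parent-ticket reference assigned in the constructor with no setter, so a registry that deserializes a PGT carrying its own private copy of the TGT has no supported way to re-point the reference at the "real" TGT afterwards.

```java
import java.io.Serializable;

// Simplified sketch of the pattern, not cas-server-core code.
abstract class SketchAbstractTicket implements Serializable {
    private static final long serialVersionUID = 1L;

    // Set once at construction time; no setter is provided.
    private final SketchTicketGrantingTicket grantingTicket;

    protected SketchAbstractTicket(final SketchTicketGrantingTicket grantingTicket) {
        this.grantingTicket = grantingTicket;
    }

    public SketchTicketGrantingTicket getGrantingTicket() {
        return grantingTicket;
    }
    // No setGrantingTicket(...): a deserialized copy keeps whatever TGT
    // object happened to come along inside its own serialized stream.
}

class SketchTicketGrantingTicket extends SketchAbstractTicket {
    private static final long serialVersionUID = 1L;

    SketchTicketGrantingTicket() {
        super(null); // a TGT has no parent ticket
    }
}
```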
For Now
Current CAS simply ignores these issues and it doesn't seem to have any problems doing so.
Yale does not use Single SignOut, so we do not need the "Services" table in the TGT. We disable updates to the table, and without it the CAS 3 TGT is thread-safe enough to be reliable.
In the long run Cushy gives you control of clustering and therefore provides code you can use to fix these or any other problems. The other replication mechanisms are not customizable.
Usage Pattern
Users start logging into CAS at the start of the business day. The number of TGTs begins to grow.
...
Even without a cluster, Cushy still checkpoints the ticket cache to disk and restores the tickets across a reboot. So it provides a useful function in a single machine configuration that is otherwise only available with JPA and a database.
...
You Can Configure Manually
Although CushyClusterConfiguration makes most configuration problems simple and automatic, if it does the wrong thing and you don't want to change the code, you can ignore it entirely. As will be shown in the next section, there are three properties (a String and two Properties tables) that are input to the CushyTicketRegistry bean. The whole purpose of CushyClusterConfiguration is to generate values for these three parameters. If you don't like what it produces, you can use Spring to supply static values for these parameters, and then you don't even have to use the clusterConfiguration bean.
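A manually wired registry bean might look roughly like the sketch below. This is only an illustration: the package name, property names, and keys are placeholders, not the real CushyTicketRegistry property names, so check the actual bean class for the names and contents it expects. The point is simply that the one String and two Properties tables that CushyClusterConfiguration would compute can instead be wired as static values in Spring.

```xml
<!-- Sketch only: class package, property names, and keys are placeholders. -->
<bean id="ticketRegistry"
      class="edu.yale.cas.ticket.registry.CushyTicketRegistry">

    <!-- The String property: which node of the cluster this server is. -->
    <property name="nodeName" value="casvm01"/>

    <!-- First Properties table (placeholder name and keys). -->
    <property name="nodeUrls">
        <props>
            <prop key="casvm01">https://casvm01.example.edu/cas/cluster</prop>
            <prop key="casvm02">https://casvm02.example.edu/cas/cluster</prop>
        </props>
    </property>

    <!-- Second Properties table (placeholder name and keys). -->
    <property name="nodeSettings">
        <props>
            <prop key="checkpointIntervalSeconds">300</prop>
        </props>
    </property>
</bean>
```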
...