...
In 2010 Yale upgraded to CAS 3.4.2 and implemented "High Availability" CAS clustering using the JBoss Cache option (because Yale Production Services had standardized on JBoss for "clustering"). Unfortunately, the mechanism designed to improve CAS reliability ended up as the cause of most CAS failures. If JBoss Cache (or any other clustering option) fails due to some unspecified network layer problem, requests back up in memory and eventually CAS stops running on all members of the cluster. None of the other available CAS clustering options have been reported to work flawlessly. If you insist that Service Tickets be replicated through the cluster, so that any CAS node can validate any Service Ticket, then replication has to complete before the ST can be passed back to the user. But if CAS has to wait for cache activity, then network problems or some sickness on one of the CAS nodes propagates back to all the nodes and CAS stops working. We considered changing to another option, but none of the alternatives has a spotless reputation for reliability.
There is much to be said for "off the shelf" software solutions ("COTS software"). After all, if something is widely used and written to handle much more complicated problems, then it should handle CAS. However, there is also something to be said for much, much simpler solutions that just solve the one problem you need to solve. Unfortunately, all these packages are designed to support application level software, and at Yale CAS is a Tier 0 system component (in Disaster Recovery planning) that has to be back up first with as few dependencies as possible. Application software is not written to system specifications.
So CushyTicketRegistry was written to hold CAS tickets in memory and to replicate them to other CAS servers so they can take over if one server fails. It turns out that it is trivial (both in code and overhead) to snapshot the entire collection of tickets to a disk file using the Java writeObject operation. The resulting fairly small file can then be transferred between CAS servers using an HTTPS GET, because CAS runs on a Web Server so you might just as well use it. This approach may not be as efficient as the more sophisticated technology, but it is so dead flat simple that you can understand it, customize it, and arrange that it can never cause problems. More importantly, if it uses less than 5% of one core on a modern multicore commodity server, do you really need to be more efficient?
Every cluster of any type requires a network front end to route requests, detect failure, and maybe load balance. Cushy assumes that this front end is programmable, as most modern front ends are, and depends on routing rules that are entirely reasonable with today's devices.
Executive Summary
This is a quick introduction for those in a hurry.
CAS is a Single SignOn solution. Internally, it creates a set of objects called Tickets. There is a ticket for every logged on user, and short term Service Tickets that exist while a user is being authenticated to an application. The Business Layer of CAS creates tickets by, for example, validating your userid and password in a back end system like Active Directory. The tickets are stored in a plug in component called a Ticket Registry.
For a single CAS server, the Ticket Registry is just an in-memory table of tickets (a Java "Map" object) keyed by the ticket ID string. When more than one CAS server is combined to form a cluster, then an administrator chooses one of several optional Ticket Registry solutions that allow the CAS servers to share the tickets.
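To make the single-server picture concrete, here is a minimal sketch of that kind of registry, assuming a stripped-down ticket interface (the names are illustrative, not the real JASIG classes):

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, stripped-down ticket type; the real JASIG interfaces carry much more.
interface SimpleTicket extends java.io.Serializable {
    String getId();
}

// The single-server case: the registry is just a thread-safe map keyed by ticket ID string.
class InMemoryTicketRegistry {
    private final ConcurrentHashMap<String, SimpleTicket> tickets =
            new ConcurrentHashMap<String, SimpleTicket>();

    public void addTicket(SimpleTicket t)    { tickets.put(t.getId(), t); }
    public SimpleTicket getTicket(String id) { return tickets.get(id); }
    public boolean deleteTicket(String id)   { return tickets.remove(id) != null; }
}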
One clustering option is to use JPA, the standard Java service to map objects to tables in a relational database. All the CAS servers share a database, which means that any CAS node can fail but the database has to stay up all the time or CAS stops working. Other solutions use generic object "caching" solutions (Ehcache, JBoss Cache, Memcached) where CAS puts the tickets into what appears to be a common container of Java objects and, under the covers, the cache technology ensures that the tickets are copied to all the other nodes.
JPA makes CAS dependent on a database. It doesn't really use the database for any real SQL stuff, so you could use almost any database system. However, the database is a single point of failure, so you need it to be reliable. If you already have a 24x7x365 database managed by professionals who can guarantee availability, this is a good solution. If not, then this is an insurmountable prerequisite for bringing up an application like CAS that doesn't really need a database.
The various cache solutions should solve the problem. Unfortunately, they too have massively complex configuration parameters with multicast network addresses and timeouts, and while they are designed to survive complete node failure, experience suggests that they are not designed to work when a CAS machine is "sick". That is, if the machine is down and does not respond to any network requests, the technology recovers; but if the node is up and receives messages but just doesn't process them correctly, then queues start to clog up, back up into CAS itself, and CAS stops working simultaneously on all nodes. There is also a problem with the "one big bag of objects" model if a router that connects two machine rooms fails: two CAS nodes are separated, and now there are separate versions of what the system is designed to believe is a single cohesive collection.
If you understand the problem CAS is solving and the way the tickets fit together, then each type of failure presents specific problems. Cushy is designed to avoid the big problems and provide transparent service to 99.9% of the CAS users. If one or two people experience an error message due to a CAS crash, and CAS crashes only once a year, then that is good enough especially when the alternative technologies can cause the entire system to stop working for everyone.
Cushy is a cute word that roughly stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale".
The name explains what it does. Java has a built-in operation called writeObject that writes a binary version of Java objects to disk. If you use it on a complex object, like a list of all the tickets in the Registry, then it creates a disk file with all the tickets in the list. Later on you use readObject to turn the disk file back into a copy of the original list. Java calls this mechanism "Serialization". Using just one statement and letting Java do all the work and handle all the complexity makes this easy.
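As a rough sketch of what that one statement buys you (the class, method, and file names here are made up for illustration, not the actual Cushy code):

import java.io.*;
import java.util.ArrayList;

class CheckpointExample {
    // One writeObject call captures the whole collection and everything it references.
    static void writeCheckpoint(ArrayList<? extends Serializable> tickets, File file)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(tickets);
        }
    }

    // One readObject call rebuilds an identical copy of the original collection.
    static ArrayList<?> readCheckpoint(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ArrayList<?>) in.readObject();
        }
    }
}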
The other mechanisms (JPA or the cache technologies) operate on single tickets. They write individual tickets to the database or replicate them across the network. Obviously this is vastly more efficient than periodically copying all the tickets to disk. Except that at Yale, the entire Registry of tickets can be written to a disk file in 1 second and it produces a file about 3 megabytes in size. Those numbers are so trivial that writing a copy of the entire Registry to disk once every 5 minutes, or even once a minute, is trivial on a modern server. Given the price of hardware, being more efficient than that is unnecessary.
Once you have a file on disk it should not take very long to figure out how to get a copy of that file from one Web Server to another. An HTTP GET is the obvious solution, though if you had shared disk there are other solutions.
Going to an intermediate disk file was not the solution that first comes to mind. If the tickets are in memory on one machine and they have to be copied to memory on another machine, some sort of direct network transfer is going to be the first thing you think about. However, the intermediate disk file is useful to restore tickets to memory if you have to restart your CAS server for some reason. Mostly, it means that the network transmission is COMPLETELY separate from the process of creating, validating, and deleting tickets. If the network breaks down you cannot transfer the files, but CAS continues to operate normally and it can even generate new files with newer copies of all the tickets. When the network comes back the file transfer resumes independent of the main CAS services. So replication problems can never interfere with CAS operation.
Cushy is based on four basic design principles:
- CAS is very important, but it is small and cheap to run.
- Emphasize simplicity over efficiency as long as the cost to run remains trivial.
- Assume the network front end is programmable.
- Trying for perfection is the source of most total system failures. Allow one or two users to get a temporary error message when a CAS server fails.
How it works
Cushy is simple enough it can be explained to anyone, but if you are in a rush you can stop here.
Back in the 1960's a "checkpoint" was a copy of the important information from a program written to disk so that if the computer crashed the program could start back at almost the point it left off. If a CAS server saves its tickets to a disk file, reboots, and then reads the tickets from the file back into memory, it is back to the same state it had before rebooting. If you transfer the file to another computer and bring CAS up on that machine, you have moved the CAS server from one machine to another. Java writeObject and readObject guarantee the state and data are completely saved and restored.
JPA and the cache technologies try to maintain the image of a single big common bucket of shared tickets. This is a very simple view, but it is very hard to maintain and rather fragile. Cushy maintains a separate TicketRegistry for each CAS server, but replicates a copy of each TicketRegistry to all the other servers in the cluster.
Given the small cost of making a complete checkpoint, you could configure Cushy to generate one every 10 seconds and run the cluster on full checkpoints. It is probably inefficient, but using 1 second of one core and transmitting 3 megabytes of data to each node every 10 seconds is not a big deal on modern equipment. This was the first Cushy code milestone and it lasted for about a day before it was extended with a little extra code.
The next milestone (a day later) was to add an "incremental" file that contains all the tickets added, or the ticket ids of tickets deleted, since the last full checkpoint. Creating multiple increments and transmitting only the changes the other node has not yet seen was considered, but it would require more code and complexity. If you generate checkpoints every few minutes, then the incremental file grows as more changes are made but it never gets really large. It is well known that the overhead of creating and opening a file or establishing a network connection is so great that the difference between reading or writing 5K or 100K is trivial.
In Cushy you configure a timer in XML. If you set the timer to 10 seconds, then Cushy writes a new incremental file every 10 seconds. Separately you configure the time between full checkpoints. When the timer goes off, if enough time has passed since the last checkpoint then instead of writing an incremental file, this time it writes a new full Checkpoint.
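A minimal sketch of that timer decision, with illustrative names rather than the actual Cushy configuration properties:

class CheckpointTimer {
    private final long checkpointIntervalMillis;  // e.g. 5 * 60 * 1000 for 5 minutes
    private long lastCheckpoint = 0;

    CheckpointTimer(long checkpointIntervalMillis) {
        this.checkpointIntervalMillis = checkpointIntervalMillis;
    }

    // Called every time the (say, 10 second) timer fires.
    void onTimer() {
        long now = System.currentTimeMillis();
        if (now - lastCheckpoint >= checkpointIntervalMillis) {
            writeFullCheckpoint();   // all tickets owned by this node
            lastCheckpoint = now;
        } else {
            writeIncremental();      // adds and deletes since the last full checkpoint
        }
    }

    void writeFullCheckpoint() { /* serialize the whole registry to disk */ }
    void writeIncremental()    { /* serialize only the changes */ }
}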
Between checkpoints only a small number of tickets are added, but lots of Service Tickets have been created and deleted, and there is no good way to keep the list of expired Service Tickets from making the incremental file larger. So if you tried to separate full checkpoints by an unreasonable amount of time you would find the incremental file had grown to be larger than the checkpoint file, and you would have made things worse rather than better. So the expectation is that you do a full checkpoint somewhere between every 1-10 minutes and an incremental somewhere between every 5-15 seconds, but test it and make your own decisions.
A Service Ticket is created and then is immediately validated and deleted. Trying to replicate Service Tickets to the other nodes before the validation request comes in is an enormous problem that screws up the configuration and timing parameters for all the other Ticket Registry solutions. Cushy doesn't try to do replication at this speed. Instead, it has CAS configuration elements that ensure that each Ticket ID contains an identifier of the node that created it, and it depends on a front end smart enough to route any ticket validation request to the node that created the ticket and already has it in memory. Then replication is only needed for crash recovery.
Note: If the front end is not fully programmable, there is a small programming exercise (to be considered for Cushy 2.0) to forward the validation request from any CAS node to the node that owns the ticket and then pass the results of the validation back to the app.
Ticket Names
As with everything else, CAS has a Spring bean configuration file (uniqueIdGenerators.xml) to configure how ticket ids are generated. If you accept the defaults, then tickets have the following format:
type - num - random - nodename
where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end of the ticket is identified as a nodename.
In vanilla CAS the nodename typically comes from the cas.properties file, and even when they are using real clustering many CAS sites leave the "nodename" suffix on the ticket id at its default value of "-CAS". Cushy requires every node in the cluster to have a unique name; it adds a smarter configuration bean, described below, and enforces the rule that the end of the ticket really identifies the node that created it and therefore owns it.
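Pulling the owning node out of a ticket ID is then a one-liner. The sketch below (with a made-up ticket ID) just takes everything after the last dash:

class TicketSuffix {
    static String nodeSuffix(String ticketId) {
        int dash = ticketId.lastIndexOf('-');
        return (dash < 0) ? "" : ticketId.substring(dash + 1);
    }

    public static void main(String[] args) {
        // Hypothetical ticket ID; prints "casdev1"
        System.out.println(nodeSuffix("ST-42-dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT-casdev1"));
    }
}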
How it Fails (Nicely)
The Primary + Warm Spare Cluster
One common cluster model is to have a single master CAS server that normally handles all the requests, and a normally idle backup server (a "warm spare") that does nothing until the master goes down. Then the backup server handles requests while the master is down.
During normal processing the master server is generating tickets, creating checkpoints and increments, and sending them to the backup server. The backup server is generating empty checkpoints with no tickets because it has not yet received a request.
Then the master is shut down or crashes. The backup server has a copy in memory of all the tickets generated by the master, except for the last few seconds before the crash. It can handle new logins and it can issue Service Tickets against logins previously processed by the master, using its copy of the master's registry.
Now the master comes back up and, for this example, let us assume that it resumes its role as master (there are configurations where the backup becomes the new master, so that when the old master comes back it becomes the new backup; that case is actually easier for Cushy).
The master restores from disk a copy of its old registry and over the network it fetches a copy of the registry from the backup. It now has access to all the login or proxy tickets created by the backup while it was down, and it can issue Service Tickets based on those logins.
However, the failure has left some minor issues that are not important enough to be problems. Because each server is the owner of its own tickets and registry, each has Read-Only access to the tickets of the other server. (Strictly speaking that is not true. You can temporarily change tickets in your copy of the other node's registry, but when the other node comes back up and generates its first checkpoint, whatever changes you made will be replaced by a copy of the old unmodified ticket). So the master is unaware of CAS logouts that occurred while it was down and although it can process a logout for a user that logged into the backup while it was down, it really has no way to actually delete the login ticket. Since no browser has the TGT ID in a cookie any more, nobody will actually be able to use the zombie TGT, but the ticket is going to sit around in memory until it times out.
There are a few more consequences to Single SignOut that will be explained in the next section.
A Smart Front End
A programmable front end is configured to send validate requests to the CAS server that generated the Service Ticket, /proxy requests to the CAS server that generated the PGT, other requests from logged-on users to the CAS server they logged into, and login requests based on standard load balancing or similar configurations. Each ticket has a suffix that indicates which CAS server node generated it. The routing rules, in order, are as follows (a sketch of this decision order follows the list):
- If the URL "path" is a validate request (/cas/validate, /cas/serviceValidate, etc.) then route to the node indicated by the suffix on the value of the ticket= parameter.
- If the URL is a /proxy request, route to the node indicated by the suffix of the pgt= parameter.
- If the request has a CASTGC cookie, then route to the node indicated by the suffix of the TGT that is the cookie's value.
- Otherwise, or if the node selected by 1-3 is down, choose a CAS node using whatever round robin or master-backup algorithm previously configured.
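At Yale this logic lives in F5 iRules, not in Java, but the decision order can be sketched in Java for readability. The method and parameter names below are hypothetical, and a real front end would also need a health check to detect that the selected node is down (rule 4):

class FrontEndRouting {
    static String chooseNode(String path, String ticketParam, String pgtParam,
                             String castgcCookie, String defaultNode) {
        if ((path.endsWith("/validate") || path.contains("Validate")) && ticketParam != null) {
            return suffixOf(ticketParam);      // rule 1: validation goes to the ST's creator
        }
        if (path.endsWith("/proxy") && pgtParam != null) {
            return suffixOf(pgtParam);         // rule 2: /proxy goes to the PGT's creator
        }
        if (castgcCookie != null) {
            return suffixOf(castgcCookie);     // rule 3: logged-on users go to their login node
        }
        return defaultNode;                    // rule 4: ordinary load balancing
    }

    static String suffixOf(String ticketId) {
        return ticketId.substring(ticketId.lastIndexOf('-') + 1);
    }
}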
So normally all requests go to the machine that created and therefore owns the ticket, no matter what type of ticket it is. When a CAS server fails, requests for its tickets are assigned to one of the other servers. Most of the time the CAS server recognizes this as a ticket from another node and looks in the current shadow copy of that node's ticket registry.
As in the previous example, a node may not have a copy of tickets issued in the last few seconds, so one or two users may see an error.
If someone logged into the failed node needs a Service Ticket, the request is routed to any backup node, which creates a Service Ticket (in its own Ticket Registry, with its own node suffix, which it will own) chained to the copy of the original Login Ticket in the appropriate shadow Ticket Registry. When that ticket is validated, the front end routes the request based on the suffix to this node, which returns the Netid from the Login Ticket in the shadow registry.
Again, the rule that each node owns its own registry and all the tickets it created, and that the other nodes can't successfully change those tickets, has certain consequences.
- If you use Single SignOut, then the Login Ticket maintains a table of Services to which you have logged in. When you log out, or when your Login Ticket times out in the middle of the night, each Service gets a call from CAS on a published URL with the Service Ticket ID you used to log in, so the application can log you off if it has not already done so. In failover mode a backup server can issue Service Tickets for a failed node's TGT, but it cannot successfully update the Service table in the TGT, because when the failed node comes back up it will restore the old Service table along with the old TGT.
- If the user logs out and the Services are notified by the backup CAS server, and then the node that owned the TGT is restored along with the now undead copy of the obsolete TGT, then in the middle of the night that restored TGT will time out and the Services will all be notified of the logoff a second time. It seems unlikely that anyone would ever write a service logout so badly that a second logoff would be a problem. Mostly it will be ignored.
You have probably guessed by now that Yale does not use Single SignOut, and if we ever enabled it we would only indicate that it is supported on a "best effort" basis.
CAS Cluster
In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session information. Nor does it depend on any other type of communication between machines. In fact, a CAS cluster could be created from a CAS running under Tomcat and one running under JBoss.
To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.
Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8080/cas".
Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (at Yale the F5 likes the ticket suffix to be an MD5 hash of the DNS name).
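The MD5 part is just a hash of the DNS name; a sketch of one way to generate it follows (hex encoding is an assumption here, since the exact format the F5 expects is a Yale convention):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TicketSuffixGenerator {
    static String md5Suffix(String dnsName) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(dnsName.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));   // two hex digits per byte
        }
        return hex.toString();
    }
}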
Sticky Browser Sessions
An F5 can be configured to have "sticky" connections between a client and a server. The first time the browser connects to a service name it is assigned any available backend server. For the next few minutes, however, subsequent requests to the same service go back to whichever server the F5 assigned to handle the first request.
Intelligent routing is based on tickets that exist only after you have logged in. CAS was designed (for better or worse) to use Spring Webflow which keeps information in the Session object during the login process. For Webflow to work, one of two things must happen:
1. The browser has to POST the Userid/Password form back to the CAS server that sent it the form (which means the front end has to use sticky sessions based on IP address or JSESSIONID value).
2. You have to use real Web Server clustering so the Web Servers all exchange Session objects based on JSESSIONID.
Option 2 is a fairly complex process of container configuration, unless you have already solved this problem and routinely generate JBoss cluster VMs using some canned script. Sticky sessions in the front end are somewhat easier to configure, and obviously they are less complicated than routing requests by parsing the ticket ID string.
Yale made a minor change to the CAS Webflow to store extra data in hidden fields of the login form, and an additional check so that if the form POSTs back to another server, that server can handle the rest of the login without requiring Session data.
What is a Ticket Registry
This is a rather detailed description of one CAS component, but it does not assume any prior knowledge.
CAS provides a Single SignOn function. It acts as a system component, but it is structured like most other Web applications. Internally it creates, validates, and deletes objects called Tickets. The Ticket Registry is the component that holds the tickets while CAS is running.
When the user logs in, CAS creates a ticket that the user can use to create other tickets (a Ticket Granting Ticket or TGT, although a more friendly name for it is the "Login Ticket"). Then when someone previously logged in uses CAS to authenticate to another Web application, CAS creates a Service Ticket (ST).
Web applications are traditionally defined in three layers. The User Interface generates the Web pages, displays data, and processes user input. The Business Logic validates requests, verifies inventory, approves the credit card, and so on. The backend "persistence" layer talks to a database. CAS doesn't sell anything, but it has roughly the same three layers.
The CAS User Interface uses Spring MVC and Spring Web Flow to log a user on and to process requests from other Web applications. The Business Logic validates the userid and password (typically against an Active Directory), and it creates and deletes the tickets. CAS tickets, however, typically remain in memory and do not need to be written to a database or disk file. Nevertheless, the Ticket Registry is positioned logically where the database interface would be in any other application program, and sometimes CAS actually uses a database.
CAS was written to use the Spring Java Framework to configure its options. CAS requires some object that implements the TicketRegistry function. JASIG CAS provides at least five alternative Ticket Registries. You pick one and then insert its name (and configure its parameters) using a documented Spring XML file which not surprisingly is named "ticketRegistry.xml". Given this modular plug-in design, Cushy is just one more option you can optionally configure with this file.
When you have a regular Web application that sells things, the objects in the application (products, inventory, orders) would be stored in a database and the most modern way to do this is with JPA. To support the JASIG JPA Ticket Registry, all the Java source for tickets and things that tickets contain or point to are annotated with references to database tables and the names and data types of the columns in the table that each data field maps to. If you don't use the JPA Ticket Registry these annotations are ignored. JPA uses the annotations to generate and then weave into these objects invisible support code to detect when something has changed and track connections from one object to the next.
The "cache" versions (Ehcache, JBoss Cache, Memcached) of JASIG TicketRegistry modules have no annotations and few expectaions. They use ordinary objects (sometimes call Plain Old Java Objects or POJOs). They require the objects to be serializable because, like Cushy, they use the Java writeObject statement to turn any object to a stream of bytes that can be held in memory, stored on disk, or sent over the network.
CAS tickets are all serializable, but they are not designed to be very nice about it. This is the "dirty secret" of CAS. It has always expected tickets to be serialized, but it breaks some of the rules and, as a result, can generate failures. They don't happen often, but CAS runs 24x7 and anything that can go wrong will go wrong. With one of the caching solutions, when it goes wrong it is deep inside a huge black box of "off the shelf" code that may or may not recover from the error.
The purpose of this section is to describe in more detail than you find in other CAS documentation just what is going on here, how Cushy avoids problems, and how Cushy would recover even if something went wrong.
In simple terms, the Login ticket (the TGT) "contains" your Netid (username, principal, whatever you call it). In more detail the TGT points to an Authentication object that points to a Principal object that contains the Netid. Currently when a user logs on the TGT, Netid, and any attributes are all determined once and that part of the TGT never changes. In the future, CAS may add higher levels of authentication (secondary "factors") and that might change the important part of the TGT, but that is not a problem now.
However, if you use Single SignOut then CAS also maintains a "services" table in the TGT that associates previously used ServiceTicket ID strings with a reference to a Service object containing the URL that CAS should call to notify a service that a user previously authenticated by CAS has logged out. The services table changes through the day as users log in to applications.
CAS also generates Service Tickets. However, the ST is used and discarded in a few milliseconds during normal use, or if it is never claimed it times out after a default period of 10 seconds. When the ST is validated by the application, CAS returns the Netid, but CAS does not store the Netid in the ST. Instead, it points the ST to the TGT and the TGT "contains" the Netid. When the application validates the ST, CAS goes from the ST to the TGT, gets the Netid, deletes the ST, and returns the Netid to the application.
So the ST is around for such a short period of time that you would not think it has an important effect on the structure of the Ticket Registry. There are, however, two impacts:
- First, whenever you ask Java writeObject to serialize an object to bytes, Java not only turns that object into bytes but it also makes a copy of any other object it points to. Cushy, Ehcache, JBoss Cache, and Memcached all serialize objects, but only here will you find anyone explaining what that means. When you think you are serializing an ST what you are really getting is an ST, the TGT it points to, the Authentication and Principal objects the TGT points to, and then the Service objects for all the services that the TGT is remembering for Single SignOut. In reality, the only thing the ST needs is the Netid, but because CAS is designed with many layers of abstraction you get this entire mess whether you like it or not.
- If you do not assume that the Front End is smart enough to route validation requests to the right host, then there is a race condition between the cache based ticket replication systems copying the ST to the other nodes and the possibility that the front end will route the ST validation request to one of those other nodes. The only way to make sure this will never happen is to configure the cache replication systems to copy the ST to all the other nodes before returning to the CAS Business Layer to confirm the ST is stored. That makes the network I/O synchronous, so if the replication fails, CAS stops running as a result. (The sketch after this list shows what serializing a "single" ST actually drags along.)
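The following self-contained sketch shows the effect with stand-in classes (these are not the real CAS ticket classes): serializing one Service Ticket silently drags in its TGT and the whole Single SignOut services list.

import java.io.*;
import java.util.ArrayList;
import java.util.List;

class SerializationGraphDemo {
    static class LoginTicket implements Serializable {
        String netid = "someuser";
        List<String> services = new ArrayList<String>();   // Single SignOut table
    }
    static class ServiceTicket implements Serializable {
        LoginTicket grantedBy;                              // the ST points at its TGT
        ServiceTicket(LoginTicket tgt) { this.grantedBy = tgt; }
    }

    public static void main(String[] args) throws IOException {
        LoginTicket tgt = new LoginTicket();
        tgt.services.add("https://app.example.edu/");
        ServiceTicket st = new ServiceTicket(tgt);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(st);   // writes the ST, the TGT, and the whole services list
        out.close();
        System.out.println("serialized ST is " + bytes.size() + " bytes");
    }
}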
A special kind of Service is allowed to "Proxy", to act on behalf of the user. Such a service gets its own Proxy Granting Ticket (PGT) which acts like a TGT in the sense that it generates Service Tickets and the ST points back to it. However, a PGT does not "contain" the Netid. Rather the PGT points to the TGT which does contain the Netid.
When Cushy does a full checkpoint of all the tickets, it doesn't matter how the tickets are chained together. Under the covers of the writeObject statement, Java does all the work of following the chains and understanding the structure, then it writes out a blob of bytes that will recreate the exact same structure when you read it back in.
The caching solutions never serialize the entire Registry. They write single tickets one at a time, except that as we have seen, a single ST or PGT points to a TGT that points to a lot of junk and all that gets written out every time you think you are serializing a "single ticket".
When Cushy generates an incremental file between full checkpoints, then all the added Tickets in the incremental file are individually serialized, producing the same result as the caching solutions. With Cushy, however, every 5 minutes the full checkpoint comes along and cleans it all up.
The reason why CAS can tolerate this sloppy serialization is that it doesn't affect the Business Logic. Suppose an ST is serialized on one node and is sent to another node where it is validated. Validation follows the chain from the ST to the TGT and then gets the Netid (and maybe the attributes). The result is the same whether you obtain the Netid from the "real" TGT or from a copy of the real TGT made a few seconds ago. Once the ST is validated it is deleted, and that also discards all the other objects chained off the ST by the caching mechanism. If it isn't validated, then the ST times out and is deleted anyway.
If you have a PGT that points to a TGT, and if the PGT is serialized and copied to another node, and if after it is copied the TGT is changed (which cannot happen today but might be something CAS does in a future release with multifactor support), then the copy of the PGT points to the old copy of the TGT with the old info while the original PGT points to the original TGT with the new data. This problem would have to be solved before you introduce any new CAS features that meaningfully change the TGT.
Cushy solves this currently non-existent problem every time it does a full checkpoint. Between checkpoints, only for the tickets added since the last checkpoint, Cushy creates copies of TGTs from the individually serialized STs and PGTs just like the caching systems. It creates a lot fewer of them and they last only a few minutes.
Now for the real problem that CAS has not solved.
When you serialize a collection, Java must internally obtain an "iterator" and step one by one through the objects in the collection. An iterator knows how to find the next or previous object in the collection. However, the iterator can break if while it is dealing with one element in the collection another thread is adding a new element to the collection "between" the object that serialization is currently processing and the object that the iterator expects to be next. When this happens, serialization stops and throws an error exception.
So if you are going to use a serialization based replication mechanism (like Ehcache, JBoss Cache, or Memcached) then it is a really, really bad idea to have a non-threadsafe collection in your tickets, such as the services table in the TGT used for Single SignOut. Collisions don't happen all that often, but as it turns out a very common user behavior can make them much more likely.
Someone presses the "Open All In Tabs" button of the browser to create several tabs simultaneously. Two tabs reference CAS aware applications that redirect the browser to CAS. The user is already logged on, so each tab only needs a Service Ticket. The problem is that both Service Tickets point to the same TGT, and both go into the services table for Single SignOut, and the first one to get generated can start to be serialized while the second one is about to add its new entry in the services table.
Yale does not use Single SignOut, so we simply disabled the services table. If you want to solve this problem then at least Cushy gives you access to all the code, so you can come up with a solution if you understand Java threading.
Usage Pattern
Users start logging into CAS at the start of the business day. The number of TGTs begins to grow.
Users seldom log out of CAS, so TGTs typically time out instead of being explicitly deleted.
Users abandon a TGT when they close the browser. They then get a new TGT and cookie when they open a new browser window.
Therefore, the number of TGTs can be much larger than the number of real CAS users. It is a count of browser windows and not of people or machines.
At Yale around 3 PM a typical set of statistics is:
- Unexpired TGTs: 13821
- Unexpired STs: 12
- Expired TGTs: 30
- Expired STs: 11
So you see that a Ticket Registry is overwhelmingly a place to keep TGTs (in this statistic TGTs and PGTs are combined).
Overnight the TGTs from earlier in the day time out and the Registry Cleaner deletes them.
So generally the pattern is a slow growth of TGTs while people are using their network applications, followed by a slow reduction of tickets while they are asleep, with a minimum probably reached each morning before 8 AM.
If you display CAS statistics periodically during the day you will see a regular pattern and a typical maximum number of tickets in use "late in the day".
Translated to Cushy, the cost of the full checkpoint and the size of the checkpoint file grow over time along with the number of active tickets, and then the file shrinks over night. During any period of intense login activity the incremental file may be unusually large. If you had a long time between checkpoints, then around the daily minimum (8 AM) you could get an incremental file bigger than the checkpoint.
Some Metrics
At Yale there are typically more than 10,000 and fewer than 20,000 Login tickets. Because Service Tickets expire when validated and after a short timeout, there are only several dozen unexpired Service Tickets at any given time.
Java can serialize a collection of 20,000 Login tickets to disk in less than a second (one core of a Sandy Bridge processor). Cushy has to block normal CAS processing just long enough to get a list of references to all the tickets, and all the rest of the work occurs in a separate thread, unrelated to any CAS operation, so it does not interfere with CAS processing.
Of course, Cushy also has to deserialize tickets from the other nodes. However, remember that if you are currently using any other Ticket Registry, the number of tickets reported on the statistics page is the total combined across all nodes, while Cushy serializes only the tickets that the current node owns and deserializes the tickets for the other nodes. So generally you can apply the "20K tickets = 1 second" rule of thumb. Serializing 200,000 tickets has been measured to take 9 seconds, so it scales as expected. If you convert a 20K common pool of tickets to Cushy, then in a load balanced configuration each node will serialize the 10K tickets it owns and deserialize the 10K tickets from the other node, while in a master-backup configuration the master will serialize 20K tickets and deserialize 0, and the backup will serialize 0 and deserialize 20K. You come to the same number no matter how you slice it.
Incrementals are trivial (.1 to .2 seconds).
CushyTicketRegistry (the code)
CushyTicketRegistry is a medium sized Java class that does all the work. It began with the standard JASIG DefaultTicketRegistry code that stores the tickets in memory (in a ConcurrentHashMap). Then on top of that base, it adds code to serialize tickets to disk and to transfer the disk files between nodes using HTTP.
Unlike the JASIG TicketRegistry implementations, CushyTicketRegistry does not create a single big cache of tickets lumped together from all the nodes. Each node is responsible for the tickets it creates. The TicketRegistry on each node is transferred over the network to the other nodes. Therefore, on each node there is an instance of CushyTicketRegistry for the locally created tickets and other instances of the class for tickets owned by the other nodes.
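In outline (with Object standing in for the ticket type, purely for illustration), the arrangement looks like this: one map this node owns and writes, plus one read-only shadow map per other node.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NodeRegistries {
    final ConcurrentHashMap<String, Object> primary =
            new ConcurrentHashMap<String, Object>();          // tickets this node owns
    final Map<String, Map<String, Object>> secondaries =
            new HashMap<String, Map<String, Object>>();       // shadow copies, keyed by node suffix

    Object getTicket(String ticketId, String localSuffix) {
        String suffix = ticketId.substring(ticketId.lastIndexOf('-') + 1);
        if (suffix.equals(localSuffix)) {
            return primary.get(ticketId);                     // we created it
        }
        Map<String, Object> shadow = secondaries.get(suffix); // another node's copy
        return (shadow == null) ? null : shadow.get(ticketId);
    }
}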
This is a custom solution designed for the specific CAS requirements. It is not a general object caching mechanism. It is really a strategy for the use of standard Java collections, serialization, and network I/O in a relatively small amount of code. Because the code is so small, it was convenient to put everything in a single class source file.
Configuration
In JASIG CAS, the administrator selects one of the several TicketRegistry optional implementations and configures it using a Spring Bean XML file located in WEB-INF/spring-configuration/ticketRegistry.xml. With CushyTicketRegistry this file creates the first "Primary" object instance that manages the Tickets created and owned by the local node. That object examines the configuration and creates additional "Secondary" object instances for every other node configured in the cluster.
The Cluster
Cluster configuration requirements became complex enough that they were moved into their own YaleClusterConfiguration class. This Bean is defined in front of the CushyTicketRegistry in the Spring ticketRegistry.xml file.
Why is this complicated? We prefer a single "cas.war" artifact that works everywhere. It has to work in standalone or clustered environments, in a desktop sandbox with or without virtual machines, but also on official DEV (development), TEST, and PROD (production) servers. Changing the WAR file for each environment is undesirable because we do not want to change the artifact between Test and Production. The original idea was to configure things at the container level (JBoss), but Yale Production Services did not want to be responsible for managing all that configuration stuff.
So YaleClusterConfiguration adds Java logic instead of just a static cluster configuration file. During initialization on the target machine it can determine all the IP addresses assigned to the machine and the machine's primary HOSTNAME. This now allows two strategies.
First, you can configure all your clusters (sandbox, dev, test, prod, ...). Then at runtime CushyClusterConfiguration determines the IP addresses of the current machine and scans each cluster definition provided. It cannot use a cluster that does not contain the current machine, so it stops and uses the first cluster that contains a URL referencing an IP address on the current server.
If none of the configured clusters contains the current machine, or if no configuration is provided, then Cushy uses the HOSTNAME and some Java code. The code was written for the Yale environment and can describe other environments, but if you already have a cluster with other machine naming conventions then you may want to modify or replace the Java at the end of this bean.
At Yale, the DEV, TEST, and PROD machines are all part of a two machine cluster where the HOSTNAME contains a "-01" or "-02" suffix. So by finding the current HOSTNAME it can say that if this machine has "-01" in its name, the other machine in the cluster is "-02" and the reverse.
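The convention reduces to a few lines of string handling; this sketch only illustrates the rule just described:

class HostnamePartner {
    static String partnerOf(String hostname) {
        if (hostname.contains("-01")) return hostname.replace("-01", "-02");
        if (hostname.contains("-02")) return hostname.replace("-02", "-01");
        return null;   // not part of a -01/-02 pair
    }
}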
Sounds easy, but as always the actual code implies some rules you need to know.
First, you can define the YaleClusterConfiguration bean with or without a "clusterDefinition" property. If you provide the property, it is a List of Lists of Strings:
<bean id="clusterConfiguration" class="edu.yale.its.tp.cas.util.YaleClusterConfiguration"
p:md5Suffix="yes" >
<property name="clusterDefinition">
<list>
<!-- Desktop Sandbox cluster -->
<list>
<value>http://foo.yu.yale.edu:8080/cas/</value>
<value>http://bar.yu.yale.edu:8080/cas/</value>
</list>
<!-- Development cluster -->
<list>
<value>https://casdev1.yale.edu:8443/cas/</value>
<value>https://casdev2.yale.edu:8443/cas/</value>
</list>
</list>
</property>
</bean>
In Spring, the <value> tag generates a String, so this is what Java calls a List<List<String>> (a List of Lists of Strings). As noted, the top List has two elements. The first element is a List with two Strings for the machines foo and bar. The second element is another List with two Strings for casdev1 and casdev2.
There is no good way to determine all the DNS names that point to my server. However, it is relatively easy in Java to find all the IP addresses of all the LAN interfaces on the current machine. This list may be longer than you think. Each LAN adapter can have IPv4 and IPv6 addresses, and then there can be multiple real LANs and a bunch of virtual LAN adapters for VMWare or Virtualbox VMs you host or tunnels to VPN connections. Of course, there is always the loopback address.
Cushy is going to run through the list of cluster definitions. For each host name in each URL in the cluster, it will do a DNS lookup (or look the name up in the /etc/hosts or \Windows\System32\drivers\etc\hosts file). DNS can have more than one IP address for a server name. Cushy then checks each IP address returned for each server name in the list of URLs for a cluster against the list of IP addresses assigned to interfaces on the currently running machine. If none of the URLs in the cluster definition match any IP address on this machine, then this machine is not a member of that cluster and we can ignore that cluster and go on to the next cluster definition in the list. Cushy stops and accepts a cluster definition when one IP address for the DNS name matches one IP address assigned to any interface on the currently running machine.
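The matching step can be sketched as follows; this mirrors the description above but is not the actual CushyClusterConfiguration source:

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.URL;
import java.util.*;

class ClusterMatcher {
    // Every IP address bound to any interface on this machine.
    static Set<InetAddress> localAddresses() throws Exception {
        Set<InetAddress> local = new HashSet<InetAddress>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            local.addAll(Collections.list(nic.getInetAddresses()));
        }
        return local;
    }

    // Accept the first cluster whose URLs resolve to an address on this machine.
    static List<String> chooseCluster(List<List<String>> clusters) throws Exception {
        Set<InetAddress> local = localAddresses();
        for (List<String> cluster : clusters) {
            for (String url : cluster) {
                try {
                    String host = new URL(url).getHost();
                    for (InetAddress addr : InetAddress.getAllByName(host)) {
                        if (local.contains(addr)) return cluster;   // this machine is in this cluster
                    }
                } catch (Exception lookupFailed) {
                    // unresolvable name: treat as "not this machine" and keep scanning
                }
            }
        }
        return null;   // fall back to the HOSTNAME convention
    }
}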
Restrictions:
You can certainly run more than one instance of CAS on different Web servers on different port numbers of the same machine. However, there is no J2EE API that returns all the port numbers used by the current Web server. So Cushy is blind to port numbers and has to make its decision based on IP addresses alone. Therefore, if you want to run more than one CAS server on a computer, you have to use virtual machines with one CAS server per VM, or you have to replace CushyClusterConfiguration with custom code.
Some operating systems (Windows for example) have Dynamic DNS support. If you do not uncheck the box in the Advanced section of the LAN adapter configuration, then all the IP addresses assigned to that LAN will be registered in the DNS server. If you have a real LAN that connects to the outside world, and a virtual LAN that connects to VMs you host, and you run a private 192.168.1.* network between the VMs, then unless you remembered to manually uncheck the box after creating the virtual LAN you are probably registering your 192.168.1.1 virtual IP address in the public DNS server. This is not efficient, and it could be a problem for Cushy if one of the servers in one of the cluster configurations has registered its own version of the same address. The current configuration code stops when it gets a match on any address, and 192.168.1.* addresses are very common in sandbox clusters running in VMs on the developer's desktop. So as you write the cluster configuration in XML, do an nslookup command for each hostname and verify that the addresses registered with DNS are the addresses you expect, or at least that there are no obviously bogus addresses that are going to cause trouble.
Cushy was designed to solve the CAS Ticket problem and pretty much nothing else. It does not require a database, or any additional complex network configuration with multicast addresses and timeouts. It depends on the observed behavior that CAS is actually a fairly small component with limited hardware demands, so that a slightly less "efficient" but rock solid and dead simple approach can be used to solve the problem.
Rather than trying to move tickets in memory to network message queues and multicast protocols, Cushy periodically uses the standard Java writeObject statement to write a copy of the entire ticket cache to disk. At Yale we have fewer than 20,000 tickets at any time, and this operation takes less than a second of elapsed time (on one core) and uses about 3.2MB of disk. So you can certainly do it once every few minutes. In between full checkpoints, every few seconds we write a file of changes since the last full backup.
Now you have to replicate the files to the other CAS nodes. Well, we are on a Web Server after all, and we are already running HTTPS to secure the regular user data, so it is the natural way to secure this data too. Cushy is a small amount of source, and you can easily change it to use shared disk or any other transport mechanism, but it is easier to understand and control network and node failure if you assume the data is replicated with HTTPS GET requests from server to server.
The big advantage of Cushy is that it is not a big "off the shelf" black box that you depend on without understanding it. Cushy is a moderately large Java source file, but there are bigger classes in CAS already. You can read it, understand it, and make any specific decisions and adjustments to meet any specific network, cluster, availability, or other requirements. If you need to change CAS in special ways, you have complete control and can adjust the replication to match those changes.
However, to do this Cushy does not even attempt to replicate Service Tickets to all the other nodes before the validate request is issued. Today the front end that routes requests among the nodes of the cluster can be programmed to route the validate request to the node that generated the ticket based on the node ID field at the end of the ticket.
Executive Summary
This is a quick introduction for those in a hurry.
CAS is a Single SignOn solution. Internally, it creates a set of objects called Tickets. There is a ticket for every logged on user, and short term Service Tickets that exist while a user is being authenticated to an application. The Business Layer of CAS creates tickets by, for example, validating your userid and password in a back end system like Active Directory. The tickets are stored in a plug in component called a Ticket Registry.
For a single CAS server, the Ticket Registry is just a in memory table of tickets (a Java "Map" object) keyed by the ticket ID string. When more than one CAS server is combined to form a cluster, then an administrator chooses one of several optional Ticket Registry solutions that allow the CAS servers to share the tickets.
One clustering option is to use JPA, the standard Java service to map objects to tables in a relational database. All the CAS servers share a database, which means that any CAS node can fail but the database has to stay up all the time or CAS stops working. Other solutions use generic object "caching" solutions (Ehcache, JBoss Cache, Memcached) where CAS puts the tickets into what appears to be a common container of Java objects and, under the covers, the cache technology ensures that the tickets are copied to all the other nodes.
JPA makes CAS dependent on a database. It doesn't really use the database for any real SQL stuff, so you could you almost any database system. However, the database is a single point of failure, so you need it to be reliable. If you already have a 24x7x365 database managed by professionals who can guarantee availability, this is a good solution. If not, then this is an insurmountable prerequisite for bringing up an application like CAS that doesn't really need database.
The various cache solutions should solve the problem. Unfortunately, they too have massively complex configuration parameters with multicast network addresses and timeouts, and while they are designed to work across complete node failure, experience suggests that they are not designed to work when a CAS machine is "sick". That is, if the machine is down and does not respond to any network requests the technology recovers, but if the node is up and receives messages but just doesn't process them correctly then queues start to clog up, they back up into CAS itself and then CAS stops working simultaneously on all nodes. There is also a problem with the "one big bag of objects" model if a router fails that connects two machine rooms, two CAS nodes are separated, and now there are separate versions of what the system is designed to believe is a single cohesive collection.
If you understand the problem CAS is solving and the way the tickets fit together, then each type of failure presents specific problems. Cushy is designed to avoid the big problems and provide transparent service to 99.9% of the CAS users. If one or two people experience an error message due to a CAS crash, and CAS crashes only once a year, then that is good enough especially when the alternative technologies can cause the entire system to stop working for everyone.
Cushy is a cute word that roughly stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale".
The name explains what it does.Java has a built in operation called writeObject that writes a binary version of Java objects to disk. If you use it on a complex object, like a list of all the tickets in the Registry, then it creates a disk file with all the tickets in the list. Later on you use readObject to turn the disk file back into a copy of the original list. Java calls this mechanism "Serialization". Using just one statement and letting Java do all the work and handle all the complexity makes this easy.
The other mechanisms (JPA or the cache technologies) operate on single tickets. They write individual tickets to the database or replicate them across the network. Obviously this is vastly more efficient than periodically copying all the tickets to disk. Except that at Yale, the entire Registry of tickets can be written to a disk file in 1 second and it produces a file about 3 megabytes in size. Those numbers are so trivial that writing a copy of the entire Registry to disk once every 5 minutes, or even once a minute, is trivial on a modern server. Given the price of hardware, being more efficient than that is unnecessary.
Once you have a file on disk it should not take very long to figure out how to get a copy of that file from one Web Server to another. An HTTP GET is the obvious solution, though if you had shared disk there are other solutions.
Going to an intermediate disk file was not the solution that first comes to mind. If the tickets are in memory on one machine and they have to be copied to memory on another machine, some sort of direct network transfer is going to be the first thing you think about. However, the intermediate disk file is useful to restore tickets to memory if you have to restart your CAS server for some reason. Mostly, it means that the network transmission is COMPLETELY separate from the process of creating, validating, and deleting tickets. If the network breaks down you cannot transfer the files, but CAS continues to operate normally and it can even generate new files with newer copies of all the tickets. When the network comes back the file transfer resumes independent of the main CAS services. So replication problems can never interfere with CAS operation.
Cushy is based on four basic design principles:
- CAS is very important, but it is small and cheap to run.
- Emphasize simplicity over efficiency as long as the cost to run remains trivial.
- Assume the network front end is programmable.
- Trying for perfection is the source of most total system failures. Allow one or two users to get a temporary error message when a CAS server fails.
How it works
Cushy is simple enough it can be explained to anyone, but if you are in a rush you can stop here.
Back in the 1960's a "checkpoint" was a copy of the important information from a program written on disk so if the computer crashed the program could start back at almost the point it left off. If a CAS server saves its tickets to a disk file, reboots, and then reads the tickets from the file back into memory it is back to the same state it had before rebooting. If you transfer the file to another computer and bring CAS up on that machine, it have moved the CAS server from one machine to another. Java writeObject and readObject guarantee the state and data are completely saved and restored.
JPA and the cache technologies try to maintain the image of a single big common bucket of shared tickets. This is a very simple view, but it is very hard to maintain and rather fragile. Cushy maintains a separate TicketRegistry for each CAS server, but replicates a copy of each TicketRegistry to all the other servers in the cluster.
Given the small cost of making a complete checkpoint, you could configure Cushy to generate one every 10 seconds and run the cluster on full checkpoints. It is probably inefficient, but using 1 second of one core and transmitting 3 megabytes of data to each node every 10 seconds is not a big deal on modern equipment. This was the first Cushy code milestone and it lasted for about a day before it was extended with a little extra code.
The next milestone (a day later) was to add an "incremental" file that contains all the tickets added or ticket ids of tickets deleted since the last full checkpoint. Creating multiple increments and transmitting only the changes the other node has not yet seen was considered, but it would require more code and complexity. If you generate checkpoints every few minutes, then the incremental file grows as more changes are made but it never gets really large. It is well know that the overhead of creating and opening a file or establishing a network connection is so great that the difference between reading or writing 5K or 100K is trivial.
In Cushy you configure a timer in XML. If you set the timer to 10 seconds, then Cushy writes a new incremental file every 10 seconds. Separately you configure the time between full checkpoints. When the timer goes off, if enough time has passed since the last checkpoint then instead of writing an incremental file, this time it writes a new full Checkpoint.
Only a small number of tickets are added, but lots of Service Tickets have been created and deleted and there is no good way to keep the list of expired Service Tickets from making the incremental file larger. So if you tried to separate full checkpoints by an unreasonable amount of time you would find the incremental file had grown to be larger than the checkpoint file and you have made things worse rather than better. So the expectation is you do a full checkpoint somewhere between every 1-10 minutes and you do an incremental somewhere between every 5 -15 seconds, but test it and make your own decisions.
A Service Ticket is created and then is immediately validated and deleted. Trying to replicate Service Tickets to the other nodes before the validation request comes in is an enormous problem that screws up the configuration and timing parameters for all the other Ticket Registry solutions. Cushy doesn't try to do replication at this speed. Instead, it has CAS configuration elements that ensure that each Ticket ID contains an identifier of the node that created it, and it depends on a front end smart enough to route any of the ticket validation requests to the node that created the ticket and already has it in memory. Then replication only is needed for crash recover.
Note: If the front end is not fully programmable it is a small programming exercise to be considered in Cushy 2.0 to forward the validation request from any CAS node to the node that owns the ticket and then pass back the results of the validation to the app.
Ticket Names
As with everything else, CAS has a Spring bean configuration file (uniqueIdGenerators.xml) to configure how ticket ids are generated. If you accept the defaults, then tickets have the following format:
type - num - random - nodename
where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end of the ticket id is a nodename.
In vanilla CAS the nodename typically comes from the cas.properties file, and even sites that use real clustering often leave the suffix at its default value of "-CAS". Cushy requires every node in the cluster to have a unique name. It adds a smarter configuration bean, described below, and enforces the rule that the end of the ticket id really identifies the node that created it and therefore owns it.
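To make the format concrete, here is a small illustration of composing a ticket id in that layout and recovering the node suffix from it. This is illustration only; it is not the CAS id generator or the Cushy parsing code.

// Illustration only: the "type - num - random - nodename" layout and its suffix.
public class TicketIdFormatSketch {
    static String compose(String type, long num, String random, String nodename) {
        return type + "-" + num + "-" + random + "-" + nodename;
    }

    static String suffixOf(String ticketId) {
        return ticketId.substring(ticketId.lastIndexOf('-') + 1);
    }

    public static void main(String[] args) {
        String id = compose("ST", 42, "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", "casdev1");
        System.out.println(id);            // ST-42-dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT-casdev1
        System.out.println(suffixOf(id));  // casdev1
    }
}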
How it Fails (Nicely)
The Primary + Warm Spare Cluster
One common cluster model is to have a single master CAS server that normally handles all the requests, and a normally idle backup server (a "warm spare") that does nothing until the master goes down. Then the backup server handles requests while the master is down.
During normal processing the master server generates tickets, creates checkpoints and incrementals, and sends them to the backup server. The backup server generates empty checkpoints with no tickets because it has not yet received any requests.
Then the master is shut down or crashes. The backup server has a copy in memory of all the tickets generated by the master, except for the last few seconds before the crash. It can handle new logins and it can issue Service Tickets against logins previously processed by the master, using its copy of the master's registry.
Now the master comes back up and, for this example, let us assume that it resumes its role as master (there are configurations where the backup becomes the new master, so that when the old master comes back it becomes the new backup; that case is actually easier for Cushy).
The master restores from disk a copy of its old registry and over the network it fetches a copy of the registry from the backup. It now has access to all the login or proxy tickets created by the backup while it was down, and it can issue Service Tickets based on those logins.
However, the failure leaves some minor loose ends that are not important enough to be problems. Because each server is the owner of its own tickets and registry, each has read-only access to the tickets of the other server. (Strictly speaking that is not quite true: you can temporarily change tickets in your copy of the other node's registry, but when the other node comes back up and generates its first checkpoint, whatever changes you made are replaced by a copy of the old, unmodified ticket.) So the master is unaware of CAS logouts that occurred while it was down, and although it can process a logout for a user who logged into the backup during the outage, it has no way to actually delete that login ticket. Since no browser has the TGT ID in a cookie any more, nobody will actually be able to use the zombie TGT, but the ticket is going to sit around in memory until it times out.
There are a few more consequences to Single SignOut that will be explained in the next section.
A Smart Front End
A programmable front end is configured to send validation requests to the CAS server that generated the Service Ticket, /proxy requests to the CAS server that generated the Proxy Granting Ticket, other requests from logged-on users to the CAS server they logged into, and new login requests based on standard load balancing or similar configurations. Each ticket has a suffix that indicates which CAS server node generated it, so the routing rules (sketched in code after this list) are:
- If the URL "path" is a validate request (/cas/validate, /cas/serviceValidate, etc.) then route to the node indicated by the suffix on the value of the ticket= parameter.
- If the URL is a /proxy request, route to the node indicated by the suffix of the pgt= parameter.
- If the request has a CASTGC cookie, then route to the node indicated by the suffix of the TGT that is the cookie's value.
- Otherwise, or if the node selected by the previous rules is down, choose a CAS node using whatever round robin or master-backup algorithm was previously configured.
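The real front end at Yale is an F5 programmed with iRules, so the following Java-style sketch only illustrates the decision order; the class, method, and parameter names are invented for this document and are not any real front-end API.

// Java-style sketch of the routing decision order only (not an F5 iRule).
public class FrontEndRoutingSketch {

    // Returns the ticket suffix identifying the owning node, or null to mean
    // "fall back to ordinary load balancing" (the last rule above).
    static String ownerSuffix(String path, String ticketParam, String pgtParam, String castgcCookie) {
        if (path.endsWith("/validate") || path.contains("Validate")) {
            return suffixOf(ticketParam);   // /validate, /serviceValidate, /proxyValidate
        }
        if (path.endsWith("/proxy")) {
            return suffixOf(pgtParam);      // route by the pgt= parameter
        }
        if (castgcCookie != null) {
            return suffixOf(castgcCookie);  // route by the TGT id in the CASTGC cookie
        }
        return null;                        // no ticket yet: any node will do
    }

    static String suffixOf(String ticketId) {
        return (ticketId == null) ? null : ticketId.substring(ticketId.lastIndexOf('-') + 1);
    }
}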
So normally all requests go to the machine that created and therefore owns the ticket, no matter what type of ticket it is. When a CAS server fails, requests for its tickets are assigned to one of the other servers. Most of the time the CAS server recognizes this as a ticket from another node and looks in the current shadow copy of that node's ticket registry.
As in the previous example, a node may not have a copy of tickets issued in the last few seconds, so one or two users may see an error.
If someone logged into the failed node needs a Service Ticket, the request is routed to any backup node which creates a Service Ticket (in its own Ticket Registry with its own node suffix which it will own) chained to the copy of the original Login Ticket in the appropriate shadow Ticket Registry. When that ticket is validated, the front end routes the request based on the suffix to this node which returns the Netid from the Login Ticket in the shadow registry.
Again, the rule that each node owns its own registry and all the tickets it created and the other nodes can't successfully change those tickets has certain consequences.
- If you use Single SignOut, then the Login Ticket maintains a table of services to which you have logged in. When you log out, or when your Login Ticket times out in the middle of the night, each service gets a call from CAS on a published URL with the Service Ticket ID you used to log in, so the application can log you off if it has not already done so. In failover mode a backup server can issue Service Tickets for a failed node's TGT, but it cannot durably update the service table in that TGT, because when the failed node comes back up it will restore the old service table along with the old TGT.
- If the user logs out and the services are notified by the backup CAS server, and then the node that owned the TGT is restored along with the now-undead copy of the obsolete TGT, then in the middle of the night that restored TGT will time out and the services will all be notified of the logoff a second time. It seems unlikely that anyone would ever write a service logout so badly that a second logoff would be a problem; mostly it will be ignored.
You have probably guessed by now that Yale does not use Single SignOut, and if we ever enabled it we would only indicate that it is supported on a "best effort" basis.
CAS Cluster
In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session information. Nor does it depend on any other type of communication between machines. In fact, a CAS cluster could be created from a CAS running under Tomcat and one running under JBoss.
To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.
Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8080/cas".
Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (at Yale the F5 likes the ticket suffix to be an MD5 hash of the DNS name).
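As an illustration of those three pieces of data, the following sketch derives the nodename and an MD5 ticket suffix from a node's DNS name. It is not the CushyClusterConfiguration source, and the exact string that gets hashed (short name versus fully qualified name, upper versus lower case) is an assumption here.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustration of deriving a nodename and MD5 ticket suffix from a DNS name.
// Not the actual Cushy code; the exact hash input is an assumption.
public class ClusterMemberSketch {

    // Nodename convention described above: first DNS label with the dashes removed.
    static String nodeName(String dnsName) {
        return dnsName.split("\\.")[0].replace("-", "");
    }

    // 32-character hex MD5 of the DNS name, usable as a ticket suffix.
    static String md5Suffix(String dnsName) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(dnsName.getBytes(StandardCharsets.UTF_8));
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        String host = "vm-foodevapp-01.web.yale.internal";
        System.out.println(nodeName(host));   // vmfoodevapp01
        System.out.println(md5Suffix(host));  // 32 hex characters
    }
}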
Sticky Browser Sessions
An F5 can be configured to have "sticky" connections between a client and a server. The first time the browser connects to a service name it is assigned any available backend server. For the next few minutes, however, subsequent requests to the same service go back to whichever server the F5 assigned to handle the first request.
Intelligent routing is based on tickets that exist only after you have logged in. CAS was designed (for better or worse) to use Spring Webflow which keeps information in the Session object during the login process. For Webflow to work, one of two things must happen:
- The browser has to POST the Userid/Password form back to the CAS server that sent it the form (which means the front end has to use sticky sessions based on IP address or JSESSIONID value).
- You have to use real Web Server clustering so the Web Servers all exchange Session objects based on JSESSIONID.
The second option is a fairly complex process of container configuration, unless you have already solved this problem and routinely generate JBoss cluster VMs using some canned script. Sticky sessions in the front end are somewhat easier to configure, and obviously they are less complicated than routing requests by parsing the ticket ID string.
Yale made a minor change to the CAS Webflow to store extra data in hidden fields of the login form, plus an additional check so that if the form POSTs back to a different server, that server can handle the rest of the login without requiring Session data.
What is a Ticket Registry
This is a rather detailed description of one CAS component, but it does not assume any prior knowledge.
CAS provides a Single SignOn function. It acts as a system component, but internally it is structured like most other Web applications. Internally it creates, validates, and deletes objects called Tickets. The Ticket Registry is the component that holds the tickets while CAS is running.
When the user logs in, CAS creates a ticket that the user can use to create other tickets (a Ticket Granting Ticket or TGT, although a more friendly name for it is the "Login Ticket"). Then when someone previously logged in uses CAS to authenticate to another Web application, CAS creates a Service Ticket (ST).
Web applications are traditionally defined in three layers. The User Interface generates the Web pages, displays data, and processes user input. The Business Logic validates requests, verifies inventory, approves the credit card, and so on. The backend "persistence" layer talks to a database. CAS doesn't sell anything, but it has roughly the same three layers.
The CAS User Interface uses Spring MVC and Spring Web Flow to log a user on and to process requests from other Web applications. The Business Logic validates the userid and password (typically against an Active Directory), and it creates and deletes the tickets. CAS tickets, however, typically remain in memory and do not need to be written to a database or disk file. Nevertheless, the Ticket Registry is positioned logically where the database interface would be in any other application program, and sometimes CAS actually uses a database.
CAS was written to use the Spring Java Framework to configure its options. CAS requires some object that implements the TicketRegistry function. JASIG CAS provides at least five alternative Ticket Registries. You pick one and then insert its name (and configure its parameters) using a documented Spring XML file which not surprisingly is named "ticketRegistry.xml". Given this modular plug-in design, Cushy is just one more option you can optionally configure with this file.
When you have a regular Web application that sells things, the objects in the application (products, inventory, orders) would be stored in a database and the most modern way to do this is with JPA. To support the JASIG JPA Ticket Registry, all the Java source for tickets and things that tickets contain or point to are annotated with references to database tables and the names and data types of the columns in the table that each data field maps to. If you don't use the JPA Ticket Registry these annotations are ignored. JPA uses the annotations to generate and then weave into these objects invisible support code to detect when something has changed and track connections from one object to the next.
The "cache" versions (Ehcache, JBoss Cache, Memcached) of JASIG TicketRegistry modules have no annotations and few expectaions. They use ordinary objects (sometimes call Plain Old Java Objects or POJOs). They require the objects to be serializable because, like Cushy, they use the Java writeObject statement to turn any object to a stream of bytes that can be held in memory, stored on disk, or sent over the network.
CAS tickets are all serializable, but they are not designed to be very nice about it. This is the "dirty secret" of CAS. It has always expected tickets to be serialized, but it breaks some of the rules and, as a result, can generate failures. They don't happen often, but CAS runs 24x7 and anything that can go wrong will go wrong. With one of the caching solutions, when it goes wrong it is deep inside a huge black box of "off the shelf" code that may or may not recover from the error.
The purpose of this section is to describe in more detail than you find in other CAS documentation just what is going on here, how Cushy avoids problems, and how Cushy would recover even if something went wrong.
In simple terms, the Login ticket (the TGT) "contains" your Netid (username, principal, whatever you call it). In more detail the TGT points to an Authentication object that points to a Principal object that contains the Netid. Currently when a user logs on the TGT, Netid, and any attributes are all determined once and that part of the TGT never changes. In the future, CAS may add higher levels of authentication (secondary "factors") and that might change the important part of the TGT, but that is not a problem now.
However, if you use Single SignOut, then CAS also maintains a "services" table in the TGT that associates each used Service Ticket ID string with a reference to a Service object containing the URL that CAS should call to notify a service that a user previously authenticated by CAS has logged out. The services table changes throughout the day as users log in to applications.
CAS also generates Service Tickets. However, the ST is used and discarded in a few milliseconds during normal use, or if it is never claimed it times out after a default period of 10 seconds. When the ST is validated by the application, CAS returns the Netid, but CAS does not store the Netid in the ST. Instead, it points the ST to the TGT and the TGT "contains" the Netid. When the application validates the ST, CAS goes from the ST to the TGT, gets the Netid, deletes the ST, and returns the Netid to the application.
So the ST is around for such a short period of time that you would not think it has an important effect on the structure of the Ticket Registry. There are, however, two impacts:
- First, whenever you ask Java writeObject to serialize an object to bytes, Java not only turns that object into bytes but also makes a copy of every other object it points to (the sketch after this list makes this concrete). Cushy, Ehcache, JBoss Cache, and Memcached all serialize objects, but only here will you find anyone explaining what that means. When you think you are serializing an ST, what you really get is the ST, the TGT it points to, the Authentication and Principal objects the TGT points to, and then the Service objects for all the services that the TGT is remembering for Single SignOut. In reality, the only thing the ST needs is the Netid, but because CAS is designed with many layers of abstraction you get this entire mess whether you like it or not.
- If you do not assume that the front end is smart enough to route validation requests to the right host, then there is a race condition between the cache-based ticket replication systems copying the ST to the other nodes and the possibility that the front end will route the ST validation request to one of those other nodes. The only way to make sure this never happens is to configure the cache replication system to copy the ST to all the other nodes before returning to the CAS Business Layer to confirm the ST is stored. But if that network I/O is synchronous, then when it fails CAS stops running as a result.
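Here is a tiny self-contained demonstration of the first point. The Login and ServiceTicket classes below are invented stand-ins, not CAS classes; the point is only that writeObject drags the whole reference chain along.

import java.io.*;

// Demo: serializing one object also serializes everything it references.
// These classes are stand-ins for illustration, not CAS classes.
public class DeepSerializationDemo {
    static class Login implements Serializable { String netid = "ab123"; }
    static class ServiceTicket implements Serializable { Login grantedBy = new Login(); }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ServiceTicket());   // the Login it points to is serialized too
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            ServiceTicket copy = (ServiceTicket) in.readObject();
            System.out.println(copy.grantedBy.netid); // "ab123": the whole chain came along
        }
    }
}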
A special kind of Service is allowed to "Proxy", to act on behalf of the user. Such a service gets its own Proxy Granting Ticket (PGT) which acts like a TGT in the sense that it generates Service Tickets and the ST points back to it. However, a PGT does not "contain" the Netid. Rather the PGT points to the TGT which does contain the Netid.
When Cushy does a full checkpoint of all the tickets, it doesn't matter how the tickets are chained together. Under the covers of the writeObject statement, Java does all the work of following the chains and understanding the structure, then it writes out a blob of bytes that will recreate the exact same structure when you read it back in.
The caching solutions never serialize the entire Registry. They write single tickets one at a time, except that as we have seen, a single ST or PGT points to a TGT that points to a lot of junk and all that gets written out every time you think you are serializing a "single ticket".
When Cushy generates an incremental file between full checkpoints, then all the added Tickets in the incremental file are individually serialized, producing the same result as the caching solutions. With Cushy, however, every 5 minutes the full checkpoint comes along and cleans it all up.
The reason why CAS can tolerate this sloppy serialization is that it doesn't affect the Business Logic. Suppose an ST is serialized on one node and sent to another node where it is validated. Validation follows the chain from the ST to the TGT and then gets the Netid (and maybe the attributes). The result is the same whether you obtain the Netid from the "real" TGT or from a copy of the real TGT made a few seconds ago. Once the ST is validated it is deleted, and that also discards all the other objects chained off the ST by the caching mechanism. If it isn't validated, the ST times out and is deleted anyway.
If you have a PGT that points to a TGT, and if the PGT is serialized and copied to another node, and if after it is copied the TGT is changed (which cannot happen today but might be something CAS does in a future release with multifactor support), then the copy of the PGT points to the old copy of the TGT with the old info while the original PGT points to the original TGT with the new data. This problem would have to be solved before you introduce any new CAS features that meaningfully change the TGT.
Cushy solves this currently non-existent problem every time it does a full checkpoint. Between checkpoints, only for the tickets added since the last checkpoint, Cushy creates copies of TGTs from the individually serialized STs and PGTs just like the caching systems. It creates a lot fewer of them and they last only a few minutes.
Now for the real problem that CAS has not solved.
When you serialize a collection, Java must internally obtain an "iterator" and step one by one through the objects in the collection. An iterator knows how to find the next or previous object in the collection. However, the iterator can break if, while serialization is dealing with one element, another thread adds a new element to the collection "between" the object currently being processed and the object the iterator expects to be next. When this happens, serialization stops and throws an exception.
So if you are going to use a serialization based replication mechanism (like Ehcache, JBoss Cache, or Memcached) then it is a really, really bad idea to have a non-threadsafe collection in your tickets, such as the services table in the TGT used for Single SignOut. Collisions don't happen all that often, but as it turns out a very common user behavior can make them much more likely.
Someone presses the "Open All In Tabs" button of the browser to create several tabs simultaneously. Two tabs reference CAS aware applications that redirect the browser to CAS. The user is already logged on, so each tab only needs a Service Ticket. The problem is that both Service Tickets point to the same TGT, and both go into the services table for Single SignOut, and the first one to get generated can start to be serialized while the second one is about to add its new entry in the services table.
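The failure is easy to reproduce in miniature. The following single-threaded demonstration modifies a plain HashMap while an iterator is walking it, which is essentially what happens inside writeObject when a second request thread adds to the services table mid-serialization (there it shows up intermittently rather than every time).

import java.util.HashMap;
import java.util.Map;

// Single-threaded illustration of the fail-fast iterator problem described above.
public class IteratorBreakDemo {
    public static void main(String[] args) {
        Map<String, String> services = new HashMap<>();
        services.put("ST-1-aaa", "https://app1.example.edu/logout");
        services.put("ST-2-bbb", "https://app2.example.edu/logout");
        try {
            for (String key : services.keySet()) {
                // Simulates a second Service Ticket being added while the first
                // one is being serialized: the iterator is now invalid.
                services.put("ST-3-ccc", "https://app3.example.edu/logout");
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("Serializing a plain HashMap can fail the same way: " + e);
        }
    }
}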
Yale does not use Single SignOut, so we simply disabled the services table. If you want to solve this problem then at least Cushy gives you access to all the code, so you can come up with a solution if you understand Java threading.
Usage Pattern
Users start logging into CAS at the start of the business day. The number of TGTs begins to grow.
Users seldom log out of CAS, so TGTs typically time out instead of being explicitly deleted.
Users abandon a TGT when they close the browser. They then get a new TGT and cookie when they open a new browser window.
Therefore, the number of TGTs can be much larger than the number of real CAS users. It is a count of browser windows and not of people or machines.
At Yale around 3 PM a typical set of statistics is:
Unexpired-TGTs: 13821
Unexpired-STs: 12
Expired TGTs: 30
Expired STs: 11
So you see that a Ticket Registry is overwhelmingly a place to keep TGTs (in this statistic TGTs and PGTs are combined).
Over night the TGTs from earlier in the day time out and the Registry Cleaner deletes them.
So generally the pattern is a slow growth of TGTs while people are using the network application, followed by a slow reduction of tickets while they are asleep, with a minimum probably reached each morning before 8 AM.
If you display CAS statistics periodically during the day you will see a regular pattern and a typical maximum number of tickets in use "late in the day".
Translated to Cushy, the cost of the full checkpoint and the size of the checkpoint file grow over time along with the number of active tickets, and then the file shrinks over night. During any period of intense login activity the incremental file may be unusually large. If you had a long time between checkpoints, then around the daily minimum (8 AM) you could get an incremental file bigger than the checkpoint.
Some Metrics
At Yale there are typically more than 10,000 and fewer than 20,000 Login tickets. Because Service Tickets expire when validated and after a short timeout, there are only several dozen unexpired Service Tickets at any given time.
Java can serialize a collection of 20,000 Login tickets to disk in less than a second (one core of a Sandy Bridge processor). Cushy has to block normal CAS processing just long enough to get a list of references to all the tickets; all the rest of the work occurs on a separate thread and does not interfere with CAS request processing.
Of course, Cushy also has to deserialize tickets from the other nodes. However, remember that if you are currently using any other Ticket Registry, the number of tickets reported on the statistics page is the total combined across all nodes, while Cushy serializes only the tickets that the current node owns and deserializes the tickets for the other nodes. So generally you can apply the rule of thumb that 20K tickets take about 1 second. Serializing 200,000 tickets has been measured to take 9 seconds, so it scales as expected. If you convert a 20K common pool of tickets to Cushy, then in a load-balanced configuration each node serializes the 10K tickets it owns and deserializes the 10K tickets from the other node, while in a master-backup configuration the master serializes 20K tickets and deserializes 0 and the backup serializes 0 and deserializes 20K. You come to the same number no matter how you slice it.
Incrementals are trivial (0.1 to 0.2 seconds).
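If you want to sanity-check those numbers on your own hardware, a throwaway benchmark along the following lines is enough. The FakeTicket class is a stand-in much smaller than a real TGT, so treat the result as a lower bound rather than a reproduction of the figures above.

import java.io.*;
import java.util.concurrent.ConcurrentHashMap;

// Throwaway benchmark sketch with a stand-in object instead of a real TGT.
public class SerializeBenchmark {
    static class FakeTicket implements Serializable {
        String id;
        String netid = "ab123";
        long created = System.currentTimeMillis();
        FakeTicket(String id) { this.id = id; }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, FakeTicket> registry = new ConcurrentHashMap<>();
        for (int i = 0; i < 20_000; i++) {
            registry.put("TGT-" + i + "-random-node01", new FakeTicket("TGT-" + i));
        }
        long start = System.nanoTime();
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream("registry-checkpoint.ser"))) {
            out.writeObject(registry);
        }
        System.out.printf("Serialized %d tickets in %.2f seconds%n",
                registry.size(), (System.nanoTime() - start) / 1e9);
    }
}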
CushyTicketRegistry (the code)
CushyTicketRegistry is a medium sized Java class that does all the work. It began with the standard JASIG DefaultTicketRegistry code that stores the tickets in memory (in a ConcurrentHashMap). Then on top of that base, it adds code to serialize tickets to disk and to transfer the disk files between nodes using HTTP.
Unlike the JASIG TicketRegistry implementations, CushyTicketRegistry does not create a single big cache of tickets lumped together from all the nodes. Each node is responsible for the tickets it creates. The TicketRegistry on each node is transferred over the network to the other nodes. Therefore, on each node there is an instance of CushyTicketRegistry for the locally created tickets and other instances of the class for tickets owned by the other nodes.
This is a custom solution designed for the specific CAS requirements. It is not a general object caching mechanism. It is really a strategy for the use of standard Java collections, serialization, and network I/O in a relatively small amount of code. Because the code is so small, it was convenient to put everything in a single class source file.
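The network side really is just an ordinary HTTP(S) GET of the checkpoint (or incremental) file from the other node's CAS web application. A minimal sketch of that fetch follows; the "cluster/checkpoint" path is invented for illustration and is not necessarily the URL Cushy actually exposes.

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;

// Minimal sketch of fetching another node's checkpoint file over HTTP(S).
// The "cluster/checkpoint" path is invented, not necessarily Cushy's real URL.
public class CheckpointFetchSketch {
    static void fetchCheckpoint(String nodeUrl, Path localFile) throws IOException {
        URL url = new URL(nodeUrl + "cluster/checkpoint");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(15_000);
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, localFile, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        fetchCheckpoint("https://vm-foodevapp-02.web.yale.internal:8080/cas/",
                Paths.get("foodevapp02-checkpoint.ser"));
    }
}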
Configuration
In JASIG CAS, the administrator selects one of the several TicketRegistry implementations and configures it using a Spring Bean XML file located in WEB-INF/spring-configuration/ticketRegistry.xml. With CushyTicketRegistry this file creates the first "Primary" object instance that manages the tickets created and owned by the local node. That object examines the configuration and creates additional "Secondary" object instances for every other node configured in the cluster.
The Cluster
Cluster configuration requirements became complex enough that they were moved into their own CushyClusterConfiguration class. This Bean is defined in front of the CushyTicketRegistry in the Spring ticketRegistry.xml file.
Why is this complicated? We prefer a single "cas.war" artifact that works everywhere. It has to work on standalone or clustered environments, in a desktop sandbox with or without virtual machines, but also in official DEV (development), TEST, and PROD (production) servers. Changing the WAR file for each environment is undesirable because we do not want to change the artifact between Test and Production. The original idea was to configure things at the container level (JBoss), but Yale Production Services did not want to be responsible for managing all that configuration stuff.
So CushyClusterConfiguration adds Java logic instead of just a static cluster configuration file. During initialization on the target machine it can determine all the IP addresses assigned to the machine and the machine's primary HOSTNAME. This now allows two strategies.
First, you can configure all your clusters (sandbox, dev, test, prod, ...). Then at runtime CushyClusterConfiguration determines the IP addresses of the current machine and scans each cluster definition provided. It cannot use a cluster that does not contain the current machine, so it stops at the first cluster that contains a URL referencing an IP address on the current server and uses that one.
If none of the configured clusters contains the current machine, or if no configuration is provided, then Cushy uses the HOSTNAME and some Java code. The code was written for the Yale environment and can describe other environments, but if you already have a cluster with other machine naming conventions then you may want to modify or replace the Java at the end of this bean.
At Yale, the DEV, TEST, and PROD machines are all part of two-machine clusters where the HOSTNAME contains a "-01" or "-02" suffix. So by finding the current HOSTNAME it can say that if this machine has "-01" in its name, the other machine in the cluster has the same name with "-02", and the reverse.
Sounds easy, but as always the actual code implies some rules you need to know.
First, you can define the CushyClusterConfiguration bean with or without a "clusterDefinition" property. If you provide the property, it is a List of Lists of Strings:
<bean id="clusterConfiguration" class="edu.yale.its.tp.cas.util.CushyClusterConfiguration"
p:md5Suffix="yes" >
<property name="clusterDefinition">
<list>
<!-- Desktop Sandbox cluster -->
<list>
<value>http://foo.yu.yale.edu:8080/cas/</value>
<value>http://bar.yu.yale.edu:8080/cas/</value>
</list>
<!-- Development cluster -->
<list>
<value>https://casdev1.yale.edu:8443/cas/</value>
<value>https://casdev2.yale.edu:8443/cas/</value>
</list>
</list>
</property>
</bean>
In Spring, the <value> tag generates a String, so this is what Java calls a List<List<String>> (a List of Lists of Strings). As noted, the top List has two elements. The first element is a List with two Strings for the machines foo and bar. The second element is another List with two Strings for casdev1 and casdev2.
Only one of these cluster definitions should apply. At run time CushyClusterConfiguration selects the first usable cluster configuration, where a configuration is not usable if the current machine is not in the cluster.
There is no good way to determine all the DNS names that may resolve to an address on this server. However, it is relatively easy in Java to find all the IP addresses of all the LAN interfaces on the current machine. This list may be longer than you think. Each LAN adapter can have IPv4 and IPv6 addresses, and then there can be multiple real LANs and a bunch of virtual LAN adapters for VMWare or Virtualbox VMs you host or tunnels to VPN connections. Of course, there is always the loopback address.
So CushyClusterConfiguration goes to the first cluster (foo and bar). It does a name lookup (in DNS and in the local etc/hosts file) for each server name (foo.yu.yale.edu and bar.yu.yale.edu). Each lookup returns a list of IP addresses associated with that name.
CushyClusterConfiguration selects the first cluster and first host computer whose name resolves to an IP address that is also an address on one of the interfaces of the current computer. The DNS lookup of foo.yu.yale.edu returns a bunch of IP addresses. If any of those addresses is also an address assigned to any real or virtual LAN on the current machine, then that is the cluster host name and that is the cluster to use. If not, then try again in the next cluster.
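The selection logic described above can be sketched in plain Java as follows. This is an illustration of the algorithm, not the actual CushyClusterConfiguration source.

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.URL;
import java.util.*;

// Sketch of the cluster selection algorithm described above (not the actual source):
// pick the first cluster containing a URL whose host resolves to a local IP address.
public class ClusterSelectionSketch {

    static Set<InetAddress> localAddresses() throws Exception {
        Set<InetAddress> local = new HashSet<>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            local.addAll(Collections.list(nic.getInetAddresses()));
        }
        return local;
    }

    static List<String> selectCluster(List<List<String>> clusterDefinition) throws Exception {
        Set<InetAddress> local = localAddresses();
        for (List<String> cluster : clusterDefinition) {
            for (String casUrl : cluster) {
                String host = new URL(casUrl).getHost();
                for (InetAddress addr : InetAddress.getAllByName(host)) {
                    if (local.contains(addr)) {
                        return cluster;   // this machine is a member of this cluster
                    }
                }
            }
        }
        return null;                      // no configured cluster matches this machine
    }
}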
CushyClusterConfiguration can determine whether it is running in the sandbox on the desktop, or in the development, test, production, disaster recovery, or any other cluster definition. The only requirement is that IP addresses be distinct across servers and clusters.
Restrictions (if you use a single WAR file with a single global configuration):
It is not generally possible to determine the port numbers that a J2EE Web Server is using. So it is not possible to make distinctions based only on port number. CushyClusterConfiguration requires a difference in IP addresses. So if you want to emulate a cluster on a single machine, use VirtualBox to create VMs.
(This does not apply to unit testing, because unit testing does not use a regular WAR and is not constrained to a single configuration file. If you look at the unit tests you can see examples where there are two instances of CushyTicketRegistry configured with two instances of CushyClusterConfiguration and two cluster configurations in which both names map to the local machine but are listed in a different order. For example, if your etc/hosts file is configured so that both foo and bar map to the loopback address 127.0.0.1, then one configuration can list (foo,bar), and that instance of Cushy will decide it is foo and the other node is bar, while the other configuration can list (bar,foo), and that instance of the Cushy classes will decide it is bar and the other node is foo.)
However, you have to be careful, and in production you should enforce rules that prevent the algorithm from misidentifying the cluster. You may have noticed the potential problem if you run VirtualBox or VMWare virtual machines on a Windows desktop computer at work. Windows has Dynamic DNS support, and it is enabled by default. After a virtual LAN adapter has been configured you can go to its adapter configuration, select IPv4, click Advanced, select the DNS tab, and turn off the checkbox labelled "Register this connection's addresses in DNS". If you don't do this, then the private IP address (the 192.168.1.1 style address created to be private to the virtual LAN inside your computer) gets accidentally registered in the DNS server of the Active Directory along with the real IP address of the real LAN adapter of your machine. So throughout an organization there can be dozens or hundreds of computers that all have DNS names resolving to the same 192.168.1.* address. Generally it doesn't matter because all the other code ignores it.
However, CushyClusterConfiguration is going to notice all the addresses on the machine and all the addresses registered to DNS, and it may misidentify the cluster if these spurious internal private addresses are being used on more than one sandbox or machine room CAS computer. It is unlikely that production or professionally managed machines will have this error, but you should be warned.
On the other hand, you can create this situation intentionally for test purposes by adding names with the loopback or private addresses to the etc/hosts table on your desktop sandbox computer. Just remember the algorithm and you can figure out the testing tricks yourself.
Autoconfigure
At Yale the names of DEV, TEST, and PROD machines follow a predictable pattern, and CAS clusters have only two machines. So production services asked that CAS automatically configure itself based on those conventions. If you have similar conventions and any Java coding expertise you can modify the autoconfiguration logic at the end of CushyClusterConfiguration Java source.
CAS is a relatively simple program with low resource utilization that can run on very large servers. There is no need to spread the load across multiple servers, so the only reason for clustering is error recovery. At Yale a single additional machine is regarded as providing enough recovery.
At Yale, the two servers in any cluster have DNS names that end in "-01" or "-02". Therefore, Cushy autoconfigure gets the HOSTNAME of the current machine, looks for "-01" or "-02" in the name, and when it matches creates a cluster containing the current machine and one additional machine with the same name but with "-01" substituted for "-02" or the reverse.
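The convention amounts to the following few lines; this is an illustrative sketch of the naming rule, not the autoconfigure code at the end of CushyClusterConfiguration.

import java.net.InetAddress;

// Sketch of the "-01"/"-02" autoconfigure convention (not the actual Cushy source).
public class AutoConfigureSketch {
    // Given this machine's HOSTNAME, guess the partner machine in the cluster.
    static String partnerHostname(String hostname) {
        if (hostname.contains("-01")) {
            return hostname.replace("-01", "-02");
        }
        if (hostname.contains("-02")) {
            return hostname.replace("-02", "-01");
        }
        return null;    // no match: run as a standalone server with no cluster
    }

    public static void main(String[] args) throws Exception {
        String hostname = InetAddress.getLocalHost().getHostName();
        System.out.println("This node: " + hostname + ", partner: " + partnerHostname(hostname));
    }
}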
If no configured cluster matches the current machine IP addresses and the machine does not autoconfigure (because at Yale the HOSTNAME does not have "-01" or "-02"), then Cushy configures a single standalone server with no cluster. At this point Cushy is just a very large and complicated version of the original DefaultTicketRegistry code.
Even without a cluster, Cushy still checkpoints the ticket cache to disk and restores the tickets across a reboot. So it provides a useful function in a single machine configuration that is otherwise only available with JPA and a database.
Other Parameters
Typically in the ticketRegistry.xml Spring configuration file you configure CushyClusterConfiguration as a bean with id="clusterConfiguration" first, and then configure the usual id="ticketRegistry" bean using CushyTicketRegistry. The clusterConfiguration bean exports some properties that are used (through Spring EL) to configure the Registry bean.
<bean id="ticketRegistry" class="edu.yale.cas.ticket.registry.CushyTicketRegistry"
p:serviceTicketIdGenerator-ref="serviceTicketUniqueIdGenerator"
p:checkpointInterval="300"
p:cacheDirectory= "#{systemProperties['jboss.server.data.dir']}/cas"
p:nodeName= "#{clusterConfiguration.getNodeName()}"
p:nodeNameToUrl= "#{clusterConfiguration.getNodeNameToUrl()}"
p:suffixToNodeName="#{clusterConfiguration.getSuffixToNodeName()}" />
The nodeName, nodeNameToUrl, and suffixToNodeName parameters link back to properties generated by the logic in the CushyClusterConfiguration bean (id="clusterConfiguration").
The cacheDirectory is a work directory on disk to which CAS has read/write privileges. The default is "/var/cache/cas", which is Unix syntax but can be created as a directory structure on Windows. In this example we use the Java system property for the JBoss /data subdirectory when running CAS on JBoss.
The checkpointInterval is the time in seconds between successive full checkpoints. Between checkpoints, incremental files will be generated.
CushyClusterConfiguration exposes an md5Suffix="yes" parameter, which causes it to generate a ticketSuffix that is the MD5 hash of the computer's host name instead of using the nodename as the suffix. The F5 likes to refer to computers by their MD5 hash, and using that as the ticket suffix simplifies the F5 configuration even though it makes the ticket longer.
...