
In 2010 Yale upgraded to CAS 3.4.2 and implemented "High Availability" CAS clustering using the JBoss Cache option. Unfortunately, the mechanism designed to improve CAS reliability ended up as the cause of most CAS failures. If JBoss Cache (or any other clustering option) fails due to some unspecified network layer problem, requests back up in memory and eventually CAS stops running on all members of the cluster. None of the other available CAS clustering options have been reported to work flawlessly.

...

So CushyTicketRegistry was written to hold CAS tickets in memory and replicate them to other CAS servers so that they can take over if one server goes down. It uses only simple Java services with no new external dependencies. It does the entire job in simple, readable Java source that can be adapted to handle future requirements. Two things make this possible. First, modern network front end devices are smart enough to make the CAS clustering problem fairly simple. Second, there are really not a lot of CAS tickets, so a very simple but not optimally efficient solution is entirely reasonable on modern servers. It turns out that it is trivial (both in code and overhead) to snapshot the entire collection of tickets to a disk file using the Java writeObject operation. The resulting fairly small file can then be transferred between CAS servers using an HTTPS GET; CAS already runs on a Web server, so you might as well use it. This approach may not be as efficient as the more sophisticated technology, but it is so dead flat simple that you can understand it, customize it, and arrange that it can never cause problems. More importantly, if it uses less than 5% of one core on a modern multicore commodity server, do you really need to be more efficient?

Every cluster of any type requires a network front end to route requests, detect failure, and maybe load balance. Cushy assumes that this front end is programmable, as most modern front ends are, and depends on routing rules that are entirely reasonable with today's devices.

Executive Summary

This is a quick introduction for those in a hurry.

...

For a single CAS server, the Ticket Registry is just an in-memory table of tickets (a Java "Map" object) keyed by the ticket ID string. When more than one CAS server is combined to form a cluster, an administrator chooses one of several optional Ticket Registry solutions that allow the CAS servers to share the tickets.

One clustering option is the JPA Ticket Registry, where tickets are written to a relational database using JPA, the standard Java object-to-RDBMS mapping framework. All the CAS servers share a database, which means that any CAS node can fail but the database has to stay up all the time or CAS stops working. Other solutions use generic off-the-shelf object "caching" solutions (Ehcache, JBoss Cache, Memcached) where CAS puts the tickets into what appears to be a common container of Java objects and, under the covers, the cache technology ensures that the tickets are copied to all the other nodes.

JPA makes CAS dependent on a database, and if the database fails then all the CAS nodes fail at the same time. You can try to introduce a High Availability database, but this takes an already unnecessarily complicated solution and makes it more complicated. 

The various cache solutions tend to work well when the network is reliable but one of the CAS nodes fails. They do not work well when the CAS nodes are all fine but the network has become disconnected or unreliable.

Both JPA and the caching solutions are designed to handle much bigger and more complex problems, so they have a lot of configuration options, both directly and indirectly (to use JPA you have to, for example, configure a database in the first place).

CAS is a fairly simple system that needs to solve a small number of problems when things go wrong. You can enumerate all the specific points in the CAS logic where node failure or network failure are going to raise issues. Solving these specific problems with specific code turned out to be a reasonable effort.

Cushy is a cute word that roughly stands for "Clustering Using Serialization to disk and Https to transmit the files between servers, written by Yale".

The name explains what it does. Java has a built-in mechanism called "Serialization", a useful feature of the I/O library exposed through the "writeObject" call. This statement writes a Java object to disk. It can write simple objects like strings, or it can write big objects like a List of all the tickets in the Registry. Java takes care of all the complexity. It automatically turns the objects into bytes on disk, and later on a "readObject" call restores the original object to memory on the same server or on another server elsewhere in the network. So Cushy starts with a single Java statement that does most of the work.
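
For example, a minimal sketch of those two calls (illustrative class and file names, not Cushy's actual code):

    import java.io.*;
    import java.util.*;

    public class SerializationSketch {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) throws Exception {
            // A toy "registry": a Map of ticket id to (serializable) ticket.
            Map<String, Serializable> tickets = new HashMap<>();
            tickets.put("TGT-1-dmKAsulC-CASVM1", "a real Ticket object would go here");

            // One call writes the whole collection to disk.
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream("ticketRegistry.ser"))) {
                out.writeObject(tickets);
            }

            // Later, on this server or another, one call restores it.
            try (ObjectInputStream in = new ObjectInputStream(
                    new FileInputStream("ticketRegistry.ser"))) {
                Map<String, Serializable> restored =
                        (Map<String, Serializable>) in.readObject();
                System.out.println(restored.size() + " tickets restored");
            }
        }
    }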

The various cache systems use multicast network traffic with timeouts and retries and lots of parameters. CAS is already running on a Web server that knows how to transmit a file from one computer to another. Yes, if you needed to cluster several dozen machines to each other you would want something exotic, but CAS typically runs on two machines and you might add a few extra, but nothing that a Web server cannot handle.

Now for the good part. Cushy has to make a point-in-time copy of the references (pointers) to all the tickets in the Registry, and while it is copying these pointers normal CAS processing is blocked for a few microseconds. Then it writes a copy of this List to disk in an operation that is completely separate from anything CAS is doing. Then once the file is on disk, copying the file from one CAS server to another is just HTTP stuff that has nothing to do with CAS or tickets or registries. So this process cannot ever crash CAS because the parts that could fail (nodes and networks) are completely separated from the mainline CAS processing.
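
A rough sketch of that separation, assuming a synchronized in-memory registry (the names are illustrative, not Cushy's actual classes):

    import java.io.*;
    import java.util.*;

    public class SnapshotSketch {
        private final Map<String, Serializable> registry = new HashMap<>();

        // The only step that blocks normal ticket processing: copying the
        // references takes microseconds even for thousands of tickets.
        private synchronized List<Serializable> copyReferences() {
            return new ArrayList<>(registry.values());
        }

        // Serialization and any later file transfer work on the private copy,
        // completely outside the lock and outside mainline CAS processing.
        public void checkpoint(File file) throws IOException {
            List<Serializable> snapshot = copyReferences();
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream(file))) {
                out.writeObject(snapshot);
            }
        }

        public synchronized void addTicket(String id, Serializable ticket) {
            registry.put(id, ticket);
        }
    }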

Cushy is based on four basic design principles:

  1. CAS is very important, but it is small and cheap to run.
  2. Emphasize simplicity over efficiency as long as the cost to run remains trivial.
  3. Assume the network front end is programmable.
  4. Trying for perfection is the source of most total system failures. Allow one or two users to get a temporary error message when a CAS server fails.

How it works

Cushy is simple enough it can be explained to anyone, but if you are in a rush you can stop here.

Back in the 1960's a "checkpoint" was a copy of the important information from a program written to disk so that if the computer crashed the program could start back at almost the point it left off. If a CAS server saves its tickets to a disk file, reboots, and then reads the tickets from the file back into memory, it is pretty much back to the same state it had before rebooting. If you transfer the file to another computer and bring CAS up on that machine, you have pretty much moved the CAS server from one machine to another.

JPA and the cache technologies try to maintain the image of a single big common bucket of shared tickets. This is a very simple view, but it is very hard to maintain and rather fragile. Cushy maintains a separate TicketRegistry for each CAS server, but replicates a copy of each TicketRegistry to all the other servers in the cluster. This turns out to be good enough if you understand how CAS tickets are used and meet those requirements.

Given the small cost of making a complete checkpoint, you could configure Cushy to generate one every 10 seconds and run the cluster on full checkpoints. It is probably inefficient, but using 1 second of one core and transmitting 3 megabytes of data to each node every 10 seconds is not a big deal on modern equipment. This was the first Cushy code milestone and it lasted for about a day before it was extended with a little extra code.

Then Cushy added an "incremental" file that contains all the tickets added or ticket ids of tickets deleted since the last full checkpoint. Based on the "simplicity over efficiency" design, the incremental file contains all the changes since the last checkpoint, it grows between checkpoints, and every time you read the incremental file you get all the data that you previously got, plus new stuff on the end. Sure, we could figure out how to leave off the stuff you have already seen, but that would add more code than is necessary to save a few milliseconds of cheap CPU. So generate a new checkpoint every few minutes (1, 5, 10, 15) and between checkpoints you exchange much smaller incremental files every few seconds (5, 10, 15).
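
A sketch of what such an incremental might accumulate under the "rewrite everything since the last checkpoint" rule (illustrative names, not Cushy's actual API):

    import java.io.*;
    import java.util.*;

    public class IncrementalSketch implements Serializable {
        private final List<Serializable> addedTickets = new ArrayList<>();
        private final Set<String> deletedTicketIds = new HashSet<>();

        public synchronized void recordAdd(Serializable ticket) { addedTickets.add(ticket); }
        public synchronized void recordDelete(String ticketId)  { deletedTicketIds.add(ticketId); }

        // Written every few seconds; the file simply grows until the next
        // full checkpoint, when both collections are cleared.
        public synchronized void writeTo(File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(this);
            }
        }

        public synchronized void clearAfterCheckpoint() {
            addedTickets.clear();
            deletedTicketIds.clear();
        }
    }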

This is not fast enough to replicate Service Tickets between the time that they are created and the few milliseconds later when they are validated. Configuring conventional JPA and caching options so that an ST can be created on one CAS node and then validated on another node requires high speed synchronous cache replication, and that is the source of all the vulnerability. Thus the "Lazy" part of Lazy Ticket Replication is that we require that ticket validation requests be routed by the front end to the CAS node that issued the ticket, so we can replicate tickets every few seconds instead of synchronously in a matter of milliseconds.

Note: If the front end is not fully programmable it is a small programming exercise to be considered in Cushy 2.0 to forward the validation request from any CAS node to the node that owns the ticket and then forward on the results of the validation.

How it Fails (Nicely)

The Primary + Warm Spare Cluster

One perfectly simple solution is to configure a single CAS server to handle all the normal traffic, but have a second CAS server as a "warm spare" backup if the first server goes down. This works well because one CAS server today can handle much more traffic than anyone experiences in real life.

During normal processing, all the requests go to the primary server, which handles them and then based on the timer configuration periodically generates checkpoint and incremental updates that are loaded on the warm spare server. The warm spare generates no tickets of its own and receives no requests.

Then the primary server fails, and requests start to arrive at what was previously the spare machine. It has a copy of all the tickets on the primary, except for login and service tickets from the last few seconds. One or two people may have to login again because of the failure. During the failure the warm spare processes logins and issues service tickets on its own.

Then the primary server reboots. It loads its copy of its old ticket cache and resumes responsibility for all requests.

If the front end is not programmable, then the restoration of the primary server will be approximately as disruptive as its failure. There will be a few logins from the last few seconds, and some service tickets issued by the spare server, that the primary does not know about. Login tickets issued by the secondary will be available to the primary server, but they will not participate in Single SignOut. Sorry, that is just a restriction of Cushy 1.0.

A Smart Front End

A programmable front end is configured to send Validate requests to the CAS server that generated the Service Ticket, /proxy requests to the CAS server that generated the PGT, other requests of logged on users to the CAS server they logged into, and login requests based on standard load balancing or similar configurations. Each ticket has a suffix that indicates which CAS server node generated it, so the rules (sketched in code after the list) are:

  1. If the URL "path" is a validate request (/validate, /serviceValidate, ...) then route to the node indicated by the suffix on the value of the ticket= parameter.
  2. If the URL is a /proxy request, route to the node indicated by the suffix of the pgt= parameter.
  3. If the request has a CASTGC cookie, then route to the node indicated by the suffix of that cookie's value.
  4. Otherwise, or if the node selected by 1-3 is down, choose a CAS node using whatever round robin or priority algorithm previously configured.
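
Written as Java for readability, the decision those four rules describe might look like the following hedged sketch (a real F5 would express this as an iRule, and the rule 4 helper is just a placeholder):

    import java.util.*;

    public class FrontEndRoutingSketch {
        static String chooseNode(String path, Map<String, String> queryParams,
                                 String castgcCookie, Set<String> liveNodes) {
            String ticket = null;
            if (path.endsWith("/validate") || path.endsWith("/serviceValidate")
                    || path.endsWith("/proxyValidate")) {
                ticket = queryParams.get("ticket");      // rule 1: validation requests
            } else if (path.endsWith("/proxy")) {
                ticket = queryParams.get("pgt");         // rule 2: proxy requests
            } else if (castgcCookie != null) {
                ticket = castgcCookie;                   // rule 3: logged-in browser (CASTGC holds the TGT id)
            }
            if (ticket != null) {
                // The node name is the suffix after the last "-" in the ticket id.
                String node = ticket.substring(ticket.lastIndexOf('-') + 1);
                if (liveNodes.contains(node)) {
                    return node;                         // owner node is up: route to it
                }
            }
            return pickByLoadBalancing(liveNodes);       // rule 4: any surviving node
        }

        static String pickByLoadBalancing(Set<String> liveNodes) {
            return liveNodes.iterator().next();          // placeholder for round robin / priority
        }
    }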

So normally all requests go to the machine that created and therefore owns the ticket, no matter what type of ticket it is. When a CAS server fails, requests for its tickets are assigned to one of the other servers. Most of the time the CAS server recognizes this as a ticket from another node and looks in the current shadow copy of that node's ticket registry.

As in the previous example, a node may not have a copy of tickets issued in the last few seconds, so one or two users may see an error.

If someone logged into the failed node needs a Service Ticket, the request is routed to any backup node which then creates a Service Ticket (in its own Ticket Registry with its own node suffix which it will own) chained to the copy of the original Login Ticket in the appropriate shadow Ticket Registry. When that ticket is validated, the front end routes the request based on the suffix to this node which returns the Netid from the Login Ticket in the shadow registry.

The Cushy rule is that a node "owns" all the tickets it created and the other nodes have what is, in the long term, read-only access to tickets created by another node. This is forced by the behavior that when the other node comes back up it reloads its tickets from its last checkpoint and incremental files and picks up where it left off. It has no way to determine what the other nodes did while it was down, so generally that stuff has to be defined to be unimportant. This triggers several consequences:

  • When a node operating in backup mode issues a Service Ticket, the Single SignOff table that gets updated is the one in the local copy of the Login Ticket in the shadow registry. This is never transmitted back to the failed node and is replaced as soon as the failed node comes back up with a new copy of the Login Ticket that doesn't have this Single SignOff information. So Cushy regards Single SignOff as a "best effort" and not as a guarantee.
  • More interestingly, when a user logs off of CAS (which almost never happens in real life but is a possibility), then CAS does everything it does now except it does it to the shadow Login Ticket. Even if it deletes that ticket, when the real node comes back up it restores that Login Ticket, which doesn't matter because nobody has a Cookie pointing to it. The ticket sits in the owning Registry until it times out, and then the Single LogOut processing happens a second time, notifying all the applications that the user who already logged out has logged out again. It would be remarkably bad programming for any of those applications to care about this, but it is a "best effort" restriction.

 

CAS Cluster

In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session information. Nor does it depend on any other type of communication between machines. In fact, a CAS cluster could be created from a CAS running under Tomcat and one running under JBoss.

To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.

Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8080/cas". 

Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (at Yale the F5 likes the ticket suffix to be an MD5 hash of the DNS name).
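
For illustration, a suffix like that could be computed with standard Java; the exact encoding the F5 expects (hex case, any truncation) is an assumption here:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class TicketSuffixSketch {
        static String md5Suffix(String dnsName) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(dnsName.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));   // hex-encode the 16-byte digest
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            System.out.println(md5Suffix("vm-foodevapp-01.web.yale.internal"));
        }
    }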

Ticket Names

CAS uses Spring XML configuration to decide things like how ticket ID strings are generated. Ticket IDs generated by CAS have the following format:

type - num - random - nodename

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end of the ticket is identified as a nodename.
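
For example, pulling the pieces back out of a ticket ID (assuming, as Cushy does, that the node name itself contains no dashes):

    public class TicketIdParseSketch {
        public static void main(String[] args) {
            String ticketId = "ST-7173-UuvPzQ3ejnfbeKtLgLQQ-CASVM3";
            String type     = ticketId.substring(0, ticketId.indexOf('-'));       // "ST"
            String nodeName = ticketId.substring(ticketId.lastIndexOf('-') + 1);  // "CASVM3"
            System.out.println(type + " issued by node " + nodeName);
        }
    }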

Sticky Browser Sessions

Sticky sessions mean that once a browser has been routed by the front end to one CAS node, then subsequent requests from the same browser are routed to the same CAS node.

CAS login initializes variables, validates that the service is properly registered, and considers a number of possible credentials (X.509 Certificate, integrated Windows authentication). If a password is required, it writes the login form. When the form is submitted, it validates the userid and password, generates the TGT, and writes the CAS cookie to the browser. The Spring Webflow component that CAS uses for login processing saves information in variables that are connected to the HTTPSession object. So during login it is important that the browser POST the userid and password back to the same node that handled the earlier part of login processing, or else you need a clustered application server that replicates HTTPSession data across nodes.

Everything is simpler during login and after if the F5 front end can be configured with "sticky sessions". There are three ways to route a browser to the correct node:

  1. During login to CAS you can use the JSESSIONID generated by all Java application servers. This is a widely used and well understood F5 configuration, although it applies to pretty much any Java application and has nothing CAS-specific about it.
  2. After login the session goes away, and returning a browser to the CAS node it previously logged into can be managed by IP address.
  3. However, the CAS-specific logic that would be preferred is to look for the CASTGC Cookie in the HTTP headers. The Cookie will have a value of the TGT ID, and the end of that ID will be the node name the browser logged into:
    TGT-589847-EgeypFTOYICiSXmOUeY7YlCPUp6QPViGvesUpWWIvPqixrBmXE-CASVM1

    If the CASVM1 node is up, then we would prefer that this browser be routed to that node.

You can use real J2EE clustering and configure your favorite application server to replicate HttpSession objects (and their associated data) from one node to another. Then you don't need sticky sessions during login. If you don't cluster the containers, then CAS requires you to POST the userid and password from the login form to the same CAS server that wrote the form to the browser. This can be accomplished with JSESSIONID sticky sessions (or IP address sticky sessions) during login before the CASTGC cookie is generated. After login, the session goes away and then the CASTGC cookie is the best way to come back to the CAS that logged you in.

Using just JSESSIONID or IP address routing doesn't work in a "primary CAS with a warm spare" configuration. It works fine until there is a failure of the primary and then, after a while, the primary comes back up. At that point, new requests from users get routed to the primary CAS, but ticket validations from backend services are "stuck" by IP address to the backup server.

During normal processing any routing mechanism will work, and after a failure none of the routing mechanisms apply and the browser is sent to any CAS node that is still running. However, the Cookie routing mechanism is the easiest way to reroute a browser back to the node it logged in on once a CAS node failure has been corrected and the original node comes back up.

Service Ticket Validation Routing

Requests to validate a Service Ticket ID come from the application rather than from the browser's address. There are specific URL paths that indicate a validation request (/cas/validate, /cas/serviceValidate, /cas/proxyValidate, etc.) and the URL ends with a "?" and the service= and ticket= parameters (in any order).

Thus if the F5 encounters a URL of the form:

https://secure.its.yale.edu/cas/serviceValidate?service=http...&ticket=ST-7173-UuvPzQ3ejnfbeKtLgLQQ-CASVM3

Then it needs to recognize that

  1. the path part of the URL references one of the ST Validation values (/serviceValidate) and
  2. the suffix of the ticket= parameter in the query string part of the URL is the node name CASVM3

Then the F5 should route this request to the node named in the ticket ID.

Proxy Routing

When an application using CAS Proxy support requests a Service Ticket, it calls "/cas/proxy" with a "pgt=" parameter. We need a front end rule that normally routes proxy requests to the node that created the Proxy Granting Ticket. This rule is essentially the same as the ST validation rule, except for the "/proxy" path and the parameter name used to find the ticket suffix.

The JPA Ticket Registry doesn't really use the database for any real SQL stuff, so you could use almost any database system. However, the database is a single point of failure, so you need it to be reliable. If you already have a 24x7x365 database managed by professionals who can guarantee availability, this is a good solution. If not, then this is an insurmountable prerequisite for bringing up an application like CAS that doesn't really need a database.

The various cache solutions should solve the problem. Unfortunately, they too have massively complex configuration parameters with multicast network addresses and timeouts, and while they are designed to work across complete node failure, experience suggests that they are not designed to work when a CAS machine is "sick". That is, if the machine is down and does not respond to any network requests the technology recovers, but if the node is up and receives messages but just doesn't process them correctly, then queues start to clog up, they back up into CAS itself, and then CAS stops working simultaneously on all nodes. There is also a problem with the "one big bag of objects" model: if a router that connects two machine rooms fails, two CAS nodes are separated, and now there are separate versions of what the system is designed to believe is a single cohesive collection.

If you understand the problem CAS is solving and the way the tickets fit together, then each type of failure presents specific problems. Cushy is designed to avoid the big problems and provide transparent service to 99.9% of the CAS users. If one or two people experience an error message due to a CAS crash, and CAS crashes only once a year, then that is good enough especially when the alternative technologies can cause the entire system to stop working for everyone.

Cushy is a cute word that roughly stands for "Clustering Using Serialization to disk and Https transmission of files between servers, written by Yale".

The name explains what it does. Java has a built-in operation called writeObject that writes a binary version of Java objects to disk. If you use it on a complex object, like a list of all the tickets in the Registry, then it creates a disk file with all the tickets in the list. Later on you use readObject to turn the disk file back into a copy of the original list. Java calls this mechanism "Serialization". Using just one statement and letting Java do all the work and handle all the complexity makes this easy.

The other mechanisms (JPA or the cache technologies) operate on single tickets. They write individual tickets to the database or replicate them across the network. Obviously this is vastly more efficient than periodically copying all the tickets to disk. Except that at Yale, the entire Registry of tickets can be written to a disk file in 1 second and it produces a file about 3 megabytes in size. Those numbers are so trivial that writing a copy of the entire Registry to disk once every 5 minutes, or even once a minute, is trivial on a modern server. Given the price of hardware, being more efficient than that is unnecessary.

Once you have a file on disk it should not take very long to figure out how to get a copy of that file from one Web Server to another. An HTTP GET is the obvious solution, though if you had shared disk there are other solutions.

Going to an intermediate disk file was not the solution that first comes to mind. If the tickets are in memory on one machine and they have to be copied to memory on another machine, some sort of direct network transfer is going to be the first thing you think about. However, the intermediate disk file is useful to restore tickets to memory if you have to restart your CAS server for some reason. Mostly, it means that the network transmission is COMPLETELY separate from the process of creating, validating, and deleting tickets. If the network breaks down you cannot transfer the files, but CAS continues to operate normally and it can even generate new files with newer copies of all the tickets. When the network comes back the file transfer resumes independent of the main CAS services. So replication problems can never interfere with CAS operation.
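
As a hedged sketch of that transfer using nothing but the standard library (the servlet path shown is an assumption for illustration, not Cushy's actual URL):

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.*;

    public class FetchCheckpointSketch {
        static void fetchCheckpoint(String casNodeUrl, Path localCopy) throws Exception {
            // e.g. casNodeUrl = "https://vm-foodevapp-01.web.yale.internal:8080/cas"
            URL url = new URL(casNodeUrl + "/cluster/checkpoint");   // illustrative path
            try (InputStream in = url.openStream()) {
                Files.copy(in, localCopy, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }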

Cushy is based on four basic design principles:

  1. CAS is very important, but it is small and cheap to run.
  2. Emphasize simplicity over efficiency as long as the cost to run remains trivial.
  3. Assume the network front end is programmable.
  4. Trying for perfection is the source of most total system failures. Allow one or two users to get a temporary error message when a CAS server fails.

How it works

Cushy is simple enough it can be explained to anyone, but if you are in a rush you can stop here.

Back in the 1960's a "checkpoint" was a copy of the important information from a program written to disk so that if the computer crashed the program could start back at almost the point it left off. If a CAS server saves its tickets to a disk file, reboots, and then reads the tickets from the file back into memory, it is back to the same state it had before rebooting. If you transfer the file to another computer and bring CAS up on that machine, you have moved the CAS server from one machine to another. Java writeObject and readObject guarantee the state and data are completely saved and restored.

JPA and the cache technologies try to maintain the image of a single big common bucket of shared tickets. This is a very simple view, but it is very hard to maintain and rather fragile. Cushy maintains a separate TicketRegistry for each CAS server, but replicates a copy of each TicketRegistry to all the other servers in the cluster.

Given the small cost of making a complete checkpoint, you could configure Cushy to generate one every 10 seconds and run the cluster on full checkpoints. It is probably inefficient, but using 1 second of one core and transmitting 3 megabytes of data to each node every 10 seconds is not a big deal on modern equipment. This was the first Cushy code milestone and it lasted for about a day before it was extended with a little extra code.

The next milestone (a day later) was to add an "incremental" file that contains all the tickets added or the ticket ids of tickets deleted since the last full checkpoint. Creating multiple increments and transmitting only the changes the other node has not yet seen was considered, but it would require more code and complexity. If you generate checkpoints every few minutes, then the incremental file grows as more changes are made but it never gets really large. It is well known that the overhead of creating and opening a file or establishing a network connection is so great that the difference between reading or writing 5K or 100K is trivial.

In Cushy you configure a timer in XML. If you set the timer to 10 seconds, then Cushy writes a new incremental file every 10 seconds. Separately you configure the time between full checkpoints. When the timer goes off, if enough time has passed since the last checkpoint, then instead of writing an incremental file Cushy writes a new full checkpoint.
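
The decision when the timer fires is essentially the following (a sketch, not Cushy's actual class):

    public class TimerSketch {
        private final long checkpointIntervalMillis;   // e.g. 300_000 for 5 minutes
        private long lastCheckpointMillis = 0;

        TimerSketch(long checkpointIntervalSeconds) {
            this.checkpointIntervalMillis = checkpointIntervalSeconds * 1000L;
        }

        // Called by the XML-configured timer, e.g. every 10 seconds.
        void onTimer() {
            long now = System.currentTimeMillis();
            if (now - lastCheckpointMillis >= checkpointIntervalMillis) {
                writeFullCheckpoint();                 // all tickets
                lastCheckpointMillis = now;
            } else {
                writeIncremental();                    // adds and deletes since the checkpoint
            }
        }

        void writeFullCheckpoint() { /* serialize the whole registry */ }
        void writeIncremental()    { /* serialize the accumulated changes */ }
    }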

Only a small number of tickets are added between checkpoints, but lots of Service Tickets have been created and deleted, and there is no good way to keep the list of expired Service Tickets from making the incremental file larger. So if you tried to separate full checkpoints by an unreasonable amount of time, you would find the incremental file had grown to be larger than the checkpoint file and you would have made things worse rather than better. So the expectation is that you do a full checkpoint somewhere between every 1-10 minutes and an incremental somewhere between every 5-15 seconds, but test it and make your own decisions.

A Service Ticket is created and then is immediately validated and deleted. Trying to replicate Service Tickets to the other nodes before the validation request comes in is an enormous problem that screws up the configuration and timing parameters for all the other Ticket Registry solutions. Cushy doesn't try to do replication at this speed. Instead, it has CAS configuration elements that ensure that each Ticket ID contains an identifier of the node that created it, and it depends on a front end smart enough to route any of the ticket validation requests to the node that created the ticket and already has it in memory. Then replication is only needed for crash recovery.

Note: If the front end is not fully programmable it is a small programming exercise to be considered in Cushy 2.0 to forward the validation request from any CAS node to the node that owns the ticket and then pass back the results of the validation to the app.

Ticket Names

As with everything else, CAS has a Spring bean configuration file (uniqueIdGenerators.xml) to configure how ticket ids are generated. If you accept the defaults, then tickets have the following format:

type - num - random - nodename

where type is "TGT" or "ST", num is a ticket sequence number, random is a large random string like "dmKAsulC6kggRBLyKgVnLcGfyDhNc5DdGKT", and the suffix at the end of the ticket is identified as a nodename.

In vanilla CAS the nodename typically comes from the cas.properties file, and even when you are using real clustering many CAS locations leave the "nodename" suffix on the ticket id at its default value of "-CAS". Cushy requires every node in the cluster to have a unique name; it adds a smarter configuration bean described below and enforces the rule that the end of the ticket really identifies the node that created it and therefore owns it.

How it Fails (Nicely)

The Primary + Warm Spare Cluster

One common cluster model is to have a single master CAS server that normally handles all the requests, and a normally idle backup server (a "warm spare") that does nothing until the master goes down. Then the backup server handles requests while the master is down.

During normal processing the master server is generating tickets, creating checkpoints and increments, and sending them to the backup server. The backup server is generating empty checkpoints with no tickets because it has not yet received a request.

Then the master is shut down or crashes. The backup server has a copy in memory of all the tickets generated by the master, except for the last few seconds before the crash. It can handle new logins and it can issue Service Tickets against logins previously processed by the master, using its copy of the master's registry.

Now the master comes back up and, for this example, let us assume that it resumes its role as master (there are configurations where the backup becomes the new master, so when the old master comes back it becomes the new backup; that case is actually easier for Cushy).

The master restores from disk a copy of its old registry and over the network it fetches a copy of the registry from the backup. It now has access to all the login or proxy tickets created by the backup while it was down, and it can issue Service Tickets based on those logins.

However, the failure has left some minor issues that are not important enough to be problems. Because each server is the owner of its own tickets and registry, each has Read-Only access to the tickets of the other server. (Strictly speaking that is not true. You can temporarily change tickets in your copy of the other node's registry, but when the other node comes back up and generates its first checkpoint, whatever changes you made will be replaced by a copy of the old unmodified ticket). So the master is unaware of CAS logouts that occurred while it was down and although it can process a logout for a user that logged into the backup while it was down, it really has no way to actually delete the login ticket. Since no browser has the TGT ID in a cookie any more, nobody will actually be able to use the zombie TGT, but the ticket is going to sit around in memory until it times out.

There are a few more consequences to Single SignOut that will be explained in the next section.

A Smart Front End

A programmable front end is configured to send Validate requests to the CAS server that generated the Service Ticket, /proxy requests to the CAS server that generated the PGT, other requests of logged on users to the CAS server they logged into, and login requests based on standard load balancing or similar configurations. Each ticket has a suffix that indicates which CAS server node generated it.

  1. If the URL "path" is a validate request (/cas/validate, /cas/serviceValidate, etc.) then route to the node indicated by the suffix on the value of the ticket= parameter.
  2. If the URL is a /proxy request, route to the node indicated by the suffix of the pgt= parameter.
  3. If the request has a CASTGC cookie, then route to the node indicated by the suffix of the TGT that is the cookie's value.
  4. Otherwise, or if the node selected by 1-3 is down, choose a CAS node using whatever round robin or master-backup algorithm previously configured.

So normally all requests go to the machine that created and therefore owns the ticket, no matter what type of ticket it is. When a CAS server fails, requests for its tickets are assigned to one of the other servers. Most of the time the CAS server recognizes this as a ticket from another node and looks in the current shadow copy of that node's ticket registry.

As in the previous example, a node may not have a copy of tickets issued in the last few seconds, so one or two users may see an error.

If someone logged into the failed node needs a Service Ticket, the request is routed to any backup node which creates a Service Ticket (in its own Ticket Registry with its own node suffix which it will own) chained to the copy of the original Login Ticket in the appropriate shadow Ticket Registry. When that ticket is validated, the front end routes the request based on the suffix to this node which returns the Netid from the Login Ticket in the shadow registry.

Again, the rule that each node owns its own registry and all the tickets it created and the other nodes can't successfully change those tickets has certain consequences.

  • If you use Single SignOff, then the Login Ticket maintains a table of Services to which you have logged in, so that when you log out, or when your Login Ticket times out in the middle of the night, each Service gets a call from CAS on a published URL with the Service Ticket ID you used to login so the application can log you off if it has not already done so. In failover mode a backup server can issue Service Tickets for a failed node's TGT, but it cannot successfully update the Service table in the TGT, because when the failed node comes back up it will restore the old Service table along with the old TGT.
  • If the user logs out and the Services are notified by the backup CAS server, and then the node that owned the TGT is restored along with the now undead copy of the obsolete TGT, then in the middle of the night that restored TGT will timeout and the Services will all be notified of the logoff a second time. It seems unlikely that anyone would ever write a service logout so badly that a second logoff would be a problem. Mostly it will be ignored.

You have probably guessed by now that Yale does not use Single SignOut, and if we ever enabled it we would only indicate that it is supported on a "best effort" basis.

CAS Cluster

In this document a CAS "cluster" is just a bunch of CAS server instances that are configured to know about each other. The term "cluster" does not imply that the Web servers are clustered in the sense that they share Session information. Nor does it depend on any other type of communication between machines. In fact, a CAS cluster could be created from a CAS running under Tomcat and one running under JBoss.

To the outside world, the cluster typically shares a common virtual URL simulated by the Front End device. At Yale, CAS is "https://secure.its.yale.edu/cas" to all the users and applications. The "secure.its.yale.edu" DNS name is associated with an IP address managed by the BIG-IP F5 device. It terminates the SSL, then examines requests and based on programming called iRules it forwards requests to any of the configured CAS virtual machines.

Each virtual machine has a native DNS name and URL. It is these "native" URLs that define the cluster because each CAS VM has to use the native URL to talk to another CAS VM. At Yale those URLs follow a pattern of "https://vm-foodevapp-01.web.yale.internal:8080/cas". 

Internally, Cushy configuration takes a list of URLs and generates a cluster definition with three pieces of data for each cluster member: a nodename like "vmfoodevapp01" (the first element of the DNS name with dashes removed), the URL, and the ticket suffix that identifies that node (at Yale the F5 likes the ticket suffix to be an MD5 hash of the DNS name).

 

Sticky Browser Sessions

An F5 can be configured to have "sticky" connections between a client and a server. The first time the browser connects to a service name it is assigned any available backend server. For the next few minutes, however, subsequent requests to the same service go back to whichever server the F5 assigned to handle the first request.

Intelligent routing is based on tickets that exist only after you have logged in. CAS was designed (for better or worse) to use Spring Webflow which keeps information in the Session object during the login process. For Webflow to work, one of two things must happen:

  1. The browser has to POST the Userid/Password form back to the CAS server that sent it the form (which means the front end has to use sticky sessions based on IP address or JSESSIONID value).
  2. You have to use real Web Server clustering so the Web Servers all exchange Session objects based on JSESSIONID.

Option 2 is a fairly complex process of container configuration, unless you have already solved this problem and routinely generate JBoss cluster VMs using some canned script. Sticky sessions in the front end are somewhat easier to configure, and obviously they are less complicated than routing requests by parsing the ticket ID string.

Yale made a minor change to the CAS Webflow to store extra data in hidden fields of the login form, and an additional check so that if the form POSTs back to a different server, that server can handle the rest of the login without requiring Session data.

What is a Ticket Registry

...

    <bean id="clusterConfiguration" class="edu.yale.its.tp.cas.util.YaleClusterConfiguration"
        p:md5Suffix="yes" >
      <property name="clusterDefinition">
           <list>
               <!-- Desktop Sandbox cluster -->
               <list>
                    <value>http://foo.yu.yale.edu:8080/cas/</value>
                    <value>http://bar.yu.yale.edu:8080/cas/</value>
               </list>
               <!-- Development cluster -->
               <list>
                    <value>https://casdev1.yale.edu:8443/cas/</value>
                    <value>https://casdev2.yale.edu:8443/cas/</value>
               </list>
           </list>
      </property>
    </bean>

In Spring, the <value> tag generates a String, so this is what Java calls a List<List<String>> (a List of Lists of Strings). As noted, the top List has two elements. The first element is a list with two strings for the machines foo and bar. The second element is another List with two strings for casdev1 and casdev2.

Unfortunately, servers sometimes have more than one DNS name. So the safest way to identify a machine is to use DNS to resolve the server name to an IP address and then look for that IP address on the current machine.

Machines also have a bunch of IP addresses. First, they can have both IPv4 and IPv6 addresses. Then every machine has the loopback addresses (127.0.0.1 in IPv4), and at least one real address. But then Windows machines get Tunnel adapters, and you might have a Cisco AnyConnect VPN, and you may have virtual LAN adapters if your computer hosts VMs with VMWare Workstation or VirtualBox. If foo.yu.yale.edu is a Windows 7 desktop machine, then bar could be another computer, or it could be a VirtualBox guest running inside the foo host. That means that both foo and bar could have real network addresses, or they could have some private 192.168.x.y address that exists only inside the foo machine.

If you want to run a cluster made up of guest VMs on a machine where the DNS does not know the two host names, then you may have to add the other machine to the /etc/hosts or \windows\system32\drivers\etc\hosts file on each VM. Also, if you want to do CAS development on two different desktops and have a pair of sandbox machines for testing on each desktop, then configure a single sandbox cluster and configure the VMs identically on both desktops. If you try to define separate sandbox clusters, remember that the dummy 192.168.x.y IP addresses assigned to each virtual machine have to be distinct in order to select the correct specific cluster configuration.

Restrictions:

Since selection is based on IP address, you have to use different machines or VMs. You cannot test a cluster by running two instances of CAS on different ports on the same machine.

You can be careful generating 192.168.x.y addresses so things work, but you must never use "localhost" or any other host name that resolves to the 127.0.0.1 address, because that address is on every machine; every test machine will select that entry, they will all configure themselves with the same URL and same nodeName, and you end up with a cluster where every machine thinks it is "localhost".

There is no good way to determine all the DNS names that point to my server. However, it is relatively easy in Java to find all the IP addresses of all the LAN interfaces on the current machine. This list may be longer than you think. Each LAN adapter can have IPv4 and IPv6 addresses, and then there can be multiple real LANs and a bunch of virtual LAN adapters for VMWare or VirtualBox VMs you host or tunnels to VPN connections. Of course, there is always the loopback address.

This is a caution because what Cushy is going to do is get all the IP addresses for the current machine and then look up every server DNS name in each cluster defined in the list. In this example, it will first look for the IP address of "foo.yu.yale.edu". It will then compare this address with all the addresses on the current machine.

Cushy cannot use a cluster that does not contain the current machine. So it continues its scan until it finds a cluster definition that the current machine is actually in, and uses the first cluster where the addresses match.
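
A sketch of that scan using standard Java networking calls (illustrative only, not the actual YaleClusterConfiguration source):

    import java.net.*;
    import java.util.*;

    public class ClusterDetectSketch {
        // Every IP address bound to any interface on this machine.
        static Set<InetAddress> localAddresses() throws SocketException {
            Set<InetAddress> local = new HashSet<>();
            for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
                local.addAll(Collections.list(nic.getInetAddresses()));
            }
            return local;
        }

        // Returns the first configured cluster (list of CAS URLs) containing this machine.
        static List<String> findMyCluster(List<List<String>> clusters) throws Exception {
            Set<InetAddress> local = localAddresses();
            for (List<String> cluster : clusters) {
                for (String casUrl : cluster) {
                    String host = new URL(casUrl).getHost();
                    try {
                        for (InetAddress addr : InetAddress.getAllByName(host)) {
                            if (local.contains(addr)) {
                                return cluster;        // this machine is a member
                            }
                        }
                    } catch (UnknownHostException e) {
                        // name not in DNS or the hosts file; keep scanning
                    }
                }
            }
            return null;                               // fall back to autoconfiguration
        }
    }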

Restrictions:

You cannot create clusters that have the same IP address but different ports. Put another way, two Tomcats on the same machine cannot be members of different clusters. Cluster identity is defined by IP address, not port number. If you need to test on a single host, VirtualBox is free, so use VMs.

Be careful of any generic address where the same IP address is used on different machines for different purposes. The Loopback address 127.0.0.1 is on every machine. The private network address of 192.168.1.1 may be used on many dummy networks that connect virtual machines to each other and to their host.

In a desktop sandbox or test environment, you may want to define names in the cluster definition using the local hosts file. If you don't then the computer name has to be found in the real DNS server.

Suppose you omit the clusterDefinition property entirely, or the current machine is not associated with any IP address of any URL in any defined cluster. Then YaleClusterConfiguration will autoconfigure the cluster. The supplied code is based on simple rules that work in the Yale environment. If you need something different, you have to change the source of YaleClusterConfiguration, but if you know any Java it is not hard. The rules for the supplied code are:

...

Then YaleClusterConfiguration will use Java to find the full hostname of the current machine. If it finds that the name contains "-01" or "-02", it will configure that to be the current node and will autogenerate a cluster with one additional machine whose URL is the same except that the "-01" and "-02" parts of the name are swapped.

If none of the above applies:

  • there is no clusterDefinition or none of the URLs in the clusterDefinition match the current machine and
  • the HOSTNAME has no "-01" or "-02" in it.

...

The cacheDirectory is a work directory on disk to which CAS has read/write privileges. The default is "/var/cache/cas", which is Unix syntax but can be created as a directory structure on Windows. In this example we use the Java system property for the JBoss /data subdirectory when running CAS on JBoss.

The checkpointInterval is the time in seconds between successive full checkpoints. Between checkpoints, incremental files will be generated.

...