...

So YaleClusterConfiguration adds Java logic instead of just a static cluster configuration file. During initialization on the target machine it can determine all the IP addresses assigned to the machine and the machine's primary HOSTNAME. This now allows two strategies.

First, you can provide configurations for all of your clusters (sandbox, dev, test, prod, ...). Then at run time the bean determines what machine it is on and looks for a configuration that includes that machine. If every computer is in at most one cluster, it will select the right configuration. If that does not work, then starting with its own HOSTNAME it can create a simple cluster configuration when the name matches a pattern. At Yale, the runtime CushyClusterConfiguration determines the IP addresses of the current machine and scans each cluster definition provided. It cannot use a cluster that does not contain the current machine, so it stops and uses the first cluster that contains a URL referencing an IP address on the current server.

If none of the configured clusters contains the current machine, or if no configuration is provided, then Cushy uses the HOSTNAME and some Java code. The code was written for the Yale environment and can describe other environments, but if you already have a cluster with other machine naming conventions then you may want to modify or replace the Java at the end of this bean.

At Yale, the DEV, TEST, and PROD machines are all part of a two-machine cluster where the HOSTNAME contains a "-01" or "-02" suffix. So given the current HOSTNAME, it can infer that if this machine has "-01" in its name, the other machine in the cluster has the same name with "-02", and the reverse.

...

In Spring, the <value> tag generates a String, so this is what Java calls a List<List<String>> (a List of Lists of Strings). As noted, the top List has two elements. The first element is a List with two Strings for the machines foo and bar. The second element is another List with two Strings for casdev1 and casdev2.
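
For reference, here is a minimal sketch of the equivalent Java object that Spring builds from such a definition (the URLs shown are illustrative, not the actual Yale configuration):

  List<List<String>> clusterDefinition = new ArrayList<List<String>>();
  // first cluster: the machines foo and bar
  clusterDefinition.add(Arrays.asList(
          "https://foo.yu.yale.edu:8443/cas/",
          "https://bar.yu.yale.edu:8443/cas/"));
  // second cluster: casdev1 and casdev2
  clusterDefinition.add(Arrays.asList(
          "https://casdev1.yale.edu:8443/cas/",
          "https://casdev2.yale.edu:8443/cas/"));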

There is no good way to determine all the DNS names that point to my server. However, it is relatively easy in Java to find all the IP addresses of all the LAN interfaces on the current machine. This list may be longer than you think. Each LAN adapter can have IPv4 and IPv6 addresses, and then there can be multiple real LANs and a bunch of virtual LAN adapters for VMWare or Virtualbox VMs you host or tunnels to VPN connections. Of course, there is always the loopback address.
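
A minimal sketch of that enumeration using the standard java.net API (illustrative, not the exact Cushy code):

  import java.net.InetAddress;
  import java.net.NetworkInterface;
  import java.net.SocketException;
  import java.util.Enumeration;
  import java.util.HashSet;
  import java.util.Set;

  // Collect every IP address bound to any interface on this machine,
  // including loopback, virtual, and VPN adapters.
  static Set<InetAddress> findLocalAddresses() throws SocketException {
      Set<InetAddress> local = new HashSet<InetAddress>();
      Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
      while (nics.hasMoreElements()) {
          Enumeration<InetAddress> addrs = nics.nextElement().getInetAddresses();
          while (addrs.hasMoreElements()) {
              local.add(addrs.nextElement());
          }
      }
      return local;
  }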

This is a caution because Cushy is going to get all the IP addresses for the current machine and then look up every server DNS name in each cluster defined in the list. In this example, it will first look up the IP address of "foo.yu.yale.edu". It will then compare this address with all the addresses on the current machine.

Cushy cannot use a cluster that does not contain the current machine. So it continues its scan until it finds a cluster definition that the current machine is actually in, and uses the first cluster where the addresses match.

Restrictions:

You cannot create clusters that have the same IP address but different ports. Put another way, two Tomcats on the same machine cannot be members of different clusters. Cluster identity is defined by IP address, not port number. If you need to test on a single host, VirtualBox is free, so use VMs.

Be careful of any generic address where the same IP address is used on different machines for different purposes. The Loopback address 127.0.0.1 is on every machine. The private network address of 192.168.1.1 may be used on many dummy networks that connect virtual machines to each other and to their host.

In a desktop sandbox or test environment, you may want to define the names in the cluster definition using the local hosts file. If you don't, then the computer names have to be found in the real DNS server.

Suppose you omit the clusterDefinition property entirely or the current machine is not associated with any IP address of any URL in any defined cluster. The YaleClusterConfiguration will autoconfigure the cluster. The supplied code is based on simple rules that work in the Yale environment. If you need something different, you have to change the source of YaleClusterConfiguration, but if you know any Java it is not hard. The rules for the supplied code are:

  1. A cluster has at most two machines, because CAS is a very simple application that uses very few resources and can be hosted on any multi-core server today. The second machine is just there to recover immediately from a node failure (although you could load balance across the two machines if you want).
  2. The two machines have names that end in "-01" and "-02" (for example, foo-01.yu.yale.edu and foo-02.yu.yale.edu). Note that this is a three-character sequence and requires the dash and the leading "0". It won't match foo1 and foo2 (although, as shown in the example above, you can configure such names explicitly).
  3. The name the cluster wants to use (the one that ends in "-01") is the primary HOSTNAME configured to the OS and not some additional name added to a machine that has some other primary name. This code can enumerate all the possible IP addresses, but generally there is only one HOSTNAME that the operating system (and Java) returns when you ask for it.

Then YaleClusterConfiguration will use Java to find the full hostname of the current machine and run through the list of cluster definitions. For each host name in each URL in a cluster, it will do a DNS lookup (or look the name up in the /etc/hosts or \Windows\System32\drivers\etc\hosts file). DNS can return more than one IP address for a server name. Cushy then checks each IP address returned for each server name in the list of URLs for a cluster against the list of IP addresses assigned to interfaces on the currently running machine. If none of the URLs in the cluster definition match any IP address on this machine, then this machine is not a member of that cluster, so Cushy ignores that cluster and goes on to the next cluster definition in the list. Cushy stops and accepts a cluster definition when one IP address for a DNS name matches one IP address assigned to any interface on the currently running machine.
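
A simplified sketch of that scan, reusing the hypothetical findLocalAddresses() helper sketched earlier (the method and its error handling are illustrative, not the actual CushyClusterConfiguration code):

  import java.net.InetAddress;
  import java.net.URI;
  import java.net.URISyntaxException;
  import java.net.UnknownHostException;
  import java.util.List;
  import java.util.Set;

  // Return the first configured cluster that contains an IP address of this machine, or null.
  static List<String> selectCluster(List<List<String>> clusterDefinition,
                                    Set<InetAddress> localAddresses) throws URISyntaxException {
      for (List<String> cluster : clusterDefinition) {
          for (String url : cluster) {
              String host = new URI(url).getHost();          // e.g. "foo.yu.yale.edu"
              InetAddress[] resolved;
              try {
                  resolved = InetAddress.getAllByName(host); // DNS, or the local hosts file
              } catch (UnknownHostException e) {
                  continue;                                  // name does not resolve, skip it
              }
              for (InetAddress addr : resolved) {
                  if (localAddresses.contains(addr)) {
                      return cluster;   // first cluster containing the current machine wins
                  }
              }
          }
      }
      return null;   // no configured cluster matches; fall back to HOSTNAME autoconfigure
  }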

Restrictions:

You can certainly run more than one instance of CAS on different Web servers on different port numbers of the same machine. However, there is no J2EE API that returns all the port numbers used by the current Web server. So Cushy is blind to port numbers and has to make its decision based on IP addresses alone. Therefore, if you want to run more than one CAS server on a computer, you have to use virtual machines with one CAS server per VM, or you have to replace CushyClusterConfiguration with custom code.

Some operating systems (Windows, for example) have Dynamic DNS support. If you do not uncheck the box in the Advanced section of the LAN adapter configuration, then all the IP addresses assigned to that LAN adapter will be registered in the DNS server. If you have a real LAN that connects to the outside world, a virtual LAN that connects to VMs you host, and a private 192.168.1.* network between the VMs, then unless you remembered to manually uncheck the box after creating the virtual LAN, you are probably registering your 192.168.1.1 virtual IP address in the public DNS server. That is merely untidy on its own, but it can be a problem for Cushy if one of the servers in one of the cluster configurations has registered its own version of the same address. The current configuration code stops when it gets a match on any address, and 192.168.1.* addresses are very common in sandbox clusters running in VMs on a developer's desktop. So as you write the cluster configuration in XML, run an nslookup command for each hostname and verify that the addresses registered with DNS are the addresses you expect, or at least that there are no obviously bogus addresses that are going to cause trouble.

Autoconfigure

At Yale the names of DEV, TEST, and PROD machines follow a predictable pattern, and CAS clusters have only two machines. So production services asked that CAS automatically configure itself based on those conventions. If you have similar conventions and any Java coding expertise you can modify the autoconfiguration logic at the end of CushyClusterConfiguration.

CAS is a relatively simple program with low resource utilization that can run on very large servers. There is no need to spread the load across multiple servers, so the only reason for clustering is error recovery. At Yale a single additional machine is regarded as providing enough recovery.

At Yale, the two servers in any cluster have DNS names that end in "-01" and "-02". Therefore, Cushy autoconfigure gets the HOSTNAME of the current machine and looks for "-01" or "-02" in the name. When it matches, it creates a cluster containing the current machine and one additional machine with the same name but with "-01" and "-02" swapped.
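
A minimal sketch of that naming rule (the method name is hypothetical; the real logic is at the end of CushyClusterConfiguration):

  import java.net.InetAddress;
  import java.net.UnknownHostException;

  // Given this machine's HOSTNAME, derive the partner node's name,
  // or return null if the name does not follow the "-01"/"-02" convention.
  static String partnerHostname() throws UnknownHostException {
      String hostname = InetAddress.getLocalHost().getCanonicalHostName(); // e.g. foo-01.yu.yale.edu
      if (hostname.contains("-01")) {
          return hostname.replaceFirst("-01", "-02");
      } else if (hostname.contains("-02")) {
          return hostname.replaceFirst("-02", "-01");
      }
      return null;   // no match: fall back to a standalone (single node) configuration
  }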

If none of the above applies:

...

no configured cluster matches the current machine's IP addresses and the machine does not autoconfigure (because at Yale the HOSTNAME does not have "-01" or "-02")

...

Then YaleClusterConfiguration will generate a standalone CAS server with no other machines in the cluster. CushyTicketRegistry will generate checkpoint and incremental files on local disk, and will use these files to reload tickets after a CAS server reboot, but it will not communicate over the network to any other CAS server.

CushyTicketRegistry uses the cluster configuration created by the YaleClusterConfiguration bean to create one Primary CushyTicketRegistry object for the local server and then one Secondary instance of the CushyTicketRegistry class for every other node in the cluster. In the standalone case there are no other nodes, so Cushy configures a single server with no cluster. At this point Cushy is just a very large and complicated version of the original DefaultTicketRegistry code.

Other Parameters

  <bean id="ticketRegistry" class="edu.yale.cas.ticket.registry.CushyTicketRegistry"
          p:serviceTicketIdGenerator-ref="serviceTicketUniqueIdGenerator"
          p:checkpointInterval="300"
          p:cacheDirectory=  "#{systemProperties['jboss.server.data.dir']}/cas"
          p:nodeName=        "#{clusterConfiguration.getNodeName()}"
          p:nodeNameToUrl=   "#{clusterConfiguration.getNodeNameToUrl()}"
          p:suffixToNodeName="#{clusterConfiguration.getSuffixToNodeName()}"  />

...

During normal CAS processing, the addTicket() and deleteTicket() methods lock the registry just long enough to add an item to the end of the appropriate one of the two incremental collections. This is a fairly trivial use of locking and collections. Cushy uses locks only for very simple updates and copies, so it cannot deadlock or be blocked by other code that synchronizes on the same lock, and performance should not be affected.
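
A minimal sketch of the locking pattern being described (the class, field, and method bodies are hypothetical and greatly simplified, not the actual CushyTicketRegistry source):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;
  import org.jasig.cas.ticket.Ticket;

  // Hypothetical fragment: the lock is held only for the brief append to an incremental list.
  public class IncrementalRegistrySketch {
      private final Map<String, Ticket> tickets = new ConcurrentHashMap<String, Ticket>();
      private final List<Ticket> addedSinceCheckpoint = new ArrayList<Ticket>();
      private final List<String> deletedSinceCheckpoint = new ArrayList<String>();

      public void addTicket(Ticket ticket) {
          tickets.put(ticket.getId(), ticket);
          synchronized (this) {
              addedSinceCheckpoint.add(ticket);      // lock held only for this append
          }
      }

      public void deleteTicket(String id) {
          tickets.remove(id);
          synchronized (this) {
              deletedSinceCheckpoint.add(id);        // lock held only for this append
          }
      }
  }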

Quartz maintains a pool of threads independent of the threads used by JBoss or Tomcat to handle HTTP requests. Periodically a timer event is triggered, Quartz assigns a thread from the pool to handle it, the thread calls the timerDriven() method of the primary CushyTicketRegistry object, and for the purpose of this example, let us assume that it is time for a new full checkpoint.

...

At some point the front end notices the node is back and starts routing requests to it based on the node name in the suffix of CAS cookies. The node picks up where it left off. It does not know, and cannot learn, about any Service Tickets issued on behalf of its logged-in users by other nodes during the failure. It does not know about users who logged out of CAS during the failure.

Cushy defines its support of Single SignOff as "best effort": it does not guarantee perfect behavior across a node failure.