DR CAS and Shibboleth

CAS and Shibboleth are required to access most Yale services, and they may be the most important Tier 0 components. However, we have seen that problems short of a DR situation can also interfere with the normal functioning of CAS and Shib and disrupt our services. This proposal is therefore to create a fallback configuration for CAS and Shib that can be used both when there is a major disruption (data center failure) and when a more specific problem disrupts only the CAS or Shibboleth service.

Principle: No common point of failure

DR CAS and Shibboleth must not share any physical or logical resource or dependency with the normal production CAS and Shib VMs. They should be in a different location, probably the AWS or Azure cloud. They will use entirely different data sources rather than relying on supposedly "redundant" servers. In particular, CAS will use either the AD Domain Controller in the AWS Cloud or the Azure AD to validate passwords, and Shibboleth will use a single recent snapshot of identity data that may only be current as of the last overnight backup.
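As a rough illustration of this principle, the check below lists what each environment depends on and reports any overlap. It is only a sketch: the dependency names are hypothetical labels invented for the example, not an inventory of the real environments, and a real review would also have to cover less visible dependencies such as DNS, certificates, and replication between directories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    dependencies: frozenset


def shared_dependencies(first: Environment, second: Environment) -> list:
    """Return the dependencies that appear in both environments, sorted."""
    return sorted(first.dependencies & second.dependencies)


# Hypothetical labels, used only to illustrate the check.
PROD = Environment(
    "PROD",
    frozenset({
        "on-prem-data-center",
        "f5-load-balancer",
        "on-prem-ad-domain-controllers",
        "production-identity-database",
    }),
)
DR = Environment(
    "DR",
    frozenset({
        "aws-region",
        "aws-ad-domain-controller",
        "local-identity-snapshot",
    }),
)
```

With these example lists, `shared_dependencies(PROD, DR)` returns an empty list. Any non-empty result would name a common point of failure to be designed out.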

Principle: Drop every non-essential option

The DR CAS will not even attempt to force a password change for users who have not changed their password in a year, in part because the Change Your Password application is low priority for service restoration. It will still attempt to use Duo because MFA is a layer of security. DR Shibboleth may, for example, not support DEAL (change shared Eliapps mailbox permissions), again because DEAL requires a separate server that will not be restored in a DR environment.
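The options named above could be recorded as an explicit feature profile per environment, so that what DR drops is written down rather than implied. This is a sketch only: the class and field names are hypothetical, and DEAL is shown as disabled even though the text says only that DR "may" not support it.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LoginFeatures:
    force_annual_password_change: bool
    duo_mfa: bool
    deal_mailbox_permissions: bool


PRODUCTION_FEATURES = LoginFeatures(
    force_annual_password_change=True,
    duo_mfa=True,
    deal_mailbox_permissions=True,
)

# Hypothetical DR profile: MFA stays on, the non-essential options are dropped.
DR_FEATURES = LoginFeatures(
    force_annual_password_change=False,
    duo_mfa=True,
    deal_mailbox_permissions=False,
)


def dropped_in_dr(prod: LoginFeatures, dr: LoginFeatures) -> list:
    """Names of features that are on in production but off in DR."""
    dr_values = asdict(dr)
    return [name for name, on in asdict(prod).items() if on and not dr_values[name]]
```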

Principle: Current only on Passwords

The DR configuration will be updated less frequently than Production. DR Shibboleth will be updated periodically, but it is only guaranteed to reflect today's most recent changes after a release that changes another Tier 0 service. Therefore, new applications being configured and tested (in Huronclick, iGrad, Salesforce) may not be current in DR, but Archer, Box, ServiceNow, Google Apps, and other members of a yet-to-be-enumerated list will always be current, and CAS will validate passwords and honor lock status reflecting any change made at least a few hours ago.

Consequence: DR CAS and Shibboleth are a new environment (DEV, TEST, PROD, DR) and generally speaking are separately configured rather than just being a clone of the production environment. This means that they periodically have to be brought up and tested to make sure they are working.
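One check that such a periodic test might include is whether the identity snapshot loaded into DR is recent enough. The sketch below assumes the snapshot carries a timestamp. The 36-hour limit is a hypothetical threshold (one overnight backup cycle plus slack), not a figure from this proposal.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limit: one overnight backup cycle plus some slack.
MAX_SNAPSHOT_AGE = timedelta(hours=36)


def snapshot_is_current(
    taken_at: datetime,
    now: datetime,
    max_age: timedelta = MAX_SNAPSHOT_AGE,
) -> bool:
    """True if the snapshot is not from the future and not older than max_age."""
    age = now - taken_at
    return timedelta(0) <= age <= max_age


# Example: a snapshot taken at 02:00 UTC, checked at noon the same day.
example_now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
example_snapshot = datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc)
```

A snapshot timestamped in the future is treated as not current, on the assumption that it indicates a clock or labeling problem worth investigating.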

Principle: There may be more than one DR machine, but each single machine stands alone

DR is not required to have a cluster, or load balancing, or non-disruptive service across maintenance. There is one machine and it has enough memory and CPUs to handle the load. However, we may decide to create instances of the DR configuration in more than one place (AWS and Azure) and on more than one platform (Linux, Windows, Container), recognizing that we can bring up only one of these options in DR mode at any time.

Principle: Someone else has to configure the network

Someone has to decide how requests for "https://secure.its.yale.edu/cas" are routed to the DR machine. This could be done with a DNS change to the public name, or a change to the routing of the IP address 130.132.35.49, or a change to the F5, or a change to the routing between the F5 and the VM at IP address 172.28.51.119 (vm-ssoprdapp-01.web.yale.internal). This is a critical decision, but it is outside the scope of this planning document. Note that any solution that bypasses the normal use of the F5 has to install SSL certificates someplace else. This includes the certificates for "secure.its.yale.edu" and "auth.yale.edu", and the one for SSL LDAP access to AD (presumably the AWS Domain Controller).
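If the DNS option were chosen, a small check like the following could confirm where the public name currently points. It is a sketch under assumptions: the DR address is a placeholder from the reserved documentation range, not a real allocation, and the check says nothing about the other options, where the public IP stays the same and only the routing behind it changes.

```python
import socket

PRODUCTION_IP = "130.132.35.49"  # public address quoted in the text above
DR_IP = "203.0.113.10"  # hypothetical placeholder (reserved documentation range)


def resolve_ipv4(hostname: str) -> set:
    """Resolve a hostname to its set of IPv4 addresses (requires network access)."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def routing_target(addresses: set) -> str:
    """Classify a set of resolved addresses as 'production', 'dr', or 'unknown'."""
    has_prod = PRODUCTION_IP in addresses
    has_dr = DR_IP in addresses
    if has_dr and not has_prod:
        return "dr"
    if has_prod and not has_dr:
        return "production"
    return "unknown"
```

A caller would combine the two as `routing_target(resolve_ipv4("secure.its.yale.edu"))`. An "unknown" result (both addresses, or neither) is reported rather than guessed at.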

Issue: How to get secrets at startup

DR CAS needs only a service account in AD, and only because AD does not allow unauthenticated connections. The service account needs no privileges at all.

DR Shibboleth needs the private key used to sign responses (and more generally the set of files in the /apps/shibboleth-idp/credentials directory). If it only uses a local database on the same VM restored from last night's backup of identity information, then it does not require database usernames or passwords. It also needs an unprivileged service account to access AD to get Group membership.

Information Security needs to approve some mechanism to feed keys and passwords to DR applications when they are started.
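Whatever mechanism is approved, the application side could look roughly like the sketch below: read the required secrets once at startup and refuse to start if any is missing. Environment variables are used here purely for illustration and are not a proposed or approved mechanism. The variable names are hypothetical.

```python
import os

# Hypothetical names for the AD service account credentials.
REQUIRED_SECRETS = ("DR_AD_BIND_DN", "DR_AD_BIND_PASSWORD")


class MissingSecretsError(RuntimeError):
    """Raised when one or more startup secrets are absent or empty."""


def load_secrets(environ, required=REQUIRED_SECRETS) -> dict:
    """Return the required secrets from a mapping, or fail listing what is missing."""
    missing = [name for name in required if not environ.get(name)]
    if missing:
        # Report names only; never include secret values in error messages.
        raise MissingSecretsError("missing startup secrets: " + ", ".join(missing))
    return {name: environ[name] for name in required}
```

At startup the application would call `load_secrets(os.environ)`. Failing immediately with the list of missing names is meant to make a misconfigured DR start obvious before any user tries to log in.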

Information Security needs to approve a mechanism to store non-sensitive (biographical) user directory information (firstname, lastname, email, affiliation, school code, cost center), which is not covered by HIPAA but is covered by FERPA, in some place like the AWS S3 servers so it can be restored for use by DR Shibboleth. If FERPA is a problem, we could exclude anyone with privacy from logging in during DR. If the data is encrypted, then the decryption key is a secret to be provided at startup.
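If excluding users with privacy turns out to be the chosen approach, the export step that builds the snapshot might filter records as sketched below. The `netid` key and the `ferpa_privacy` flag are hypothetical field names assumed for the example. The remaining fields are the ones listed above.

```python
# "netid" is an assumed lookup key; the remaining fields are those named in the text.
EXPORT_FIELDS = (
    "netid",
    "firstname",
    "lastname",
    "email",
    "affiliation",
    "school_code",
    "cost_center",
)


def exportable_records(records: list) -> list:
    """Return only non-private records, reduced to the approved fields."""
    exported = []
    for record in records:
        # Fail closed: a record with no privacy flag is treated as private.
        if record.get("ferpa_privacy", True):
            continue
        exported.append({field: record.get(field) for field in EXPORT_FIELDS})
    return exported
```

Copying only the listed fields means that any other attribute present in the source record never reaches the snapshot, whatever the privacy decision turns out to be.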