Background

There has been a lot of confusion, and a lot of talking past each other, about update set management and about how the platform's limitations in this area apply to a 'best practices' SDLC process. Many team members have worked on other projects, on platforms where rolling back an environment to a previous known state is trivial and quick.

For example, a code base managed by SVN is versioned, so it is easy and fast to apply a particular revision of the code to a TEST environment. Doing so rolls the environment back to a point before the change that caused a particular defect.

ServiceNow primitives for managing update sets

Update Sets are the molecules of ServiceNow release management. They contain one or more Customer Updates, which are the atoms of release management. Customer Updates are the smallest, indivisible unit for release management.

ServiceNow's recommended best practice (source: wiki) is that Update Sets are kept 'fairly small'. This makes it easy to isolate changes to a particular context, and to back out a particular incorrect Customer Update from an otherwise correct Update Set.

Our own Yale process is that we name Update Sets for the date they were created and the name of the Author, with a description that contains the Requirement number, the Defect number if applicable, and a summary of the work contained in the Update Set.
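As a rough illustration of this naming convention, the sketch below builds a name and a description from the pieces listed above. The exact layout (ISO date first, space separators, the colon before the summary) is an assumption made for illustration; this page does not specify the precise format. The function names are hypothetical.

```python
from datetime import date
from typing import Optional


def update_set_name(created: date, author: str) -> str:
    # Assumed layout: ISO date first, so names sort in creation order.
    return f"{created.isoformat()} {author}"


def update_set_description(requirement: int, summary: str,
                           defect: Optional[int] = None) -> str:
    # The Requirement number is always present; the Defect number only
    # appears when the Update Set fixes a defect.
    parts = [f"R{requirement}"]
    if defect is not None:
        parts.append(f"D{defect}")
    return " ".join(parts) + ": " + summary
```

Putting the date first is what lets the name double as a record of apply order, which matters later when Update Sets are committed to higher environments.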

When an Update Set is In Progress, changes to the platform by that Author are captured in that Update Set. Unit testing is performed in place by the Author, and when the work is completed, the Update Set is set to state Complete.

Sometimes an Update Set is fully developed, and set to state Complete, but for certain reasons the code should be abandoned. An example would be the customer replaced their process or product and will never need the code that was developed. In this case, we would set the Update Set to Ignore. Update Sets with Ignore status are ineligible to be promoted to higher environments.

For reasons that should soon become clear, a Closed (Complete) Update Set should never be re-opened and modified. Instead, if the code is incomplete or otherwise needs to be modified, a new Update Set should be created that modifies whichever elements of the first Update Set need to be corrected.

Merging update sets

Recall that Update Sets contain one or more Customer Updates. It is possible to merge related Update Sets. The merged Update Set contains all the Customer Updates previously contained in the individual Update Sets that were merged. The previous Update Sets still remain listed, but they contain no Customer Updates, and they are basically useless at that point. Official best practice is that after a Merge, the Update Sets that are now empty should be set to Ignore. There is no such thing as an Undo Merge operation. If a mistake is found after a merge, we would need to create a new Update Set in DEV, fix the issue, and when the issue is known resolved, merge that Update Set with the other Update Set.
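The merge behaviour described above can be modelled in a few lines. This is a toy model, not ServiceNow code: the `UpdateSet` class and `merge` function are hypothetical, and the state given to the merged Update Set is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateSet:
    name: str
    state: str
    customer_updates: List[str] = field(default_factory=list)


def merge(name: str, sources: List[UpdateSet]) -> UpdateSet:
    # Assumption: the merged Update Set starts out In Progress.
    merged = UpdateSet(name=name, state="In Progress")
    for src in sources:
        # All Customer Updates move into the merged Update Set.
        merged.customer_updates.extend(src.customer_updates)
        # The source stays listed but is now empty, so mark it Ignore.
        src.customer_updates = []
        src.state = "Ignore"
    return merged
```

Note that nothing in this model remembers which Customer Update came from which source, which mirrors why there is no Undo Merge.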

Because the Merge feature wasn't always perfect in earlier ServiceNow releases, Yale has not merged Update Sets in the past. We have heard that Dublin resolves the previous issues, although there is still no Undo Merge feature. Allen from ServiceNow has suggested that Yale consider using Merge, and that we Merge in the TEST environment after testing is complete. When Allen gave that advice, he was under the impression that we were performing UAT in the TEST environment, so we should ask the question again after fully clarifying how the meaning of the TEST environment has changed.

Advancing code from DEV to higher environments

When an Update Set is Complete, it becomes eligible to be promoted (deployed) to other environments. Update Sets in state Ignore or In Progress are not eligible to be promoted.
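The eligibility rule can be written as a simple filter. This is a minimal sketch with hypothetical names; the three states are the ones named on this page.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    IN_PROGRESS = "In Progress"
    COMPLETE = "Complete"
    IGNORE = "Ignore"


def eligible_for_promotion(state: State) -> bool:
    # Only Complete Update Sets may be promoted to higher environments.
    return state is State.COMPLETE


def promotable(update_sets: Dict[str, State]) -> List[str]:
    # Names of the Update Sets that a higher environment may pick up.
    return [name for name, state in update_sets.items()
            if eligible_for_promotion(state)]
```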

A person with Administrator access in the higher environment will run the "Update Source" feature. Then the feature "Retrieve Completed Update Sets" syncs the local copy with the repository of completed code. This step does not commit the code to the higher environment; it only provides a local copy that future code commits will use.

For this reason, and other reasons, we never re-open a Closed Update Set. Once an Update Set has been Closed, it becomes eligible for promotion to higher environments, and if we change the meaning of a previously closed Update Set, we may create a release management nightmare, where an Update Set in one environment doesn't do the same thing as the same-name update set in a different environment.

Previewing an Update Set

After new Update Sets have been loaded to a higher environment, those Update Sets are viewable in the Retrieved Update Sets view. All new Update Sets, and possibly other Update Sets previously loaded, will be in state Loaded. Using the release management process it should be clear to the Administrator which code should be applied, and in which apply order.

When multiple pieces of code are applied to a single configuration element in ServiceNow, whichever code is applied most recently is the code that 'wins'. This means that apply order is an essential consideration in release planning. It is also why it is often proper to leave code in place when a defect is found, and to apply a new Update Set that fixes just the broken portion of the code. In these situations, we would have at least two pieces of code that apply to the particular defective element. We want the fix to 'win', so we need to be certain that the apply order is recorded and followed. We ensure this apply order through our Update Set naming convention, which is date-based. We also support apply order through our Description conventions, where we record a Requirement number (and a Defect number in the case of defects) as well as a description of the change made in the Update Set.
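The 'most recent wins' rule, combined with date-based names, can be sketched as follows. This is a toy model with hypothetical names, and it assumes the name starts with an ISO date so that plain string sorting gives chronological order.

```python
from typing import Dict, List, Tuple

# An Update Set here is (name, {configuration element: code}).
UpdateSet = Tuple[str, Dict[str, str]]


def apply_in_order(update_sets: List[UpdateSet]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    # Assumption: names begin with an ISO date, so sorting by name
    # reproduces the intended apply order.
    for _name, changes in sorted(update_sets, key=lambda s: s[0]):
        # A later Update Set overwrites earlier code for the same element.
        result.update(changes)
    return result


sets = [
    ("2014-04-10 jdoe", {"client_script_x": "fix"}),
    ("2014-04-01 jdoe", {"client_script_x": "broken", "rule_y": "v1"}),
]
final = apply_in_order(sets)
```

Here the fix from the later Update Set ends up in place for `client_script_x`, while `rule_y` keeps the only version that was ever applied. Applying the same two Update Sets in the opposite order would put the broken code back.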

To Preview an Update Set, choose the Update Set from the Retrieved Update Sets view, and then click "Preview Update Set". A series of screens will indicate progress, and there will then be a screen to review the status of the Update. This feature is called "View Update Set Preview list". Often a report will indicate that the Update Set contains change elements that modify an existing element, and you will be asked to confirm the change.

Confirm any conflicts

On the View Update Set Preview list view, there is a Proposed Action field. The ideal scenario is 'Commit'. Sometimes, when there is a conflict with a particular element, the platform will suggest 'Ignore'. Ordinarily this is a mistake, and we should review the conflict, confirm we wish to move forward, and change the Proposed Action for that conflict to 'Commit'. We should never advance an Update Set partially by leaving some conflicts set to 'Ignore'. If the Admin tries to do so, the system will stop the advance and warn the Admin. The conflicts should be regarded as a "Yellow light" rather than a "Red light", even though they are colored red. The system tries to stop you from shooting yourself in the foot, but this feature is more of an annoyance than a help.
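The rule that every Proposed Action must be 'Commit' before the Update Set is advanced can be expressed as a small check. This is a sketch with hypothetical names; it models the Preview list as a mapping from element to Proposed Action and does not call any ServiceNow API.

```python
from typing import Dict, List


def blocking_entries(preview: Dict[str, str]) -> List[str]:
    """Return the elements whose Proposed Action is not 'Commit'."""
    return sorted(elem for elem, action in preview.items()
                  if action != "Commit")


def ready_to_commit(preview: Dict[str, str]) -> bool:
    # Never advance an Update Set partially: every entry must be Commit.
    return not blocking_entries(preview)
```

An Admin-side checklist could list `blocking_entries` for review, so that each conflict is consciously confirmed rather than left at 'Ignore'.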

Commit the update set

After fixing up the conflicts, the Preview list Proposed Action should be entirely set to Commit. Go back to the Retrieved Update Sets view, choose the same Update Set, which is now in state Previewed. Commit the Update Set with the "Commit Update Set" feature. The Admin will see a series of progress screens. If the Admin returns to the Retrieved Update Sets view, that Update Set will now be in state Committed.

Backing out the update set

Suppose the Update Set is now in state Committed in the Retrieved Update Sets view, and we then realize we applied the wrong Update Set. We wish to Back Out.

To Back Out, we need to change to the Local Update Sets view, and sort by the field Created, ordered by most recent.

Only the most recently-created update set has the Back Out feature. This is a platform limitation. We do not have the ability to Back Out just one particular thing that was applied thirteen or an arbitrary number of update sets ago.

It is possible to chain the Back Out feature, and back out multiple update sets, but doing so gets very tricky very fast. To do so, you must follow a last-in, first-out process: to back out the thirteenth-ago update set, you first need to back out the twelve that were applied more recently, and then you need to reapply those twelve in the appropriate order. Recall that apply order is essential to NOT reintroducing defects that you have already solved with later update sets.
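To make the last-in, first-out constraint concrete, the sketch below computes the two lists an Admin would need: the order in which to back out, and the order in which to reapply. The function name is hypothetical, and the model only tracks names; it does not perform any Back Out.

```python
from typing import List, Tuple


def back_out_plan(committed: List[str],
                  target: str) -> Tuple[List[str], List[str]]:
    """Plan a chained Back Out.

    `committed` lists Update Sets in apply order, oldest first.
    Returns (back_out_order, reapply_order).
    """
    idx = committed.index(target)  # ValueError if target was never committed
    newer = committed[idx + 1:]
    # Back out the most recent first, ending with the target itself.
    back_out = list(reversed(newer)) + [target]
    # Reapply everything that was newer, in the original apply order.
    reapply = list(newer)
    return back_out, reapply


plan = back_out_plan(["a", "b", "c", "d"], "b")
```

For the four Update Sets above, removing `b` means backing out `d`, `c`, `b` in that order and then reapplying `c` and `d`. Both lists grow with every Update Set committed after the target, which is why chaining becomes disruptive so quickly.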

In the process of backing out several Update Sets, you are likely to break other requirements and features that are wholly unrelated to the issue you are trying to solve. If you tried to chain Back Outs in Production, you would be taking certain features offline, or reintroducing bugs. If you tried to chain Back Outs in TEST, you would disrupt the testing process, and testing could not resume until the disruption is over. And again, chaining Back Outs is a very delicate operation, and things that get disrupted need to be reapplied in exactly the same order in which they were originally applied, or we may introduce a new defect. In short, trying to chain Back Outs should be an item of absolute last resort, as there is almost always a less disruptive way to accomplish your goal.

Alternate ways to alter/fix/disable/eliminate code than Back Out

If Back Out should be a process of last resort, what should we do instead? It depends upon our goals. And this is where I think a lot of people were confused by Allen's advice, and thought he was contradicting himself. The best course of action depends on why we don't like the code.

This code is pretty good, but there is a not-very-bad defect

In this case, a previous approach Yale took was that we documented the defect in HPALM, applying a defect number, and generating screenshots or any other steps needed to demonstrate the issue. Then we would leave the defective code in place in TEST, providing the defect was limited and would not interrupt testing or possibly cause issues with other testing.

The root cause would be isolated, the fix would be generated in a new Update Set in DEV. The fix would be unit tested in DEV, and then advanced to TEST using the process illustrated above (set Update Set Complete, Update Source, Preview, Resolve Conflicts, Commit). Then tester(s) would review the fix and either mark the defect resolved in HPALM, file a new defect, or say that the fix does not resolve the existing defect. The process may be repeated a second time if the defect still only affects the feature in question and does not have impact on the remaining system.

Benefits of this approach: testing is not interrupted, testers can continue testing, and developers can focus on the fix.

I recommend this approach for R812. R812 has a minor defect (D1031), caught in UAT. D1031 has a known root cause, with a fix unit-tested in DEV. I recommend we apply the D1031 fix to TEST, and if approved, that R812 and the D1031 fix launch on this release.

Root cause identified, and it's one portion of one requirement. We want to keep fixing the requirement until we get things right.

This code is pretty good, but there is a defect so bad that it affects other testing

In this case, a defect has been found that affects multiple modules, and testing cannot continue. The testers should file a defect and immediately notify the development team of the issue. The development team needs to begin Root Cause Analysis. Testing has stopped anyway, so we are less reluctant to cause disruption, and more concerned about fully eradicating the issue. There are multiple approaches to solving the problem, and they depend on the results of the root cause analysis.

Root cause identified, and it's one portion of one requirement. We want to keep fixing the requirement until we get things right.

In this situation, there are two steps. Step one is to disable the portion of the requirement that is causing the defect. This would mean that we would also likely remove the feature from that code, but it ends the disruption to testing, and allows testing to resume. If the root cause was a bad client script, we could create an update set in DEV that disables the bad client script, advance that update set to TEST, and have testers confirm that the TEST system is back to normal and testing can resume on other items.

Step two is to create a new version of the broken feature in DEV, unit test that it meets the original functionality and does not reintroduce the defect, and then coordinate with the testing team about an appropriate time to introduce the feature fix Update Set to TEST. If this happens in PRE-PROD, the fix will always be the most recent code, so when the code fix is ready and approved by testers, it can be tacked on to the end of the release in PRE-PROD, with no clone needed, and added to the set of updates in the release.

This issue happened in December 2013 or so. We had a Business Rule that broke other functionality in the system. We got TEST working by advancing a small Update Set that disabled the bad code so testing could continue while a fix was prepared. We fixed in DEV, promoted the revised non-broken feature to TEST, and released the requirement and fix on that release cycle.

Root cause identified, and requirement is going to get yanked from the release

Sometimes we determine that a requirement is so broken that it cannot be fixed in time for this release, or for other reasons it is being yanked from the release. In TEST, the appropriate action would be to disable the code. If the requirement was a Service Catalog Item, we can toggle the Catalog Item to non-Active, and the Item will disappear from the Service Catalog. If the requirement only included a Catalog Item, this is sufficient for TEST. If there were Business Rules, or database changes, or other more complex changes, it may be necessary to clone over TEST to fully eradicate the code and provide a known-clean basis on which to reapply the release, minus the items that will no longer be part of the release.

In PRE-PROD, if we have to actually remove code, there are again two approaches:

There is nothing wrong with the code, we just don't want it available to the customers yet

In this situation, the most appropriate fix is to leave the code in the release, and apply just a small Update Set to disable the code. In the case of a Service Catalog Item, an Update Set that toggles the Catalog Item to non-Active is sufficient. When we are ready, we can have a future release with a small Update Set that toggles the Catalog Item back to Active.

This is my recommendation for R780. The code has no outstanding defects, and successfully passed UAT back in April. It was removed from the release because the stakeholder did not participate in the second round of UAT. This approach has two positive effects: we do have to alter the release notes, but the release itself still gets the code into PROD, and we don't worsen our completed code development backlog. We also don't have to take on the duplicate work of yanking the code out of the system, only to put it right back in a later release.

We have done this with much of START Replacement. The code was fully UATed by stakeholders, it met formal requirements, and passed testing. It was determined that no START Replacement Catalog Items would be made available to end-users until Trainers develop training materials for the end-users. When training materials are available we can have a release with small update sets that toggle the Catalog Items to Active.

There is nothing wrong with the code, we have changed our mind about releasing it, and it probably won't be released in the future

This is extremely rare, but it has happened. We had a situation where we fully developed a Catalog Item, got to UAT, and the customer's process had changed so much that the code needed to be scrapped and the process started over again. In that situation, we need to do several steps:

1. Revise release notes to remove requirement.
2. Revise release plan to remove requirement and associated update sets.
3. Go to DEV, and mark all associated update sets as Ignore.
4. Go to PROD, go to 'Retrieved Update Sets', and delete all associated update sets. We don't want even the chance of accidentally deploying this code to PROD, but it may have gotten copied to PROD back when the update sets were set to Complete. Alternatively, it's also safe in PROD to delete all Retrieved Update Sets, and the next 'Retrieve Completed Update Sets' will not pull the associated update sets that were set to Ignore.
5. If this code was already committed to PRE-PROD, we need to clone over PRE-PROD from PROD, and re-deploy the release according to the revised release plan. This is mainly because PRE-PROD is a test of our deployment plan, and we need to do the deployment the same in PRE-PROD and PROD.
6. If this code was already committed to TEST, there's a decision to make. A) If the code can safely be disabled in TEST, we can create one update set in DEV, advance it to TEST, commit it to TEST to disable the code, then set the update set as Ignore in DEV. This will allow other testing to continue in TEST, and TEST will be cloned over after the release, so it's safe to leave the disabled code for the next few days before the clone resets TEST. B) If the code cannot be safely disabled in TEST, TEST will need to be cloned over, and the release re-applied. This will cause disruption in TEST, and should only be performed if A is not possible.
7. Mark the requirement in HPALM with appropriate status, and include comments about why the code was completed and then abandoned.

R1147 is a requirement to resolve issues with customers adding comments to an RITM, and the fulfillment people performing the SCTASK never seeing those comments. The requirement is defect-free, and passed UAT twice. The original fix was envisioned as applying to just one particularly problematic set of customers, but Soren saw that it had benefits for the entire system. The team consensus seems to be that we want to remove this from the release and never apply this fix in the future. Because of this change to the release, and because of other changes to the release, I recommend cloning over PRE-PROD to remove R1147 from the release originally scheduled for April. R1147 can be safely disabled in TEST using the 6A method, and I recommend doing so.

WIDS method for update set management
