Content Comparison

...

Harbor will regularly download a new list of security issues and will quarantine images that have serious vulnerabilities. Currently Projects are defined so that images with a High Severity problem cannot be pulled either to be run or fixed.

Base Images

An image is a sequence of layers.

Each line in a Dockerfile that changes something generates a layer. Each layer is a set of changes to the layer below it, so the content of a layer is the cumulative result of applying all changes from the bottom layer up to that one. Each layer is identified by a unique SHA256 hash of its content, although Docker normally shows only the first 12 hex characters of the hash.
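The idea of naming something by a hash of its content can be tried out without Docker at all. The sketch below is only an illustration: it hashes an arbitrary string, not a real layer, and it does not reproduce how Docker computes its identifiers internally. The helper names content_digest and short_id are made up for this example, and it assumes the GNU sha256sum utility is installed.

```shell
# Illustration only: content_digest and short_id are hypothetical helper
# names, not Docker commands. Assumes GNU coreutils (sha256sum, cut).

# Print the full 64-character SHA256 hex digest of whatever arrives on stdin.
content_digest() {
  sha256sum | cut -d ' ' -f 1
}

# Print the first 12 characters of a digest, the abbreviation Docker displays.
short_id() {
  printf '%s\n' "$1" | cut -c 1-12
}

full="$(printf 'layer contents' | content_digest)"
echo "full digest: $full"
echo "short form:  $(short_id "$full")"
```

Hashing the same bytes again always gives the same digest, and changing a single byte gives a completely different one, which is why a hash can serve as a name for a specific set of contents.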

When you finish processing the Dockerfile, the last layer generated is also called an image. In addition to its obscure hash name, it is common to create an alias (“tag”) that gives it a friendly name that is easier to use. When you build the Dockerfile again you may reassign the tag to the new image; the old image then remains but is only known by its hash.

You can display all the layers in an image with the “docker history” command.

The bottom layer will typically be a special file containing a root file system of some Linux distribution created by some vendor (Ubuntu, Debian, Alpine, etc.). Nobody except vendors creates these files. In practice, the vendor creates a simple image from this special file, adds some settings telling Docker how to run that system, and stores it in Docker Hub. We get that version of that OS by pulling the Docker Hub image by its tag.

For example, the Docker Hub image named “ubuntu:latest” (as of 3/16/2022) is

Code Block

>docker pull ubuntu:latest
latest: Pulling from library/ubuntu
7c3b88808835: Pull complete
Digest: sha256:8ae9bafbb64f63a50caab98fd3a5e37b3eb837a3e0780b78e5218e63193961f9
>docker history ubuntu:latest
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
2b4cba85892a   13 days ago   /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>      13 days ago   /bin/sh -c #(nop) ADD file:8a50ad78a668527e9…   72.8MB

The last line of the history output shows that the special file someone got from Canonical has a SHA256 hash beginning with 8a50ad78a668527e9… and was 72.8 MB. An image was created with a Dockerfile that added the line

CMD ["bash"]

The special file contains the filesystem, and the added CMD tells Docker that if the image is run, Docker should start the bash program. I do not know the internals of how the environment is set so there is a PATH to search to find the program, but obviously it must be somewhere in the special file.

With the addition of the CMD, the second and now top layer has a SHA256 hash beginning with 2b4cba85892a. The hash value of the top layer of an image is also the hash value and internal unique name of the image. In most cases, Docker only displays and needs the first 12 characters of the much longer actual hash value because 12 characters is enough to be unique most of the time.

The image can be referenced by its identifier 2b4cba85892a, but the friendly alias tag “ubuntu:latest” is easier to remember. However, next week there may be a new “latest” image that has been updated and will have different contents and a new hash.

Code Block

>docker image ls ubuntu
REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
ubuntu       21.10     305cdd15bb4f   13 days ago    77.4MB
ubuntu       latest    2b4cba85892a   13 days ago    72.8MB
ubuntu       <none>    64c59b1065b1   2 months ago   77.4MB
ubuntu       <none>    d13c942271d6   2 months ago   72.8MB

You see the same 2b4cba85892a hash value in the “IMAGE ID” column of the row for the ubuntu:latest image. The two rows with a tag of <none> are old images still in the Docker cache that used to be ubuntu:latest and ubuntu:21.10 before I pulled the new images from Docker Hub that are currently associated with these two tags. The old images have to be kept around to support any other images that were previously built using them. Once all such images have been rebuilt on the newer base, the old images can be deleted.

When you run Docker on your own machine, you control the Docker Engine and have layers and images in your cache that came from previous Docker commands and Dockerfile image builds you ran. When you submit a Dockerfile project to be built by Jenkins, you have almost no control over the friendly image tag names.

Now that I have pulled “ubuntu:latest” explicitly, that tag is associated in my Docker Engine and Dockerfile build environment with 2b4cba85892a. The name will remain associated with that specific image until a new image is explicitly pulled, either by entering “docker pull ubuntu:latest” or by doing a “docker build --pull” on a Dockerfile with a “FROM ubuntu:latest”, at any time after a new image has been stored in Docker Hub and tagged with the alias “ubuntu:latest”.
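To see this on a desktop, you can ask the Docker Engine which image a tag names, pull the tag again, and ask a second time. The following is a minimal sketch, not a standard tool: image_id_of and tag_moved are hypothetical helper names wrapping the real “docker image inspect” and “docker pull” subcommands.

```shell
# Hypothetical helpers (names invented for this example) around real
# docker CLI subcommands. Nothing runs until you call a function.

# Print the full image ID the local Docker Engine associates with a tag.
image_id_of() {
  docker image inspect --format '{{.Id}}' "$1"
}

# Pull the tag again; succeed (exit 0) only if the tag now names a
# different image than before. A tag that was not cached locally before
# the pull also counts as moved. Exit 2 means the pull itself failed.
tag_moved() {
  local tag="$1" before after
  before="$(image_id_of "$tag" 2>/dev/null)"
  docker pull --quiet "$tag" >/dev/null || return 2
  after="$(image_id_of "$tag")"
  [ "$before" != "$after" ]
}

# Example use, assuming a running Docker Engine:
#   tag_moved ubuntu:latest && echo "ubuntu:latest now names a new image"
```

If the function reports that the tag moved, every later build on this machine that starts with “FROM ubuntu:latest” will use the new image.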

It is clear from the example above that ubuntu:latest is not ubuntu:21.10. The vendor has decided to associate “latest” with the latest refreshed version of the last Long Term Support release, which in March 2022 happens to be 20.04 (“focal”). This image was last refreshed March 2, so the only unique and unchanging name for this specific base image is ubuntu:focal-20220302.

Next month there will be a new LTS version of Ubuntu (“jammy” or 22.04). At that point, “latest” will switch from pointing to 20.04 to pointing to the new 22.04. However, the version number itself is not sufficient to be unique, because every few months they put maintenance on all of their versions. As you can see above, although version 21.10 was released last October, it was refreshed 13 days ago. So to have a never changing name for that version, you need to reference ubuntu:impish-20220301.

Unless you use a base OS image tag with a release and a date, you do not know what system level your application is running on. Worse, you have no reason to assume that if you build the application again the new and old image will run on the same OS version, nor will the version you used for unit testing on your desktop be the same version that is used when Jenkins builds its image and stores it in Harbor.
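A quick way to find the most obvious offenders is to scan a Dockerfile for FROM lines that have no tag at all or that use “latest”. The sketch below is only a rough check under stated assumptions: unpinned_bases is a hypothetical helper name, it relies on a standard awk, and it will also report a FROM that refers to an earlier build stage or to a variable. It does not catch a tag such as “ubuntu:20.04”, which looks specific but is still refreshed over time.

```shell
# Hypothetical helper: print the base image reference from every FROM
# line in a Dockerfile that has no tag or uses the tag "latest".
unpinned_bases() {
  awk '
    toupper($1) == "FROM" {
      i = 2
      while (i <= NF && $i ~ /^--/) i++   # skip options such as --platform=...
      ref = $i
      # scratch is the empty base; a digest reference is already fixed
      if (ref == "scratch" || ref ~ /@sha256:/) next
      name = ref
      sub(/.*\//, "", name)               # drop registry and repository path
      if (name !~ /:/ || name ~ /:latest$/) print ref
    }
  ' "$1"
}

# Example use:
#   unpinned_bases Dockerfile
```

An empty result does not prove the base image is fixed. It only means none of the FROM lines uses the two forms that are certain to drift.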

Since there is no explicit “docker pull” and no Jenkins option to do a “docker build --pull”, the image associated with any ambiguous tag name will be the first such image downloaded by that Docker Engine. However, if more than one Jenkins worker VM exists, then each worker will have its own Engine with its own image cache, and the version of the image in the two caches may be different.

This hasn’t mattered up to this point because in the real world, all our applications are simple Java programs that will run on any version of any Linux. We don’t really care. However, people who set standards for TEST and PROD do care about controlling the exact environment of production applications, and they have allowed this only because they do not understand how it really works.

To meet expectations, all our Dockerfiles must be changed to chain back to a specific lowest level special file representing a well defined version and maintenance level of the base OS. At the least, we need to make sure that the version we run our first TEST on is the version that ends up in PROD.

How Specific?

Generally, the best practice for production applications is to know exactly what is in them. However, at Yale production applications are loaded onto VMs that get updated once a month from Red Hat libraries. We trust Red Hat to update its system carefully. Fifty years ago applications ran on IBM mainframes that were updated once a month by a similar process. Today applications that run on a Windows system get monthly Patch Tuesday updates and application developers don’t track bug fixes.

However, we have to be more careful about the version of the OS we are running (Ubuntu 20.04 or 22.04), the version of Java we are running (Java 8 or 11), and the version of components like Tomcat we are running (Tomcat 8.5, 9, or 10). Upgrades to new versions can change behavior and cause problems for applications.

Generally these principles are already baked into the standard tag names assigned to images in Docker Hub. If you look at the standard images offered that include Debian, Java, and Tomcat, you will find a page that lists all the tags given to a specific supported image. For example:

9.0.60-jdk11-openjdk-bullseye, 9.0-jdk11-openjdk-bullseye, 9-jdk11-openjdk-bullseye, 9.0.60-jdk11-openjdk, 9.0-jdk11-openjdk, 9-jdk11-openjdk, 9.0.60-jdk11, 9.0-jdk11, 9-jdk11, 9.0.60, 9.0, 9

This means that if you just ask for “tomcat:9” you get the image that is specifically tomcat:9.0.60-jdk11-openjdk-bullseye (Tomcat 9.0.60 on top of OpenJDK 11 running on the “bullseye” release of Debian (11.2)).
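You can check that two tags are aliases for the same image by comparing the image IDs that the Docker Engine reports for them. This is a minimal sketch: same_image is a hypothetical helper name, and it assumes both tags have already been pulled into the local cache.

```shell
# Hypothetical helper: succeed (exit 0) if two tags name the same image
# in the local Docker cache, exit 1 if they differ, and exit 2 if either
# tag cannot be inspected (for example, because it has not been pulled).
same_image() {
  local a b
  a="$(docker image inspect --format '{{.Id}}' "$1")" || return 2
  b="$(docker image inspect --format '{{.Id}}' "$2")" || return 2
  [ "$a" = "$b" ]
}

# Example use, assuming both tags were pulled at about the same time:
#   same_image tomcat:9 tomcat:9.0.60-jdk11-openjdk-bullseye && echo "same image"
```

The answer is only valid for the moment the tags were pulled. The short alias moves on to a newer image when the vendor publishes one, while the local copy of each tag stays where it was until it is pulled again.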

If you are starting to develop a new application and know you want to use Java 11 and Tomcat 9, this is the default image for those choices (which is why it also carries the aliases “9” and “9-jdk11”). However, once you put an application into production, you don’t want things to change unnecessarily, so you might use the more specific alias of “9.0.60-jdk11-openjdk-bullseye” and then change that tag only when there is a reported security problem with Tomcat 9.0.60 that is fixed by a later version number.

If you come back and rebuild this application after a few months, you may find that the new image has bug fixes to OpenJDK 11 and Debian bullseye. You may choose to just accept such fixes in the same way that you accept the monthly RHEL or Windows maintenance applied to an application running on a VM.

Alternatively, you may decide to control every single change made to a production application, but that may be an unreasonable burden.

Which Distribution?

Alpine is the leanest distribution, but the tomcat-alpine image is no longer being maintained. You can use it, but then you have to put all the maintenance on it through several years and releases.

It used to be that Ubuntu was more quickly updated than Debian. However, in the last few years there has been an intense focus on reported vulnerabilities and security patches. In response, Debian created a separate repository for fixes to reported vulnerabilities and they deploy patches to it as soon as possible. This does not, however, extend to non-security bugs, where Ubuntu may still be quicker to make fixes available.

If you want a Docker Hub standard image with Java and Tomcat pre-installed, you can choose between Debian (“bullseye”) and Ubuntu (“focal” or 20.04, the most recent LTS release, soon to be replaced by 22.04).

Any additional comments from tracking Docker Hub releases and their ability to patch vulnerabilities will be added here as we gather experience.

FROM Behavior

Docker has a special conservative behavior when processing an ambiguous tag in the FROM statement, which is the first statement in a Dockerfile and sets the “base image” for this build. The first time you build a Dockerfile that begins with the statement

FROM ubuntu:latest

the Docker Engine doing the build downloads the image associated with this tag from Docker Hub and associates the name “ubuntu:latest” with that image for all subsequent Dockerfile image builds until you specifically replace it. For a desktop build, you add the parameter “--pull” to the “docker build” command to tell Docker to check for a newer image currently associated with this tag name, download it, and make it the new “ubuntu:latest” for subsequent builds.

Yale’s Jenkins build process does not exactly use “--pull” but accomplishes the same thing using a different technique. So when you build an image with Jenkins, you get the current image associated with a tag, and generally you want to add “--pull” to your desktop build. If it is really important to start with a very old base image, use the 12 character unique hash name to be sure you get what you want.
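A short hash only identifies an image that is already in the cache of the Docker Engine doing the build. One alternative, which is not part of the process described here, is to reference the base image by its registry digest, since Docker accepts a FROM of the form “name@sha256:…”. The sketch below prints such a line for an image you have already pulled. The helper name pinned_from_line is made up for this example, and it assumes the image came from a registry, because an image that was only built locally has no registry digest and the inspect call will fail.

```shell
# Hypothetical helper: print a FROM line that names an already pulled
# image by its registry digest instead of by a tag that can move.
pinned_from_line() {
  local ref
  ref="$(docker image inspect --format '{{index .RepoDigests 0}}' "$1")" || return 1
  printf 'FROM %s\n' "$ref"
}

# Example use, assuming ubuntu:latest has been pulled:
#   pinned_from_line ubuntu:latest
```

A Dockerfile that starts with the printed line asks for exactly that image regardless of which machine builds it or what “ubuntu:latest” points to by then.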

Harbor CVEs

Harbor regularly downloads a database of reported vulnerabilities. It scans images to see if they contain any of these vulnerabilities. High severity problems need to be fixed.

...

Version	Old Version 1	New Version 2
Changes made by	Howard Gilbert	Howard Gilbert
Saved on	Mar 17, 2022	Mar 18, 2022
