...

When you finish processing the Dockerfile, the last layer generated is also called an image. In addition to its obscure hash name, it is common to create an alias (“tag”) that is easier to use. When you build a new image using the same Dockerfile again you may reassign the same alias/tag to the new image. The old image remains in the Docker cache and may remain in the network image repository server, but now it is only known by its original unique content hash.

You can display all the layers in an image with the “docker history” command, which shows the hash of each layer and the Dockerfile line that created it.
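As a minimal sketch of the command described above, the script below lists the layers of “ubuntu:latest”. It assumes the image has already been pulled to the local cache and that the Docker daemon is running; the guard simply skips the command on a machine without Docker.

```shell
#!/bin/sh
# Image to inspect; assumed to be present in the local Docker cache.
image="ubuntu:latest"

if command -v docker >/dev/null 2>&1; then
    # --no-trunc prints the full hashes and the full text of each Dockerfile line.
    docker history --no-trunc "$image" \
        || echo "docker history failed; is the daemon running and $image pulled?" >&2
    history_status="ran"
else
    echo "docker not found; skipping docker history" >&2
    history_status="skipped"
fi
```

Without “--no-trunc” the output is shortened to fit the terminal, which is usually enough to see which Dockerfile line produced which layer.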

The bottom layer will typically ADD a special file containing a root file system of some Linux distribution created by some vendor (Ubuntu, Debian, Alpine, etc.) and some environment variables and parameters telling Docker how to run that system. Yale doesn’t build Linux images from scratch, so we must start any of our images by referencing one of the Docker Hub images that in turn includes one of these starting Linux distribution image files.

For example, the Docker Hub image named “ubuntu:latest” (as of 3/16/2022) is

...

On line 8 we learn that the special file someone got from Canonical has a SHA256 hash beginning with 8a50ad78a668527e9… and was 72.8 MB. This was turned into a Docker Hub image by adding a line telling Docker that it can run the image by starting the bash program.

We see that the top layer has a hash beginning with 2b4cba85892a. That layer hash also becomes the unique identifier of the image, but to make the image easier to reference, the friendly alias tag “ubuntu:latest” is also assigned (temporarily) to this image. Next week there may be a new “latest” image that has been updated and will have different contents and a new hash, and the “ubuntu:latest” alias will point to the new image, while the 2b4cba85892a identifier will continue to point to this older image.

Code Block
>docker image ls ubuntu
REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
ubuntu       21.10     305cdd15bb4f   13 days ago    77.4MB
ubuntu       latest    2b4cba85892a   13 days ago    72.8MB
ubuntu       <none>    64c59b1065b1   2 months ago   77.4MB
ubuntu       <none>    d13c942271d6   2 months ago   72.8MB

...

A Dockerfile can contain ADD and COPY statements that copy files in the project directory to some location in the image. Docker creates a hash of the contents of that file and remembers it. Docker does not care about the name of the input file, or its path location, or its date and attributes. If you change the Dockerfile to reference a file with a different name and location but the same contents, then because the content hash has not changed, Docker regards the ADD or COPY statement as unchanged.

A Dockerfile may contain the line:

copy one.txt /var/tmp

but what shows up in “docker history” is
COPY file:cd49fd6bf375bcab4d23bcc5aa3a426ac923270c7991e1f201515406c3e11e9f in /var/tmp

In general, Docker doesn’t care about the source of data in a layer. You can change the source to a different file or URL, but if it has the same content there really hasn’t been a change.
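The idea that only the contents matter can be illustrated outside Docker. The sketch below assumes a POSIX shell with the GNU coreutils “sha256sum” command; the file names are hypothetical, and the checksum Docker computes internally for a COPY source will not equal these values. It only shows that a hash of the bytes ignores the name of the file.

```shell
#!/bin/sh
# Hypothetical files: two with identical contents, one that differs.
workdir=$(mktemp -d)
printf 'hello layer\n' > "$workdir/one.txt"
printf 'hello layer\n' > "$workdir/renamed.txt"
printf 'hello layer, edited\n' > "$workdir/edited.txt"

# Hash only the bytes read from stdin, so the file name plays no part.
content_hash() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

hash_one=$(content_hash "$workdir/one.txt")
hash_renamed=$(content_hash "$workdir/renamed.txt")
hash_edited=$(content_hash "$workdir/edited.txt")

if [ "$hash_one" = "$hash_renamed" ]; then
    echo "one.txt and renamed.txt have the same content hash"
fi
if [ "$hash_one" != "$hash_edited" ]; then
    echo "edited.txt has a different content hash"
fi

rm -rf "$workdir"
```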

Except for the date. Although the contents of the file have not changed, copying it to an image may give the destination file the current timestamp. Because the timestamp is part of the file system, it is part of the data that goes into the content hash. So even if you run the same Dockerfile twice and copy the same data, the layer you generate will generally have a different hash in each run, and so will the subsequent new image.
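Image layers are stored as tar archives, and a tar archive records the modification time of each file. The sketch below is an analogy rather than Docker’s exact layer digest computation: it assumes a POSIX shell with “tar” and “sha256sum”, uses a hypothetical file, and shows that identical bytes with a different timestamp produce a different archive hash.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'same data\n' > "$workdir/one.txt"

# First "build": set one modification time, archive the file, hash the archive.
touch -t 202201010000 "$workdir/one.txt"
hash_first=$(tar -cf - -C "$workdir" one.txt | sha256sum | cut -d ' ' -f 1)

# Second "build": identical bytes, but a later modification time.
touch -t 202203160000 "$workdir/one.txt"
hash_second=$(tar -cf - -C "$workdir" one.txt | sha256sum | cut -d ' ' -f 1)

if [ "$hash_first" != "$hash_second" ]; then
    echo "same contents, different timestamp: the archive hash changed"
fi

rm -rf "$workdir"
```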

How Specific?

The best practice for production applications is to carefully control when they change and what changes are made to them. However, at Yale production applications are loaded onto VMs that get updated once a month from Red Hat libraries. We trust Red Hat to update its system carefully. Fifty years ago applications ran on IBM mainframes that were updated once a month by a similar process. Today applications that run on a Windows system get monthly Patch Tuesday updates and application developers don’t track bug fixes.

However, we have to be more careful about the version of the OS we are running (Ubuntu 20.04 or 22.04), the version of Java we are running (Java 8 or 11), and the version of components like Tomcat we are running (Tomcat 8.5, 9, or 10). Upgrades to new versions can change behavior and cause problems for applications.

Generally these principles are already baked into the standard tag names assigned to images in Docker Hub. If you look at the standard images offered that include Debian, Java, and Tomcat, you will find a page that lists all the tags given to a specific supported image. For example:

  • 9.0.60-jdk11-openjdk-bullseye, 9.0-jdk11-openjdk-bullseye, 9-jdk11-openjdk-bullseye, 9.0.60-jdk11-openjdk, 9.0-jdk11-openjdk, 9-jdk11-openjdk, 9.0.60-jdk11, 9.0-jdk11, 9-jdk11, 9.0.60, 9.0, 9

This means that if you just ask for “tomcat:9” you get the image that is specifically tomcat:9.0.60-jdk11-openjdk-bullseye (Tomcat 9.0.60 on top of OpenJDK 11 running on the “bullseye” release of Debian, 11.2).

If you are starting to develop a new application and know you want to use Java 11 and Tomcat 9, this is the default image for those choices (because it has the aliases “9” and “9-jdk11”). However, once you put an application into production, you don’t want things to change unnecessarily, so you might use the more specific alias “9.0.60-jdk11-openjdk-bullseye” and then change that tag only when there is a reported security problem with Tomcat 9.0.60 that is fixed by a later version number.

If you come back and rebuild this application after a few months, you may find that the new image has bug fixes to OpenJDK 11 and Debian bullseye. You may choose to just accept such fixes in the same way that you accept the monthly RHEL or Windows maintenance applied to an application running on a VM.

Alternatively, you may decide to control every single change made to a production application, but that may be an unreasonable burden.

However, going back 50 years to mainframe computers, it has always been necessary for system administrators to put monthly maintenance on the operating systems on which applications run. You cannot afford to run systems with known vulnerabilities because you are not ready to “change” a running production application.

Of course, if we change the application itself we must do appropriate testing. It is also necessary to test when we upgrade versions of the OS, Java, Tomcat, database, or other key components. However, if all that we do is to patch bugs in the system or libraries, then such maintenance has to be more routine.

How do we translate these considerations to the maintenance of images?

There is no simple answer. Red Hat doesn’t contribute to Docker Hub, but maintains its own OpenShift container system. Open source vendors provide maintenance, but that is not the same thing as a subscription to a production oriented subsidiary of IBM.

Everyone knows you don’t base an application on a “latest” tag. If you select “9.0-jdk11-openjdk-bullseye” as your base, you know future images will get Debian 11.2 (bullseye) and the most recent minor release of OpenJDK 11 and Tomcat 9.0. You may be implicitly upgraded from Tomcat 9.0.59 to 9.0.60, but that upgrade will fix bugs and may address security vulnerabilities.

Using a more specific tag may prevent critical patches. Using a less specific tag will eventually upgrade you to other versions of Debian, Java, or Tomcat when someone changes the defaults.
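The trade-off between specific and general tags can be sketched as alternative FROM lines. The script below writes an example Dockerfile to a temporary directory; the tag names come from the list above, while “myapp.war” and the webapps destination are hypothetical placeholders that should be checked against the image you actually use.

```shell
#!/bin/sh
workdir=$(mktemp -d)
dockerfile="$workdir/Dockerfile"

cat > "$dockerfile" <<'EOF'
# Most specific: Tomcat stays at 9.0.60 until this line is edited,
# although a rebuild may still bring OpenJDK 11 and bullseye fixes.
# FROM tomcat:9.0.60-jdk11-openjdk-bullseye

# Middle ground: follows Tomcat 9.0.x releases on OpenJDK 11 and bullseye.
FROM tomcat:9.0-jdk11-openjdk-bullseye

# Least specific: follows whatever Docker Hub currently labels "9".
# FROM tomcat:9

# Hypothetical application artifact and destination directory.
COPY myapp.war /usr/local/tomcat/webapps/
EOF

# Show which base image the uncommented FROM line selects.
base_image=$(grep '^FROM ' "$dockerfile" | cut -d ' ' -f 2)
echo "Base image selected in $dockerfile: $base_image"
```

Only one FROM line is active at a time; switching between the three is a one-line change that can be reviewed like any other change to the application.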

Which Distribution?

Alpine is the leanest distribution, but the tomcat-alpine image is no longer being maintained. You can use it, but then you have to put all the maintenance on it through several years and releases.

It used to be that Ubuntu was more quickly updated than Debian. However, in the last few years there has been an intense focus on reported vulnerabilities and security patches. In response, Debian created a separate repository for fixes to reported vulnerabilities and they deploy patches to it as soon as possible. This does not, however, extend to non-security bugs, where Ubuntu may still be quicker to make fixes available.

If you want a Docker Hub standard image with Java and Tomcat pre-installed, you can choose between Debian (“bullseye”) and Ubuntu (“focal” or 20.04, the most recent LTS release, soon to be replaced by 22.04).

Any additional comments from tracking Docker Hub releases and their ability to patch vulnerabilities will be added here as we gather experience.

FROM Behavior

Docker has a special conservative behavior when processing an ambiguous tag in the FROM statement, which is the first statement in a Dockerfile and sets the “base image” for this build. The first time you build a Dockerfile that begins with the statement

FROM ubuntu:latest

the Docker Engine doing the build downloads the image associated with this tag from Docker Hub and associates the name “ubuntu:latest” with that image for all subsequent Dockerfile image builds until you specifically replace it. For a desktop build, add the parameter “--pull” to the “docker build” command to tell Docker to check for a newer image currently associated with this tag name, download it, and make it the new “ubuntu:latest” for subsequent builds.

Yale’s Jenkins build process does not exactly use “--pull” but accomplishes the same thing using a different technique. So when you build an image with Jenkins, you get the current image associated with a tag, and generally you want to add “--pull” to your desktop build. If it is really important to start with a very old base image, use the 12 character unique hash name to be sure you get what you want.

In previous years Ubuntu was updated more quickly than Debian. Now, however, Debian has a special source of updates for patches to security problems and makes them available immediately after the vulnerability is announced.

So now this is mostly a matter of personal choice.

FROM Behavior (--pull)

In each section of a Dockerfile, the FROM statement chooses a base image.

The work of a “docker build” command is performed in a “builder” component in the Docker Engine. By default, the builders use a special conservative behavior which saves and reuses the first image they encounter that matches an alias tag name on the first FROM they process with that alias.

Specifically, if you process a Dockerfile with a “FROM tomcat:9.0-jdk11-openjdk-bullseye”, and that tag name has not been previously encountered, then the Docker Engine will download the image that at this moment is associated with that alias, will save it in its cache, and from now on will by default reuse that image for all subsequent Dockerfiles that have the same alias in their FROM statement. Meanwhile, tomorrow Docker Hub may get a new image with the same alias but with additional security patches applied.

To avoid this problem, the Yale Jenkins jobs typically add the “--pull” parameter to their “docker build” command. This parameter tells Docker to check for a newer image associated with the alias on the FROM command and to download that newer version when one becomes available.

You probably want to use “docker build --pull” in your sandbox development.
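A sandbox build with “--pull” might look like the sketch below. The image name “myapp:dev” and the use of the current directory as the build context are assumptions for illustration; the guard skips the build when Docker or a Dockerfile is not available.

```shell
#!/bin/sh
image_tag="myapp:dev"   # hypothetical name for the sandbox image
context_dir="."         # assumed to contain the project's Dockerfile

if command -v docker >/dev/null 2>&1 && [ -f "$context_dir/Dockerfile" ]; then
    # --pull asks Docker to check the registry for a newer image behind
    # the FROM tag instead of silently reusing the cached one.
    docker build --pull -t "$image_tag" "$context_dir" \
        || echo "docker build failed" >&2
    build_status="attempted"
else
    echo "docker or $context_dir/Dockerfile not found; skipping build" >&2
    build_status="skipped"
fi
```

Leaving out “--pull” gives the default behavior described above, where the first image downloaded for a tag keeps being reused.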

Harbor CVEs

Harbor regularly downloads a database of reported vulnerabilities. It scans images to see if they contain any of these vulnerabilities. High severity problems need to be fixed.

...