...

When Docker finishes processing the Dockerfile, the last layer generated is also called an image. In addition to its obscure hash name, it is common to create an alias (a “tag”) that gives the image a friendly name that is easier to use. When you build the Dockerfile again, you may reassign the old friendly name to the new image; the old image then remains but is known only by its unique content hash.

You can display all the layers in an image with the “docker history” command, which shows the hash of each layer and the Dockerfile line that created it.

The bottom layer will typically ADD a special file containing the root file system of a Linux distribution created by a vendor (Ubuntu, Debian, Alpine, etc.), along with some environment variables and parameters telling Docker how to run that system. We cannot create such files ourselves, so the convention is to download a starting image already created by the vendor from Docker Hub.

...

While the “latest” image is 2b4cba85892a, it replaces an older image stored 2 months ago that had a content hash of d13c942271d6. The old image remains stored in the system in case it was used to build other images to run applications. When all such applications have been updated to use the latest starting image, the old image can be deleted.
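The relationship between tags and content hashes can be pictured with a toy model. This is only a sketch of the bookkeeping idea, not how Docker is implemented: the class ImageStore and the tag name example:latest are hypothetical, and only the two hashes come from the text above.

```python
class ImageStore:
    """Toy model: images keyed by content hash, tags as movable aliases."""

    def __init__(self) -> None:
        self.images: set[str] = set()
        self.tags: dict[str, str] = {}

    def build(self, image_id: str, tag: str) -> None:
        """Store an image and point the tag at it, moving the tag if it exists."""
        self.images.add(image_id)
        self.tags[tag] = image_id

    def untagged(self) -> set[str]:
        """Images that no tag points to; they are reachable only by hash."""
        return self.images - set(self.tags.values())

    def remove(self, image_id: str) -> None:
        """Delete an image, refusing if a tag still points to it."""
        if image_id in self.tags.values():
            raise ValueError(f"{image_id} is still tagged")
        self.images.discard(image_id)


if __name__ == "__main__":
    store = ImageStore()
    store.build("d13c942271d6", "example:latest")  # older image (hypothetical tag)
    store.build("2b4cba85892a", "example:latest")  # rebuild moves the tag
    print(store.tags)        # the tag now points at the new image
    print(store.untagged())  # the old image is still stored, without a name
```

In this model the older image stays in the store until something explicitly removes it, which mirrors the cleanup step described above.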

Name and Content

A Dockerfile can contain ADD and COPY statements that copy files in the project directory to some location in the image. Docker only cares about and remembers the contents (represented by a hash of the contents) and not the name of the input file. So the Dockerfile may contain the line:

COPY one.txt /var/tmp

but what shows up in “docker history” is
COPY file:cd49fd6bf375bcab4d23bcc5aa3a426ac923270c7991e1f201515406c3e11e9f in /var/tmp

In general, Docker doesn’t care about the source of data in a layer. You can change the source to a different file or URL, but if it has the same content there really hasn’t been a change.
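The idea that identity comes from content rather than from the file name can be illustrated with an ordinary hash function. The sketch below uses SHA-256 as a stand-in and is not Docker's actual checksum procedure; the file names and the helper content_hash are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (the name is not hashed)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo() -> tuple[str, str, str]:
    """Hash two files with identical content and one with different content."""
    with tempfile.TemporaryDirectory() as tmp:
        one = Path(tmp) / "one.txt"
        renamed = Path(tmp) / "renamed.txt"
        changed = Path(tmp) / "changed.txt"
        one.write_bytes(b"hello\n")
        renamed.write_bytes(b"hello\n")   # same bytes, different name
        changed.write_bytes(b"hello!\n")  # different bytes
        return content_hash(one), content_hash(renamed), content_hash(changed)


if __name__ == "__main__":
    a, b, c = demo()
    print(a == b)  # same content under a different name
    print(a == c)  # different content
```

Two files with the same bytes produce the same digest regardless of what they are called, which is why the source name does not need to appear in the history output.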

The exception is the date. In the image, the file you copied will have the current timestamp in the destination directory. So even if you run the same Dockerfile twice and copy the same data, the layer you generate will generally have a different hash in each run, because the timestamp of the file in the directory is part of the data that goes into the image content hash.
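The effect of a timestamp on a hash can be seen by hashing a small tar archive, since image layers are packaged as tar archives and each entry carries a modification time. This is a minimal sketch under that assumption, not a reproduction of how Docker builds a layer; the helper layer_hash, the path, and the timestamps are invented for the illustration.

```python
import hashlib
import io
import tarfile


def layer_hash(name: str, data: bytes, mtime: int) -> str:
    """Hash an uncompressed tar archive holding one file with the given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mtime = mtime  # the timestamp is stored in the tar header
        tar.addfile(info, io.BytesIO(data))
    return hashlib.sha256(buf.getvalue()).hexdigest()


if __name__ == "__main__":
    first = layer_hash("var/tmp/one.txt", b"hello\n", 1_700_000_000)
    again = layer_hash("var/tmp/one.txt", b"hello\n", 1_700_000_000)
    later = layer_hash("var/tmp/one.txt", b"hello\n", 1_700_000_060)
    print(first == again)  # identical content and timestamp
    print(first == later)  # identical content, timestamp one minute later
```

The file bytes are identical in all three archives; only the recorded time differs in the last one, and that alone is enough to change the digest.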

How Specific?

Generally, the best practice for production applications is to know exactly what is in them. However, at Yale, production applications are loaded onto VMs that get updated once a month from Red Hat libraries. We trust Red Hat to update its system carefully. Fifty years ago, applications ran on IBM mainframes that were updated once a month by a similar process. Today, applications that run on a Windows system get monthly Patch Tuesday updates, and application developers don’t track bug fixes.

...