How are docker images built? A look into the Linux overlay file-systems and the OCI specification

Nicola Apicella
Apr 21, 2020

It’s impossible to work with docker containers without docker images. In this post I want to talk about what makes docker images possible: overlay filesystems.
I’ll start with a brief description of overlay filesystems. Then we will see how they apply to docker images, take a look at the layer cache and the OCI format for container images, and conclude with how docker builds an image from a dockerfile.

As usual I’ll try to make the blog post as practical as possible.

What’s an overlay filesystem?

Overlay filesystems (a kind of union filesystem) allow creating a union of two or more directories: a list of lower directories and an upper directory. The lower directories of the filesystem are read-only, whereas the upper directory can be used for both reads and writes.
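To make the read/write split concrete, here is a minimal userspace model in shell. It is only a sketch under simplifying assumptions, not how the kernel implements overlayfs: reads walk the layers from the upper directory down and return the first match, while writes always land in the upper directory. The function names ovl_read and ovl_write and the UPPER and LOWERS variables are hypothetical; whiteouts, directory merging and paths containing spaces are ignored.

```shell
#!/bin/sh
# Userspace model of how an overlay mount resolves reads and writes.
# NOT the kernel implementation: whiteouts, directory merging and paths
# containing spaces are deliberately ignored.
# UPPER: the writable upper dir. LOWERS: read-only dirs, topmost first.

# ovl_read REL: print REL from the topmost layer that contains it
ovl_read() {
  for dir in "$UPPER" $LOWERS; do
    if [ -f "$dir/$1" ]; then
      cat "$dir/$1"
      return 0
    fi
  done
  return 1  # the file exists in no layer
}

# ovl_write REL CONTENT: writes always land in UPPER, lower dirs stay untouched
ovl_write() {
  mkdir -p "$(dirname "$UPPER/$1")"
  printf '%s\n' "$2" > "$UPPER/$1"
}

# Example: two read-only lower layers and an empty upper layer
demo=$(mktemp -d)
mkdir "$demo/layer-1" "$demo/layer-2" "$demo/upper"
echo "Layer-1 file" > "$demo/layer-1/some-file"
UPPER="$demo/upper"
LOWERS="$demo/layer-2 $demo/layer-1"   # layer-2 sits on top of layer-1

ovl_read some-file              # prints "Layer-1 file"
ovl_write some-file "changed"
ovl_read some-file              # prints "changed", now served from the upper dir
cat "$demo/layer-1/some-file"   # the lower copy still reads "Layer-1 file"
```

For whole-file writes this matches what overlayfs does with a file that lives in a lower layer: the new version goes to the upper directory and shadows the original, which is never modified.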

Let’s see what that means in practice by mounting one.

Create an overlay fs

Let’s create a few folders and combine them.
First I’ll create a folder called “mount”, which will contain the union of all the other folders. Then I’ll create a bunch of folders called “layer-1”, “layer-2”, “layer-3” and “layer-4”. Finally, a folder called “workdir”, which the overlay filesystem needs, well, to work properly: it is used as scratch space to prepare files before they are moved into the upper directory, so it must be an empty directory on the same filesystem as the upper one. You can name the folders however you wish, but calling them “layer-1”, “layer-2”, etc. will make the parallel with docker images easier to follow later on.
> cd /tmp && mkdir overlay-example && cd overlay-example

[2020-04-19 16:02:35] [ubuntu] [/tmp/overlay-example]  
> mkdir mount layer-1 layer-2 layer-3 layer-4 workdir
[2020-04-19 16:02:38] [ubuntu] [/tmp/overlay-example]
> ls
layer-1 layer-2 layer-3 layer-4 mount workdir

Let’s also create some files in the layer-1, layer-2 and layer-3 folders.
We will leave layer-4 (our upper folder) empty. Again, that’s not necessary; it just makes the parallel with docker images easier.

[2020-04-19 16:02:40] [ubuntu] [/tmp/overlay-example]  
> echo "Layer-1 file" > ./layer-1/some-file-in-layer-1
[2020-04-19 16:03:36] [ubuntu] [/tmp/overlay-example]
> echo "Layer-2 file" > ./layer-2/some-file-in-layer-2
[2020-04-19 16:03:53] [ubuntu] [/tmp/overlay-example]
> echo "Layer-3 file" > ./layer-3/some-file-in-layer-3

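Finally, let’s mount the filesystem. Note that overlayfs stacks the “lowerdir” entries from right to left: the leftmost directory ends up on top. Listing layer-3 first therefore puts layer-1 at the bottom of the stack, like the base layer of a docker image:

> sudo mount -t overlay overlay-example \
    -o lowerdir=/tmp/overlay-example/layer-3:/tmp/overlay-example/layer-2:/tmp/overlay-example/layer-1,upperdir=/tmp/overlay-example/layer-4,workdir=/tmp/overlay-example/workdir \
    /tmp/overlay-example/mount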

Now let’s look inside the mount folder:

As expected, the contents of the folders layer-1, layer-2 and layer-3 have been combined in the mount folder.
Sure enough, if we look at the content of the files, we’ll find what we wrote in the previous step.

[2020-04-19 16:13:33] [ubuntu] [/tmp/overlay-example/mount]
> cat some-file-in-layer-3
Layer-3 file

Let’s try to create a file in the mount folder:

[2020-04-19 16:23:31] [ubuntu] [/tmp/overlay-example/mount]  
> echo "new content" > new-file

[2020-04-19 16:27:33] [ubuntu] [/tmp/overlay-example/mount]
> ls
new-file some-file-in-layer-1 some-file-in-layer-2 some-file-in-layer-3

Where should the new file be? In the upper layer, which in our case is the folder called “layer-4”:

[2020-04-19 16:23:49] [ubuntu] [/tmp/overlay-example]  
> tree
.
├── layer-1
│ └── some-file-in-layer-1
├── layer-2
│ └── some-file-in-layer-2
├── layer-3
│ └── some-file-in-layer-3
├── layer-4
│ └── new-file
├── mount
│ ├── new-file
│ ├── some-file-in-layer-1
│ ├── some-file-in-layer-2
│ └── some-file-in-layer-3
└── workdir
    └── work [error opening dir]

7 directories, 8 files

Let’s try to delete a file:

[2020-04-19 16:27:33] [ubuntu] [/tmp/overlay-example/mount]
> rm some-file-in-layer-2
[2020-04-19 16:28:58] [ubuntu] [/tmp/overlay-example/mount]
> ls
new-file some-file-in-layer-1 some-file-in-layer-3

What do you think happened to the original file in the “layer-2” folder?

[2020-04-19 16:29:57] [ubuntu] [/tmp/overlay-example]  
> tree
.
├── layer-1
│ └── some-file-in-layer-1
├── layer-2
│ └── some-file-in-layer-2
├── layer-3
│ └── some-file-in-layer-3
├── layer-4
│ ├── new-file
│ └── some-file-in-layer-2
├── mount
│ ├── new-file
│ ├── some-file-in-layer-1
│ └── some-file-in-layer-3
└── workdir
    └── work [error opening dir]

7 directories, 8 files

A new file called “some-file-in-layer-2” was created in “layer-4”. The weird thing is that it is a character device file. Files like this are called “whiteout” files: a character device with device number 0, 0 is how the overlay filesystem records that a file from a lower layer has been deleted, hiding it from the mount folder:

[2020-04-19 16:31:09] [ubuntu] [/tmp/overlay-example/layer-4]  
> ls -la
total 12
drwxr-xr-x 2 napicell domain^users 4096 Apr 19 16:28 .
drwxr-xr-x 8 napicell domain^users 4096 Apr 19 16:07 ..
-rw-r--r-- 1 napicell domain^users 12 Apr 19 16:23 new-file
c--------- 1 root root 0, 0 Apr 19 16:28 some-file-in-layer-2
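Since a whiteout is simply a character device with device number 0, 0, you can spot them in an upper directory from the shell. The sketch below assumes GNU coreutils (stat -c with the %t and %T format sequences, which print the major and minor device numbers in hex) and GNU find; is_whiteout and list_whiteouts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: detect overlayfs whiteouts in an upper dir.
# A whiteout is a character device with device number 0:0; the "0, 0" in the
# ls -la output is exactly that major/minor pair.

# is_whiteout PATH: succeed only if PATH is a 0:0 character device
is_whiteout() {
  [ -c "$1" ] || return 1
  # GNU stat: %t and %T print the major and minor device numbers (in hex)
  [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# list_whiteouts DIR: print every whiteout found under DIR
list_whiteouts() {
  find "$1" -type c | while IFS= read -r f; do
    if is_whiteout "$f"; then
      echo "$f"
    fi
  done
}
```

Run before the cleanup step, list_whiteouts /tmp/overlay-example/layer-4 should print the path of some-file-in-layer-2, while new-file is skipped because it is a regular file.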

Now that we have finished with it, let’s unmount the filesystem and remove the folder we created:

[2020-04-19 16:37:11] [ubuntu] [/tmp/overlay-example]  
> sudo umount /tmp/overlay-example/mount && rm -rf /tmp/overlay-example

Wrapping up the overlay filesystems

As we said at the beginning, the overlay filesystem allows us to create a union of directories. In our case the union was created in the “mount” folder, and it was the result of combining the “layer-{1, 2, 3, 4}” folders. Changes to files, as well as creations and deletions, are stored in the upper dir, which in our case is “layer-4” (modifying a file from a lower layer first copies it up into the upper dir, leaving the original untouched). This is why the upper layer is also called the “diff” layer.
Files in upper layers shadow the ones in lower layers: if a file with the same name and relative path exists in both layer-1 and layer-2, the layer-2 file is the one that ends up in the “mount” folder.
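The shadowing rule can be modelled with a short shell function that lists the merged view together with the layer each file is served from. This is a sketch, assuming flat directories (no subdirectories), no whiteouts and file names without spaces; overlay_ls is a hypothetical name. Layers are passed topmost first, the same order the lowerdir option uses.

```shell
#!/bin/sh
# overlay_ls LAYER...: list the files visible in the merged view and the
# layer each one comes from. Layers are given topmost first.
# Sketch only: flat directories, no whiteouts, no spaces in file names.
overlay_ls() {
  seen=""
  for layer in "$@"; do
    for f in "$layer"/*; do
      [ -e "$f" ] || continue          # empty layer: the glob did not match
      name=$(basename "$f")
      case " $seen " in
        *" $name "*) ;;                # already provided by a higher layer
        *) seen="$seen $name"
           echo "$name <- $layer" ;;
      esac
    done
  done
}

# Example: the same file name in two layers
demo=$(mktemp -d)
mkdir "$demo/layer-1" "$demo/layer-2"
echo "from layer-1" > "$demo/layer-1/shared-file"
echo "from layer-2" > "$demo/layer-2/shared-file"
echo "only in layer-1" > "$demo/layer-1/base-file"

# prints "shared-file <- .../layer-2", then "base-file <- .../layer-1"
overlay_ls "$demo/layer-2" "$demo/layer-1"
```

Walking the layers from the top and skipping names already seen is what makes layer-2’s copy of shared-file win over layer-1’s.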

In the next section we will see how this is used with docker images.

What’s a docker image?

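A docker image is essentially a set of tarballs, one per layer, each holding part of a root file system, plus some metadata. You might have heard of the expression “image layer”, and that every line in a dockerfile creates a new layer. Strictly speaking, only instructions that change the filesystem, such as RUN, COPY and ADD, produce filesystem layers; the others only change the metadata. For example, a dockerfile with a FROM line and two RUN lines, built on a single-layer base image, ends up with three layers.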

So what happens when you type “docker run”? A lot of things, really, but for the purpose of this article we are only interested in the bits concerning the image.
At a high level, docker downloads the tarballs for the image, unpacks each layer into a separate directory and then tells the overlay filesystem to combine them all, together with an empty upper directory that the container will write its changes to.
When you change, create or delete files in the container, the changes are stored in this upper directory. When the container is removed, docker deletes that directory, which is why the changes you make in a container do not outlive it and never end up in the image (unless you explicitly save them with “docker commit”).
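The run-time side can be sketched as follows, assuming the image’s layers are available as uncompressed tarballs listed base layer first (the order an image manifest uses). The hypothetical prepare_rootfs function unpacks each layer into its own directory, creates the empty upper directory for the container and prints the overlay mount command instead of running it, since mounting needs root. Real docker does much more, for example converting the whiteout markers stored inside layer tarballs (files prefixed with .wh.) into overlay whiteouts.

```shell
#!/bin/sh
# prepare_rootfs DEST LAYER.tar...: unpack each layer tarball (base layer
# first) into its own directory, create an empty upper dir for the
# container and print the overlay mount command.
# Dry run: the mount itself needs root, so it is only printed.
prepare_rootfs() {
  dest=$1; shift
  lowers=""
  n=0
  for tarball in "$@"; do
    n=$((n + 1))
    mkdir -p "$dest/layer-$n"
    tar -C "$dest/layer-$n" -xf "$tarball"
    # lowerdir= is read left to right as top to bottom: newer layers go first
    lowers="$dest/layer-$n${lowers:+:$lowers}"
  done
  mkdir -p "$dest/upper" "$dest/work" "$dest/merged"
  echo "mount -t overlay overlay -o lowerdir=$lowers,upperdir=$dest/upper,workdir=$dest/work $dest/merged"
}

# Example: a base layer and an application layer
demo=$(mktemp -d)
mkdir -p "$demo/src1" "$demo/src2"
echo base > "$demo/src1/a"
echo app > "$demo/src2/b"
tar -C "$demo/src1" -cf "$demo/base.tar" .
tar -C "$demo/src2" -cf "$demo/app.tar" .
prepare_rootfs "$demo/container" "$demo/base.tar" "$demo/app.tar"
```

Because lowerdir is read from top to bottom, each newly unpacked layer is prepended to the list, so the base layer ends up last.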

Layer cache

Using the overlay filesystem this way allows hosts to cache docker image layers effectively. For example, if two images are built on top of the same base image, they share its layers: there is no need to download those layers twice or to keep multiple copies on disk!
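One way such a cache can work is content addressing: each layer is stored under the sha256 digest of its tarball, so identical layers collapse into a single copy. The sketch below only illustrates that idea and is not docker’s actual layer store; CACHE and store_layer are hypothetical names, and sha256sum from GNU coreutils is assumed.

```shell
#!/bin/sh
# Sketch of a content-addressed layer store: every layer is kept under the
# sha256 digest of its tarball, so a layer shared by two images is stored once.
CACHE=${CACHE:-$(mktemp -d)}

# store_layer TARBALL: print "hit <digest>" if the layer is already cached,
# otherwise copy it into the cache and print "miss <digest>"
store_layer() {
  digest=$(sha256sum "$1" | cut -d' ' -f1)
  if [ -e "$CACHE/sha256/$digest" ]; then
    echo "hit $digest"
  else
    mkdir -p "$CACHE/sha256"
    cp "$1" "$CACHE/sha256/$digest"
    echo "miss $digest"
  fi
}

# Example: two images that ship byte-identical base layers
work=$(mktemp -d)
mkdir "$work/base"
echo "shared base" > "$work/base/os-release"
tar -C "$work/base" -cf "$work/base.tar" .
store_layer "$work/base.tar"          # first image: miss, the layer is stored
cp "$work/base.tar" "$work/same-base.tar"
store_layer "$work/same-base.tar"     # second image: hit, nothing is copied
```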

OCI-format container images

Running a container can, at a high level, be seen as a two-step process: building the image and running a container from the image. The popularity of docker convinced people to standardize both steps, allowing the two pieces to evolve separately. The Open Container Initiative (OCI) is the governance body that has been working with the industry on these standards.

The OCI currently contains two specifications: the Runtime Specification (runtime-spec) and the Image Specification (image-spec). The Runtime Specification outlines how to run a “filesystem bundle” that is unpacked on disk. At a high level, an OCI implementation would download an OCI Image, then unpack that image into an OCI Runtime filesystem bundle. At this point the OCI Runtime Bundle would be run by an OCI Runtime.
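To get a feel for what the Image Specification standardizes, the sketch below packages a directory as a single-layer image in the OCI image-layout format: an oci-layout file, an index.json entry point, and content-addressed blobs for the layer, the config and the manifest. It assumes one uncompressed layer and a hard-coded linux/amd64 platform, and leaves out tags, history and annotations; make_oci_image, add_blob and blob_size are hypothetical names.

```shell
#!/bin/sh
# Sketch: package a directory as a single-layer OCI image layout.
# Assumptions: one uncompressed layer, linux/amd64, no tags or history.

# add_blob OUT: move OUT/tmp.blob to OUT/blobs/sha256/<digest>, print the digest
add_blob() {
  d=$(sha256sum "$1/tmp.blob" | cut -d' ' -f1)
  mv "$1/tmp.blob" "$1/blobs/sha256/$d"
  echo "$d"
}

# blob_size OUT DIGEST: size in bytes of a stored blob
blob_size() {
  wc -c < "$1/blobs/sha256/$2" | tr -d ' '
}

# make_oci_image ROOTFS OUT
make_oci_image() {
  rootfs=$1 out=$2
  mkdir -p "$out/blobs/sha256"
  printf '{"imageLayoutVersion":"1.0.0"}\n' > "$out/oci-layout"

  # 1. the layer: a plain tar of the root filesystem
  tar -C "$rootfs" -cf "$out/tmp.blob" .
  layer=$(add_blob "$out")
  layer_size=$(blob_size "$out" "$layer")

  # 2. the config: platform plus the digests of the uncompressed layers
  printf '{"architecture":"amd64","os":"linux","rootfs":{"type":"layers","diff_ids":["sha256:%s"]}}' \
    "$layer" > "$out/tmp.blob"
  config=$(add_blob "$out")
  config_size=$(blob_size "$out" "$config")

  # 3. the manifest ties the config and the layers together
  printf '{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json","config":{"mediaType":"application/vnd.oci.image.config.v1+json","digest":"sha256:%s","size":%s},"layers":[{"mediaType":"application/vnd.oci.image.layer.v1.tar","digest":"sha256:%s","size":%s}]}' \
    "$config" "$config_size" "$layer" "$layer_size" > "$out/tmp.blob"
  manifest=$(add_blob "$out")
  manifest_size=$(blob_size "$out" "$manifest")

  # 4. index.json is the entry point that points at the manifest
  printf '{"schemaVersion":2,"manifests":[{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:%s","size":%s}]}\n' \
    "$manifest" "$manifest_size" > "$out/index.json"
}
```

For example, make_oci_image ./rootfs ./my-image leaves oci-layout, index.json and three blobs (layer, config and manifest) under my-image/blobs/sha256. Each blob is named after its own digest, which is what lets hosts and registries share layers between images.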

The standardization allows other people to develop custom container builders and runtimes. For example, jessfraz/img and buildah let you build container images without using docker, and Skopeo lets you inspect and copy images between registries without it. Similarly, many tools to run containers (so-called container runtimes) have emerged, for example runc (used by docker) and rkt.

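Other storage drivers

Overlay is not the only way docker can stack image layers; docker calls these pluggable backends storage drivers. Any filesystem that provides union-like features and a diff layer could potentially be used. For example, besides overlay, docker can use aufs (another union filesystem), btrfs and zfs (which rely on filesystem snapshots) and devicemapper (which works with block-level snapshots).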

What happens when you build an image?

Let’s assume we want to build an image from a dockerfile made of a FROM instruction followed by a RUN instruction, for example “RUN apt-get update”.

At a high level, this is how docker builds an image out of it:

  1. Docker downloads the tarballs for the image specified in “FROM” and unpacks them. These are the bottom layers of the new image.
  2. It mounts a union file system, with the lower dirs being the ones just unpacked and the upper dir being an empty folder.
  3. It runs the command specified in RUN inside the mounted filesystem, roughly like a chroot: chroot . /bin/sh -c "apt-get update" (the shell form of RUN uses /bin/sh -c by default on Linux).
  4. When the command is over, it packs the upper dir into a tarball. This is the new layer of the image we are building.
  5. If the dockerfile contains other instructions, it repeats the process from the second step, using all the layers produced so far as lower dirs. Otherwise, it exits.
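The loop above can be sketched as a dry run that prints the commands a naive builder would execute, one RUN instruction per step. It assumes the FROM image has already been unpacked into a directory, reuses each step’s upper dir as the next step’s topmost lower dir, and only prints the mount and chroot commands because they need root; plan_build and the step-N directory names are hypothetical.

```shell
#!/bin/sh
# plan_build BASE_DIR CMD...: print the steps a naive builder would follow
# for one FROM (already unpacked in BASE_DIR) and one RUN per CMD.
# Dry run only: mount and chroot need root.
plan_build() {
  lowers=$1; shift
  n=1
  for cmd in "$@"; do
    echo "mkdir -p step-$n/upper step-$n/work rootfs"
    echo "mount -t overlay overlay -o lowerdir=$lowers,upperdir=step-$n/upper,workdir=step-$n/work rootfs"
    echo "chroot rootfs /bin/sh -c \"$cmd\""
    echo "umount rootfs"
    echo "tar -C step-$n/upper -czf layer-$n.tar.gz ."
    # the layer just produced becomes the topmost lower dir of the next step
    lowers="step-$n/upper:$lowers"
    n=$((n + 1))
  done
}

# Example with two hypothetical RUN instructions
plan_build base "apt-get update" "apt-get install -y curl"
```

The second step mounts with lowerdir=step-1/upper:base, so every RUN sees the result of all the previous ones while writing only its own diff.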

Of course, this is a simplified workflow which does not take into account other types of instructions like “ENV”, “ENTRYPOINT”, etc. Those are stored in the image’s metadata (its configuration file), which is bundled together with the layers.

Conclusion

The idea of packing a whole root file system into a tar, and keeping a tar for each diff layer, turned out to be very powerful. It did not just enable docker; it turned out to be a concept that can be used in other contexts as well. I guess we will see more tools taking advantage of it in the future.

Credit for the cover image to frank mckenna on Unsplash.

Originally published at https://dev.to on April 21, 2020.

Sr. software dev engineer at Amazon. Golang, Java and container enthusiast. Love automation in general. Opinions are my own.