
Docker & singularity

Karl Ehatäht edited this page Dec 17, 2021 · 10 revisions


Docker is containerization software that lets you run virtually any computing environment of your choosing. The computing environments are stored as images that anyone can boot up and run. Users can create their own images if none with the desired features are available. There is a public registry of container images at Docker Hub, but companies and institutions can also host their own. Alternatives to docker include singularity. Container software is useful if you want to:

  • develop your code on your PC but in a computing environment that is typically available on a remote server (such as lxplus);
  • provide a fixed environment for CI/CD tests;
  • freeze the analysis environment for reproducibility purposes.

Docker

There are many tutorials about docker (for example this one) that can help you get started. In general, though, the steps for working with a new docker image are the following:

  1. start dockerd (the docker daemon process). For instance, in Arch Linux the command that starts the daemon is systemctl start docker;
  2. pull the image with docker pull ${IMAGE_ID} where ${IMAGE_ID} is an image identifier in the image registry. Pull the image only if you don't have it installed on your host (check with docker image ls);
  3. run the image interactively (-it) in a bash session with docker run --rm ${RUN_OPTS} -it ${IMAGE_ID} /bin/bash such that it cleans up after itself (--rm);
  4. optionally, end the dockerd process with systemctl stop docker if you're done working with docker.
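
The steps above can be sketched as a small shell function. This is only an illustration: docker_session, run and DRY_RUN are hypothetical names introduced here, and starting the daemon with systemctl assumes a systemd-based host where you have the necessary rights.

```shell
# Print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: docker_session IMAGE_ID [RUN_OPTS...]
docker_session() {
  if [ "$#" -lt 1 ]; then
    echo "usage: docker_session IMAGE_ID [RUN_OPTS...]" >&2
    return 1
  fi
  local image_id="$1"
  shift

  # Step 1: start the docker daemon (may require root privileges).
  run systemctl start docker

  # Step 2: pull the image only if it is not yet present on the host.
  if [ "${DRY_RUN:-0}" = 1 ] || ! docker image inspect "$image_id" >/dev/null 2>&1; then
    run docker pull "$image_id"
  fi

  # Step 3: interactive bash session that cleans up after itself.
  run docker run --rm "$@" -it "$image_id" /bin/bash

  # Step 4 (optional, not done here): systemctl stop docker
}
```

Calling DRY_RUN=1 docker_session ktht/ci-worker prints the commands without executing them, which is a cheap way to check the options before starting a container.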

Typically, we want to run a container application that has access to CVMFS and is capable of building CMSSW and our analysis software. Such images are listed in the table below. All listed images correspond to a CentOS 7 environment. If an image lacks native CVMFS support, the host machine is expected to have access to CVMFS.

Image | Native CVMFS support | Instance user | Dockerfile | Notes
gitlab-registry.cern.ch/ci-tools/ci-worker:cc7 | No | root | here | Good for CI/CD
gitlab-registry.cern.ch/linuxsupport/cc7-base | No | root | N/A | Base CC7 image, more info
clelange/cc7-cmssw-cvmfs:latest | Yes | cmsusr | here | Based on an older CC7 image
ktht/ci-worker | No | root | here | Forked, added EOS client
ktht/cc7-cmssw-cvmfs | Yes | cmsusr | here | Forked, added EOS client

If CVMFS is natively supported in the image, the container has to be run with higher privileges than docker grants by default. These privileges are added with the --privileged option. Note that --privileged grants the container all capabilities and access to the host devices; a narrower set of options such as --cap-add SYS_ADMIN --device /dev/fuse -v /sys/fs/cgroup:/sys/fs/cgroup:ro may be sufficient, but this has not been confirmed.

By default, the container instance cannot access any directories of the host machine. In order to make a directory on the host machine accessible from the instance, one has to add

-v <path in host>:<path in instance>

to the run command. For instance, if we want to run an image with native CVMFS and EOS support, and make the directory ~/Docker/some_dir of the host accessible in ~/host_dir of the instance, we would have to run:

docker run --rm --privileged                 \
  -v ~/Docker/some_dir:/home/cmsusr/host_dir \
  -it ktht/cc7-cmssw-cvmfs /bin/bash

The --rm and -it options and the final /bin/bash are standard; the --privileged option on the first line adds higher privileges to the docker instance; the second line makes a directory on the host machine accessible from the container instance. Note that using ~/host_dir as the instance path in the above command would create the mount point in /home/$USER/host_dir, where $USER is the host user name, because the expression ~ is evaluated in the shell session of the host machine.

Mounting a local directory in the docker session can be useful in case you want to run DAS queries in docker. This can be achieved by passing your ~/.globus directory (with the necessary *.pem keys) on the command line. In addition to the ~/.globus directory, it is also beneficial to pass your ~/.ssh directory to the container (to make repository cloning easier).
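
A possible way to pass both credential directories is sketched below. It assumes that the instance user is cmsusr with home directory /home/cmsusr (as in the images marked cmsusr in the table) and mounts the directories read-only via the :ro suffix. The function name docker_with_credentials and the DOCKER override variable are hypothetical and exist only for illustration.

```shell
# Usage: docker_with_credentials [IMAGE]
# Set DOCKER=echo to preview the arguments instead of starting a container.
docker_with_credentials() {
  local image="${1:-ktht/cc7-cmssw-cvmfs}"
  # Absolute host paths are used so that nothing depends on how ~ expands.
  "${DOCKER:-docker}" run --rm --privileged \
    -v "$HOME/.globus:/home/cmsusr/.globus:ro" \
    -v "$HOME/.ssh:/home/cmsusr/.ssh:ro" \
    -it "$image" /bin/bash
}
```

A read-only mount keeps the container from modifying your keys, but it also prevents ssh from updating known_hosts inside the container; drop :ro if that turns out to be a problem.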

Ubuntu users who can't (e.g. in GitHub Actions cloud runners) or don't want to use the --privileged option might also need the following option to enable docker to mount CVMFS: --security-opt apparmor=unconfined
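
As a sketch, a run command without --privileged could combine the options mentioned on this page as shown below. Whether this set of options is enough depends on the image and the host, so treat it as a starting point rather than a verified recipe. The function name docker_cvmfs_unprivileged and the DOCKER override variable are hypothetical.

```shell
# Usage: docker_cvmfs_unprivileged [IMAGE]
# Set DOCKER=echo to preview the arguments instead of starting a container.
docker_cvmfs_unprivileged() {
  local image="${1:-ktht/cc7-cmssw-cvmfs}"
  # SYS_ADMIN and /dev/fuse are what a FUSE mount such as CVMFS needs;
  # the AppArmor override is the Ubuntu-specific part.
  "${DOCKER:-docker}" run --rm \
    --cap-add SYS_ADMIN --device /dev/fuse \
    --security-opt apparmor=unconfined \
    -it "$image" /bin/bash
}
```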

The official images are developed at https://gitlab.cern.ch/cms-cloud/cmssw-docker

Singularity

More information can be found in the CMSSW documentation as well as on these slides. A list of singularity images can be found here. Here's how to bring up a session with singularity on manivald:

singularity exec --home $HOME:/home/cmsusr \
  --bind /cvmfs --bind /hdfs --bind /home  \
  --pwd /home/cmsusr --contain --ipc --pid \
  <container> bash

One can use /cvmfs/singularity.opensciencegrid.org/kreczko/workernode:centos7 for CC7 and /cvmfs/singularity.opensciencegrid.org/cmssw/cms:rhel8 for CC8. The above command can be wrapped into a function that takes the container name as an argument. Defining such a function in .bashrc makes it convenient to start the singularity session without having to recall all of its parameters.
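
A minimal sketch of such a wrapper is given below. The function name sing_session and the SINGULARITY override variable are hypothetical; the options are the same as in the command above.

```shell
# Usage: sing_session <container>
# Set SINGULARITY=echo to preview the arguments instead of starting a session.
sing_session() {
  if [ -z "$1" ]; then
    echo "usage: sing_session <container>" >&2
    return 1
  fi
  "${SINGULARITY:-singularity}" exec --home "$HOME:/home/cmsusr" \
    --bind /cvmfs --bind /hdfs --bind /home \
    --pwd /home/cmsusr --contain --ipc --pid \
    "$1" bash
}
```

With this in .bashrc, a CC7 session would be started with sing_session /cvmfs/singularity.opensciencegrid.org/kreczko/workernode:centos7.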

Hadoop is configured differently in the CC7 image than in the host environment. When running native Hadoop commands, one always has to specify the address of the namenode (hdfs-nn:9000). This means that commands such as

hdfs dfs -ls /local/$USER

need to be turned into

hdfs dfs -fs hdfs-nn:9000 -ls /local/$USER

in this particular singularity instance. The CC8 image does not appear to have native Hadoop support.
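
To avoid typing the namenode address every time, the -fs option can be hidden in a small wrapper. The sketch below is only a convenience; the names hdfs_nn, HDFS and HDFS_NAMENODE are hypothetical, and the default address is the one quoted above.

```shell
# Usage: hdfs_nn <hdfs dfs arguments...>
# Set HDFS=echo to preview the arguments instead of contacting the namenode.
hdfs_nn() {
  "${HDFS:-hdfs}" dfs -fs "${HDFS_NAMENODE:-hdfs-nn:9000}" "$@"
}
```

Inside the CC7 container, hdfs_nn -ls /local/$USER then stands in for hdfs dfs -ls /local/$USER on the host.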

Singularity is able to run docker images if they are prefixed with docker://.
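
For illustration, a docker image from the table could be started with singularity roughly as follows. The helper name sing_from_docker and the SINGULARITY override variable are hypothetical, and it has not been checked here that every listed image works under singularity.

```shell
# Usage: sing_from_docker <docker image>
# Adds the docker:// prefix unless the argument already carries it.
sing_from_docker() {
  if [ -z "$1" ]; then
    echo "usage: sing_from_docker <docker image>" >&2
    return 1
  fi
  local image="$1"
  case "$image" in
    docker://*) ;;
    *) image="docker://$image" ;;
  esac
  "${SINGULARITY:-singularity}" exec "$image" bash
}
```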

See also: https://github.com/HEP-KBFI/singularity/blob/master/README.md
