Docker & singularity
Docker is containerization software that lets you run virtually any computing environment of your choosing. The computing environments are stored as images that anyone can boot up and run. Users can create their own images if none with the desired features is available. A public registry of container images is available at Docker Hub, but companies and institutions can also host their own. Alternatives to docker include singularity. Container software is useful if you want to:
- develop your code on your PC but in a computing environment that is typically available in a remote server (such as lxplus);
- provide a fixed environment for CI/CD tests;
- freeze the analysis environment for reproducibility purposes.
There are many tutorials about docker (for example this one) that can help you get started. In general, though, these are the steps to follow when you want to work with a new docker image:
- start `dockerd` (the `docker` daemon process). For instance, in Arch Linux the command that starts the daemon is `systemctl start docker`;
- pull the image with `docker pull ${IMAGE_ID}`, where `${IMAGE_ID}` is an image identifier in the image registry. Pull the image only if you don't have it installed on your host (check with `docker image ls`);
- run the image interactively (`-it`) in a `bash` session with `docker run --rm ${RUN_OPTS} -it ${IMAGE_ID} /bin/bash` such that it cleans up after itself (`--rm`);
- optionally, end the `dockerd` process with `systemctl stop docker` if you're done working with `docker`.
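The steps above could be scripted roughly as follows. This is a minimal sketch that assumes the daemon is already running; the helper names `run_cmd` and `docker_session` and the `DRY_RUN` switch are hypothetical and not part of docker.

```shell
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: docker_session IMAGE_ID [RUN_OPTS...]
docker_session() {
  local image_id="$1"
  shift
  # `docker image inspect` fails if the image is not available on the host,
  # in which case the image is pulled first.
  if ! docker image inspect "$image_id" >/dev/null 2>&1; then
    run_cmd docker pull "$image_id" || return 1
  fi
  # Interactive bash session that cleans up after itself (--rm).
  run_cmd docker run --rm "$@" -it "$image_id" /bin/bash
}
```

For example, `DRY_RUN=1 docker_session ktht/ci-worker -v /tmp:/mnt` only prints the commands that would be executed, which is a convenient way to check the options before starting a container.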
Typically, we want to run a container application that has access to CVMFS and is capable of building CMSSW and our analysis software. Such images are listed in the table below. All listed images correspond to a CentOS 7 environment. If native CVMFS support is missing in an image, the host machine is expected to have access to CVMFS.
Image | Native CVMFS support | Instance user | Dockerfile | Notes |
---|---|---|---|---|
gitlab-registry.cern.ch/ci-tools/ci-worker:cc7 | No | root | here | Good for CI/CD |
gitlab-registry.cern.ch/linuxsupport/cc7-base | No | root | N/A | Base CC7 image, more info |
clelange/cc7-cmssw-cvmfs:latest | Yes | cmsusr | here | Based on an older CC7 image |
ktht/ci-worker | No | root | here | Forked, added EOS client |
ktht/cc7-cmssw-cvmfs | Yes | cmsusr | here | Forked, added EOS client |
If CVMFS is natively supported in the image, the container has to be run with higher privileges than docker grants by default. These privileges are added with the `--privileged` option. It grants more than is strictly needed; a narrower set of options that may suffice is `--cap-add SYS_ADMIN --device /dev/fuse -v /sys/fs/cgroup:/sys/fs/cgroup:ro` (unverified).
By default, the container instance cannot access any directories of the host machine. In order to make a directory on the host machine accessible from the instance, one has to add `-v <path in host>:<path in instance>` to the run command.
For instance, if we want to run an image with native CVMFS and EOS support, and make the host directory `~/Docker/some_dir` accessible in `~/host_dir` of the instance, we would have to run:
docker run --rm --privileged \
-v ~/Docker/some_dir:/home/cmsusr/host_dir \
-it ktht/cc7-cmssw-cvmfs /bin/bash
The first and last lines are standard; the `--privileged` option on the first line adds higher privileges to the docker instance; the second line makes a directory on the host machine accessible from the container instance.
Note that using `~/host_dir` as the container-side path in the above command would create the mount point in `/home/$USER/host_dir`, where `$USER` is the host user name, because the expression `~` is evaluated in the shell session of the host machine.
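The expansion can be checked on the host without starting a container. A small sketch (the variable names are illustrative only):

```shell
# An unquoted ~ is expanded by the host shell before docker ever sees it.
expanded=$(echo ~/host_dir)
# A quoted ~ is passed through literally and is not expanded.
literal=$(echo '~/host_dir')

echo "unquoted: $expanded"   # host home directory followed by /host_dir
echo "quoted:   $literal"    # the literal string ~/host_dir
```

This is why the container-side path is best spelled out explicitly, as with `/home/cmsusr/host_dir` in the command above.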
Linking a local directory to the docker session can be useful in case you want to run DAS queries in docker. This can be achieved by passing your `~/.globus` directory (with the necessary `*.pem` keys) with the `-v` option. In addition to the `~/.globus` directory, it's also beneficial to pass your `~/.ssh` directory to the docker container (to make repository cloning easier).
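As a sketch of the above, the two directories could be passed like this. The helper name `credential_mounts` is hypothetical, and the default container-side home `/home/cmsusr` assumes one of the images whose instance user is `cmsusr`.

```shell
# Print one -v option per line for ~/.globus and ~/.ssh.
# Optional argument: home directory inside the container.
credential_mounts() {
  local container_home="${1:-/home/cmsusr}"
  local dir
  for dir in .globus .ssh; do
    # Skip directories that do not exist on the host.
    if [ -d "$HOME/$dir" ]; then
      echo "-v $HOME/$dir:$container_home/$dir"
    fi
  done
}

# Word splitting of the unquoted substitution is intended here,
# so this breaks if $HOME contains spaces:
# docker run --rm --privileged $(credential_mounts) -it ktht/cc7-cmssw-cvmfs /bin/bash
```

For the images that run as `root`, the container-side home would presumably be `/root`, i.e. `$(credential_mounts /root)`.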
For Ubuntu users who can't (e.g. on GitHub Actions cloud runners) or don't want to use the `--privileged` option, it might also be necessary to use the following option to enable docker to mount CVMFS: `--security-opt apparmor=unconfined`
The official images are developed at https://gitlab.cern.ch/cms-cloud/cmssw-docker
More information can be found in the CMSSW documentation as well as on these slides. A list of singularity images can be found here. Here's how to bring up a session with singularity on manivald:
singularity exec --home $HOME:/home/cmsusr \
--bind /cvmfs --bind /hdfs --bind /home \
--pwd /home/cmsusr --contain --ipc --pid \
<container> bash
One can use `/cvmfs/singularity.opensciencegrid.org/kreczko/workernode:centos7` for CC7 and `/cvmfs/singularity.opensciencegrid.org/cmssw/cms:rhel8` for CC8.
The above command can be wrapped into a function that takes the container name as an argument. Implementing such a function in `.bashrc` makes it convenient to start the singularity session without having to recall all of its parameters.
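One possible form of such a function is sketched below. The name `singularity_session` and the `SINGULARITY_DRY_RUN` variable are hypothetical; the singularity options are the ones from the command above.

```shell
# Usage: singularity_session <container>
# If SINGULARITY_DRY_RUN is set to a non-empty value, the command is
# printed instead of executed.
singularity_session() {
  if [ "$#" -ne 1 ]; then
    echo "usage: singularity_session <container>" >&2
    return 1
  fi
  # ${VAR:+echo} expands to "echo" only when VAR is set and non-empty.
  ${SINGULARITY_DRY_RUN:+echo} singularity exec --home "$HOME:/home/cmsusr" \
    --bind /cvmfs --bind /hdfs --bind /home \
    --pwd /home/cmsusr --contain --ipc --pid \
    "$1" bash
}
```

With this in `.bashrc`, a CC7 session would be started with `singularity_session /cvmfs/singularity.opensciencegrid.org/kreczko/workernode:centos7`.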
Hadoop is configured differently in the CC7 image than it is in the host environment. When running native Hadoop commands, one always has to specify the address of the namenode (`hdfs-nn:9000`). This means that commands such as
hdfs dfs -ls /local/$USER
need to be turned into
hdfs dfs -fs hdfs-nn:9000 -ls /local/$USER
in this particular singularity instance. The CC8 image does not appear to have native Hadoop support.
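To avoid typing the namenode address every time, the `-fs` option could be wrapped in a small function inside the CC7 session. This is only a sketch: the function name `hdfs_dfs` and the variables `HDFS_NAMENODE` and `HDFS_DRY_RUN` are hypothetical.

```shell
# Run `hdfs dfs` with the namenode address filled in.
# HDFS_NAMENODE overrides the default address; a non-empty HDFS_DRY_RUN
# prints the command instead of executing it.
hdfs_dfs() {
  ${HDFS_DRY_RUN:+echo} hdfs dfs -fs "${HDFS_NAMENODE:-hdfs-nn:9000}" "$@"
}
```

With this, `hdfs_dfs -ls /local/$USER` runs `hdfs dfs -fs hdfs-nn:9000 -ls /local/$USER`.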
Singularity is able to run docker images if they are prefixed with `docker://`.
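As an illustration, the prefix could be added by a small helper. The name `docker_uri` is hypothetical, and `ktht/ci-worker` is used only as an example image from the table above; whether a given docker image works well under singularity is not guaranteed.

```shell
# Turn a docker image name into a URI that singularity understands.
docker_uri() {
  case "$1" in
    docker://*) echo "$1" ;;            # already prefixed, leave as-is
    *)          echo "docker://$1" ;;
  esac
}

# Example (not executed here):
# singularity exec "$(docker_uri ktht/ci-worker)" bash
```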
See also: https://github.com/HEP-KBFI/singularity/blob/master/README.md