Welcome to the POWER9-HPC wiki!

For those with experience setting up Intel clusters, using the OpenHPC repository and installing packages is a breeze. However, if you would like to use one of IBM's new POWER9 systems at your lab or company, the steps are a bit more complicated but the end result is fantastic performance.

In this article, I will cover everything from installing the OS to configuring open source software that is extremely useful in getting your HPC up and running smoothly.

Table of Contents

  1. Introduction
  2. Installing CentOS
  3. Configuring the BMC
  4. Installing xCAT
  5. Configuring xCAT Environment
  6. Installing the Slurm Job Scheduler
  7. Adding InfiniBand Support
  8. C/C++ Compilers
  9. CUDA Support

Introduction

Our friends over at IBM lent the Stanford High Performance Computing Center a couple of incredible machines to test out. The master node (which also functions as a login node) is an IBM LC922 system, while an AC922 will be our compute node.

While the systems came preinstalled with RHEL, at the HPCC we use open source software whenever possible, so we will be using CentOS 7 for this cluster.

As I went through the first tests with the LC922, I discovered that the CentOS 7.7 ISO provided through the official channels was not compatible with the machine. Thus, we will start with the 7.6 ISO; once the initial installation is complete, the OS can be updated to the latest version of CentOS 7.

Before we do anything, make sure that both your master and compute nodes are plugged in and ready to go!

Installing CentOS

The correct ISO can be downloaded from the Linux Kernel Archives. Navigate to this directory and download CentOS-7-power9-Everything-1810.iso onto your local machine.

Once the download is complete, burn it to a USB.
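
If you are on a Linux machine, one common way to do this is with dd, as sketched below. Note that /dev/sdX is a placeholder for your USB device, so double-check the device name with lsblk before writing.

# Identify the USB device first (e.g. with lsblk); /dev/sdX below is a placeholder
sudo dd if=CentOS-7-power9-Everything-1810.iso of=/dev/sdX bs=4M status=progress conv=fsync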

Ensure that the LC922 is powered off and plug the USB into an empty port. Start the machine, wait a few minutes, and Petitboot (the bootloader) will prompt you to exit the boot process. Proceed with this action, as we want to modify some boot parameters.

At the top of Petitboot, all the available boot options are listed. Using your navigation keys, scroll to the USB that you burned CentOS to. Once it is highlighted, hit e on your keyboard. Petitboot will enter a new interface, allowing you to edit the boot parameters.

At the top of your screen, information about the USB is displayed. Make sure to write the UUID of the device (in 0000-00-00-00-00-00 format) down for safekeeping. Navigate to the field where the boot parameters can be edited, and clear it. Now, replace it with the following (replace the UUID with the one for your USB).

ro inst.stage2=hd:UUID=0000-00-00-00-00-00 inst.graphical console=tty0 console=hvc0

Once the new boot parameters are entered correctly, save the information and return back to the Petitboot home page. Boot into the USB by navigating to its location on the listing and pressing Enter.

The machine will boot into the CentOS installer. Select the appropriate timezone and language. Under Installation Destination, click "I would like to make additional space available." Click "Done", and you are brought to a new screen. Make sure that the disk that you would like to install CentOS to is selected for installation. Click "Delete All" on the bottom right, then click "Reclaim Space."

Under Software Selection, select GNOME Desktop on the left, and GNOME Applications, Compatibility Libraries, Development Tools, and Security Tools on the right. Of course, if you would like to install additional software, feel free; the selections above are the bare minimum necessary to get up and running.

If you would like to configure a network connection (local or internet), plug in the respective cables and you can configure the options under Network and Hostname. Be sure to choose the interfaces wisely, as these will be used during the node deployment process.

After all your settings are set the way you like, click "Begin Installation." The CentOS installer will prompt you to set a root password as well as credentials for one user. Make sure that the user is categorized as an administrator.

The installation process will take between 15 and 30 minutes. The computer will reboot and you may unplug the USB from the machine. Depending on how the boot order is configured in Petitboot, you may need to move the hard drive you installed CentOS on to the top of the device sequence.

After the LC922 successfully boots up, you should be able to log in with the user you created.

Configuring the BMC

The Baseboard Management Controller, or BMC, is an integral part of controlling compute nodes remotely. You can restart the system and power it on or off from anywhere. For this example, I will be using the AC922 system.

Before getting started, be sure to identify the BMC interface on the compute node and connect it to the master node with an Ethernet cable.

Start the compute node, and following the initial boot process, it will enter Petitboot. Stop the boot there and enter the Petitboot console.

For the compute node, I'll be using the following configuration:

  • IP Address: 10.1.1.11 (known as COMPUTE_BMC_IP in the instructions down below)
  • Netmask: 255.0.0.0
  • Gateway: 10.1.1.1

Once you are in the console, run the following commands to set the static IP for the BMC interface:

ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 10.1.1.11
ipmitool lan set 1 netmask 255.0.0.0
ipmitool lan set 1 defgw ipaddr 10.1.1.1
ipmitool mc reset cold

After running the last command, the system will hang for a couple minutes as the process reinitializes. Once that is over, run ipmitool lan print 1 to confirm the changes.
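
The output should include lines similar to the following (trimmed here; the exact fields vary with BMC firmware):

IP Address Source       : Static Address
IP Address              : 10.1.1.11
Subnet Mask             : 255.0.0.0
Default Gateway IP      : 10.1.1.1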

Installing xCAT

xCAT (Extreme Cloud Administration Toolkit) is software that allows you to manage the stateless deployment of nodes with ease. It provides the DHCP, DNS, and network boot services used to provision compute nodes, builds the operating system images they run, and lets you control their power from the master node.

For the next commands, I will be using the following shortcuts to represent hostnames, IP addresses, and other information:
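
These placeholders appear in UPPERCASE; substitute your own values wherever they appear. Any example values in parentheses are illustrative only.

  • MASTER_IP / MASTER_HOSTNAME: the public-facing IP address and hostname of the master node
  • LOCAL_IP / LOCAL_HOSTNAME: the master node's IP address and hostname on the private cluster network
  • MASTER_LOCAL_INTERFACE: the master node's network interface on the private cluster network
  • MASTER_INTERNAL_IP: the master node's private IP address as seen by the compute nodes (also used as their NTP server)
  • NTP_SERVER: the time server used by the master node (e.g. time.nist.gov)
  • COMPUTE_NODE_NAME / COMPUTE_LOCAL_IP / COMPUTE_LOCAL_MAC: the compute node's name, private IP address, and the MAC address of its provisioning interface
  • COMPUTE_BMC_IP: the compute node's BMC address (10.1.1.11 in this guide)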

To get started with the installation process, log onto your new master node and run the following commands.

echo "MASTER_IP MASTER_HOSTNAME" >> /etc/hosts"
echo "LOCAL_IP LOCAL_HOSTNAME.localdomain LOCAL_HOSTNAME >> /etc/hosts"

xCAT relies on network protocols that your default firewall rules may prevent from working correctly. The firewall service on the master node must be disabled.

systemctl disable firewalld
systemctl stop firewalld
reboot

Once your machine reboots and you have verified that the firewall is disabled (for example, by running systemctl status firewalld), xCAT can be installed.

First, add the EPEL repository:

yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Now, download the installation script made by xCAT's creators:

wget https://raw.githubusercontent.com/xcat2/xcat-core/master/xCAT-server/share/xcat/tools/go-xcat -O - >/tmp/go-xcat
chmod +x /tmp/go-xcat

Now, run the script:

/tmp/go-xcat install
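
The script adds the xCAT repositories and installs the packages. Once it finishes, you can verify the installation by loading the xCAT environment and checking the daemon version:

source /etc/profile.d/xcat.sh
lsxcatd -a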

Configuring xCAT Environment

Your HPC system will need an accurate clock across all nodes; enabling NTP on your machine and selecting an appropriate time server is an essential first step. If your institution has its own time server, I would recommend using that; otherwise, time.nist.gov is a safe bet. If the ntp package is not already present on the master node, install it first.

yum install ntp
systemctl enable ntpd.service
echo "server NTP_SERVER" >> /etc/ntp.conf
systemctl restart ntpd
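
After ntpd restarts, you can confirm that the time server is reachable and the clock is synchronizing (an asterisk next to a peer indicates it has been selected):

ntpq -p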

Now, we will start configuring xCAT for node provisioning.

Register the local network interface for DHCP:

chdef -t site dhcpinterfaces="xcatmn|MASTER_LOCAL_INTERFACE"

Transfer the CentOS-7-power9-Everything-1810.iso file that you downloaded earlier to the master node. Generate the initial operating system image using this command:

copycds /root/CentOS-7-power9-Everything-1810.iso

Of course, the path that follows copycds will depend on where you copied the ISO file to.

To verify that the base image has been generated, run

lsdef -t osimage

and the output should resemble the following:

centos7.6-ppc64le-install-compute  (osimage)
centos7.6-ppc64le-netboot-compute  (osimage)
centos7.6-ppc64le-statelite-compute  (osimage)

Because xCAT utilizes a chroot environment to facilitate modification of the compute node's operating system image, it will be useful to store the location of this environment in a variable on the master node.

Add the following to ~/.bashrc or the equivalent file, and modify it depending on the operating system and architecture.

export CHROOT=/install/netboot/centos7.6/ppc64le/compute/rootimg/

To generate the chroot environment inside the respective folder, run

genimage centos7.6-ppc64le-netboot-compute

Now, enable the base and EPEL repositories for the chroot environment:

yum-config-manager --installroot=$CHROOT --enable base 
cp /etc/yum.repos.d/epel.repo $CHROOT/etc/yum.repos.d

Install packages that will be necessary on the compute node:

yum -y --installroot=$CHROOT install ntp kernel ipmitool

Enable the time server on the compute node:

chroot $CHROOT systemctl enable ntpd 
echo "server MASTER_INTERNAL_IP" >> $CHROOT/etc/ntp.conf

There are a number of files that you may want to sync from the master to the compute node. xCAT handles this by reading from a synclist. We are going to place this file in /install/custom/netboot.

mkdir -p /install/custom/netboot
chdef -t osimage -o  centos7.6-ppc64le-netboot-compute synclists="/install/custom/netboot/compute.synclist" 
echo "/etc/passwd -> /etc/passwd" >> /install/custom/netboot/compute.synclist 
echo "/etc/group -> /etc/group" >> /install/custom/netboot/compute.synclist 
echo "/etc/shadow -> /etc/shadow" >> /install/custom/netboot/compute.synclist 

Once this process is completed, the chroot environment can be packed into an image:

packimage centos7.6-ppc64le-netboot-compute

The following command will vary depending on how many compute nodes you would like to configure; run it once for each node, substituting that node's values.

mkdef -t node COMPUTE_NODE_NAME groups=compute,all ip=COMPUTE_LOCAL_IP mac=COMPUTE_LOCAL_MAC \
    netboot=petitboot arch=ppc64le bmc=COMPUTE_BMC_IP bmcpassword=0penBmc \
    mgt=ipmi serialport=0 serialspeed=115200

Before running this command, make sure to identify the local interface you would like to use on the compute node, since its MAC address must be correct for xCAT to provision the node. In addition, having the BMC configured correctly is important, as it lets you control the node's power from the master.
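
As an illustration, here is what the command might look like for a single compute node named power9-compute-1 (the IP and MAC address below are placeholders for your own values):

mkdef -t node power9-compute-1 groups=compute,all ip=10.1.1.12 mac=aa:bb:cc:dd:ee:ff \
    netboot=petitboot arch=ppc64le bmc=10.1.1.11 bmcpassword=0penBmc \
    mgt=ipmi serialport=0 serialspeed=115200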

Add the master node's hostname to the xCAT database for network-wide name resolution.

chdef -t site domain=MASTER_HOSTNAME

Now, the following commands will configure the network records in preparation for the first boot of the compute node:

makehosts
makenetworks
makedhcp -n
makedns -n
systemctl enable dhcpd.service
systemctl start dhcpd

To match the node definition created with mkdef with the chroot environment, run:

nodeset compute osimage=centos7.6-ppc64le-netboot-compute

Install IPMItool in order to control the compute node's power:

yum install ipmitool

After successfully running those commands, run rpower compute reset to reboot the compute node and boot into the new operating system.
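
While the node deploys, you can monitor it from the master node; rpower reports the chassis power state and nodestat reports the provisioning status:

rpower compute status
nodestat compute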

Installing the Slurm Job Scheduler

Now that we have a simple operating system running, it's time to install software that will allow us to run programs on the cluster. First, let's install Slurm, an open source job scheduler that is incredibly useful when submitting jobs throughout a cluster.

First, install the MariaDB database engine to store the accounting information for Slurm.

yum install mariadb-server mariadb-devel

Now you can create the users required by Slurm and by one of its dependencies, Munge, which handles authentication for submitted jobs.

export MUNGEUSER=1001
groupadd -g $MUNGEUSER munge
useradd  -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge  -s /sbin/nologin munge

export SLURMUSER=1002
groupadd -g $SLURMUSER slurm
useradd  -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm  -s /bin/bash slurm

Depending on how UIDs and GIDs are allocated on the system, you may need to modify MUNGEUSER and SLURMUSER. To check which IDs are currently in use, run getent passwd and getent group.
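
After creating the accounts, you can confirm that the UIDs and GIDs were assigned as expected:

id munge
id slurm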

Install Munge on both the master and chroot environment:

yum install munge munge-libs munge-devel
yum install --installroot=$CHROOT munge munge-libs munge-devel

Now we can start configuring Munge. Create a secret key on the master node using rng-tools:

yum install rng-tools
rngd -r /dev/urandom

/usr/sbin/create-munge-key -r
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key

We can add this secret key to the synclist we created earlier:

echo "/etc/munge/munge.key -> /etc/munge/munge.key" >> /install/custom/netboot/compute.synclist 

From the chroot environment, we can fix the permissions for the Munge files:

** ON CHROOT **

chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/
chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge

Enable the Munge service on start:

chroot $CHROOT systemctl enable munge
systemctl enable munge
systemctl start munge

After these steps, Munge should be successfully installed, and we can now proceed with the installation of Slurm on the system.
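
To confirm that Munge is working on the master node, generate and immediately validate a credential; the output should report STATUS: Success:

munge -n | unmunge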

Install Slurm dependencies:

yum install openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad

yum install --installroot=$CHROOT openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad

Because Slurm needs to be installed from source, it makes sense to create a temporary directory to store all the files:

mkdir -p /tmp/slurm

The latest version of the Slurm source code can be found on this page. For this guide, we will be using version 20.02.

cd /tmp/slurm
wget https://download.schedmd.com/slurm/slurm-20.02.0.tar.bz2

Install the rpm-build package, which will allow you to build RPMs directly from the source tarball. There are a few additional packages that will facilitate the process.

yum install rpm-build
yum groupinstall "Development Tools"
yum install python36 perl-ExtUtils-MakeMaker

Now, compile Slurm:

rpmbuild -ta slurm-20.02.0.tar.bz2

This should take a few minutes, and once it's done compiling, change into this directory: cd /root/rpmbuild/RPMS/ppc64le/. In here, you will find all the Slurm rpms that can be used for install with yum.

Install Slurm onto the master node:

yum install ./slurm-20.02.0-1.el7.ppc64le.rpm ./slurm-devel-20.02.0-1.el7.ppc64le.rpm ./slurm-perlapi-20.02.0-1.el7.ppc64le.rpm ./slurm-torque-20.02.0-1.el7.ppc64le.rpm ./slurm-slurmdbd-20.02.0-1.el7.ppc64le.rpm ./slurm-slurmctld-20.02.0-1.el7.ppc64le.rpm

Install Slurm onto the compute node (via chroot environment):

yum --installroot=$CHROOT install ./slurm-20.02.0-1.el7.ppc64le.rpm ./slurm-devel-20.02.0-1.el7.ppc64le.rpm ./slurm-perlapi-20.02.0-1.el7.ppc64le.rpm ./slurm-torque-20.02.0-1.el7.ppc64le.rpm ./slurm-slurmdbd-20.02.0-1.el7.ppc64le.rpm ./slurm-slurmd-20.02.0-1.el7.ppc64le.rpm

After installing Slurm, you will need to configure its settings to match your environment. Luckily, the developers of the project have set up a website that allows you to generate a simple config file. For reference, here is the one we are using at HPCC:

SlurmctldHost=hpcc-power9

MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmd
SwitchType=switch/none
TaskPlugin=task/none

SchedulerType=sched/backfill
SelectType=select/linear

AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none

SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log

# COMPUTE NODES
NodeName=power9-compute-1 CPUs=128 Sockets=2 CoresPerSocket=16 ThreadsPerCore=4 State=UNKNOWN
PartitionName=main Nodes=power9-compute-1 Default=YES MaxTime=INFINITE State=UP

To determine values such as the number of CPUs, sockets, cores per socket, and threads per core, run lscpu on the compute node.

Save the file to /etc/slurm/slurm.conf (create directory if it doesn't exist), and add it to the synclist:

echo "/etc/slurm/slurm.conf -> /etc/slurm/slurm.conf" >> /install/custom/netboot/compute.synclist 

Configure permissions for Slurm log files on the master node:

mkdir /var/spool/slurmctld
chown slurm: /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
touch /var/log/slurmctld.log
chown slurm: /var/log/slurmctld.log
touch /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
chown slurm: /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log

Now for the chroot environment:

** ON CHROOT **

mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
touch /var/log/slurmd.log
chown slurm: /var/log/slurmd.log

Apply the changes to the compute image, and restart the compute node:

packimage centos7.6-ppc64le-netboot-compute
rpower compute reset

Once the node is up and running again, you can verify that the node has been successfully configured by running sinfo. Output should look similar to this:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite      1   idle power9-compute-1

To run a quick example job on the compute node, run srun hostname, which should just output the hostname of the compute node.
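
If you would like to try a batch job as well, here is a minimal example script (the file name, job name, and output file are just illustrative):

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --output=hello-%j.out

srun hostname

Save it as hello.sbatch, submit it with sbatch hello.sbatch, and monitor the queue with squeue; once the job completes, the output file will contain the compute node's hostname.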

That's it! Now you have a phenomenal and incredibly useful job scheduler installed on your cluster!

Adding InfiniBand Support
