These instructions describe how to set up conda (via Miniforge) and the Python environment needed to train Deep Learning CNN models on the Noether cluster.
Kerberos is needed to transfer files between uboonegpvm and Noether. Follow these steps:
-
Download the krb5.conf for SL7 from here.
-
Copy the file into your home directory on Noether.
-
Once the file is in place, get a Kerberos ticket on Noether by running the following command:
kinit -fA <username>@FNAL.GOV
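As a rough sketch of the steps above: MIT Kerberos tools read a per-user configuration file when the `KRB5_CONFIG` environment variable points at it. Whether Noether needs this is an assumption, as is the file location `$HOME/krb5.conf`; the helper name `use_local_krb5` is made up for illustration.

```shell
# Hypothetical helper: point Kerberos tools at a per-user krb5.conf.
# Assumption: the file was copied to $HOME/krb5.conf unless a path is given.
use_local_krb5() {
    local conf="${1:-$HOME/krb5.conf}"
    if [ -r "$conf" ]; then
        export KRB5_CONFIG="$conf"
        echo "Using Kerberos config: $KRB5_CONFIG"
    else
        echo "No readable krb5.conf at $conf" >&2
        return 1
    fi
}

# Typical use on Noether (not run automatically here):
#   use_local_krb5 && kinit -fA <username>@FNAL.GOV && klist
```

Running `klist` afterwards lists the tickets currently held, which is a quick way to confirm that `kinit` succeeded.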
After logging into the Noether cluster, you will need to create a grid session with GPU access and install Miniforge:
-
Request a GPU session by running:
qrsh request_gpus=1
-
Navigate to your home directory and install Miniforge:
cd
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
-
Once Miniforge is installed, initialise conda:
source ~/.bashrc
conda init
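The installer file name in the download step is assembled from `uname` output, so it differs between machines. A minimal sketch that prints the name it resolves to and checks that the download is present before running it (the function name `miniforge_installer` is hypothetical, and the check assumes the file was downloaded into your home directory):

```shell
# Hypothetical helper: print the installer file name used in the curl command.
miniforge_installer() {
    echo "Miniforge3-$(uname)-$(uname -m).sh"
}

installer="$(miniforge_installer)"
# On 64-bit Intel/AMD Linux this is Miniforge3-Linux-x86_64.sh.
echo "Installer for this machine: $installer"

# Only suggest running the installer if the download exists and is non-empty.
if [ -s "$HOME/$installer" ]; then
    echo "Found $HOME/$installer; run: bash $HOME/$installer"
else
    echo "Installer not downloaded yet: $HOME/$installer"
fi
```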
Next, you will create the required Python environment and install the necessary libraries:
-
Create a new conda environment:
conda create -n python3LEE python=3.8
-
Activate the environment:
conda activate python3LEE
-
Install the required libraries:
conda install scipy pandas==1.0.5 matplotlib pyyaml tqdm scikit-learn jupyter
pip install torch opencv-python
-
Verify that PyTorch is detecting the GPU:
python -c "import torch; print(torch.cuda.is_available())"
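Packages installed while the wrong environment is active end up in the wrong place, so it can help to confirm that python3LEE is active before installing or testing. A small sketch, relying on the `CONDA_DEFAULT_ENV` variable that conda sets to the name of the active environment (the function name `require_env` is hypothetical):

```shell
# Hypothetical guard: succeed only if the named conda environment is active.
# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
require_env() {
    local wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        return 0
    fi
    echo "Expected conda env '$wanted', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
}

# Example use before installing packages or testing the GPU:
#   require_env python3LEE && python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```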
To run Jupyter notebooks on Noether with port forwarding, you will need to create three scripts.
-
Jupyter Script: Create the script /gluster/home/<username>/scripts/bin/jupyter.sh to launch Jupyter:
#!/bin/bash
# Script to run a Jupyter notebook on a specific port
source /gluster/home/<username>/bin/conda_setup.sh python3LEE
alias converttopy='jupyter nbconvert --to script'
alias openjpnotebook='jupyter notebook --no-browser --port 1234'
cd /gluster/home/<username>/DeepLearning/
jupyter notebook --no-browser --port=1234
-
Condor Submission File: Create a Condor submission file that runs the Jupyter script with the requested resources. Save this as /gluster/home/<username>/scripts/etc/jupyter.sub, so that it matches the relative path etc/jupyter.sub used when submitting from the scripts directory:
executable = bin/jupyter.sh
request_memory = 8G
request_cpus = 4
request_gpus = 1
request_disk = 5G
initialdir = $ENV(HOME)/scripts
output = out/jupyter/jupyter-$(Process).out
error = out/jupyter/jupyter-$(Process).err
log = out/jupyter/jupyter-$(Process).log
arguments = $(Process)
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
queue 1
-
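Conda Environment Setup Script: Create a script that sets up conda and activates the environment named in its first argument. Save this as /gluster/home/<username>/bin/conda_setup.sh:
#!/bin/bash
# Load the conda shell functions; the file must be sourced, not executed,
# for the conda command to be available in this shell.
source /gluster/home/<username>/miniforge3/etc/profile.d/conda.sh
conda init
source ~/.bashrc
# Activate the environment passed as the first argument, e.g. python3LEE
conda activate "$1"
alias converttopy='jupyter nbconvert --to script'
alias openjpnotebook='jupyter notebook --no-browser --port 1234'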
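The submission file writes its output, error, and log files under out/jupyter/ relative to initialdir, and HTCondor generally expects those directories to exist already. A minimal sketch that creates the layout before submitting, assuming the job is submitted from $HOME/scripts as in the submission file (the function name `prepare_jupyter_dirs` is hypothetical):

```shell
# Hypothetical helper: create the directories referenced by jupyter.sub.
# Assumption: initialdir is $HOME/scripts unless another base is given.
prepare_jupyter_dirs() {
    local base="${1:-$HOME/scripts}"
    mkdir -p "$base/bin" "$base/etc" "$base/out/jupyter"
    # Make the job script executable if it has already been created.
    if [ -f "$base/bin/jupyter.sh" ]; then
        chmod u+x "$base/bin/jupyter.sh"
    fi
    echo "Prepared $base"
}

# Typical use on Noether:
#   prepare_jupyter_dirs
```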
To access Jupyter notebooks remotely, follow these steps:
-
Disconnect from Noether and reconnect with port forwarding:
ssh -gL 1234:localhost:1234 <username>@noether.hep.manchester.ac.uk
-
Submit the Condor job to start the Jupyter notebook:
cd scripts
condor_submit etc/jupyter.sub
-
Connect to the running Condor job, replacing JOB_ID with the ID of your job (shown by condor_q):
condor_ssh_to_job -ssh "ssh -gL 1234:localhost:1234" JOB_ID
-
Open the Jupyter notebook by visiting http://localhost:1234 in your browser. Your working directory will be /gluster/home/<username>/DeepLearning/.
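Port 1234 appears in the Jupyter script and in both forwarding commands, and all of them must agree. If you need a different port, a small sketch like the following builds both forwarding commands from one value so they stay consistent (the function names, the user name, the job ID, and the port 8765 are illustrative only; the port in jupyter.sh would need the same change):

```shell
# Hypothetical helpers: print the two port-forwarding commands for a given port.
login_tunnel_cmd() {
    local user="$1" port="${2:-1234}"
    echo "ssh -gL ${port}:localhost:${port} ${user}@noether.hep.manchester.ac.uk"
}

job_tunnel_cmd() {
    local job_id="$1" port="${2:-1234}"
    echo "condor_ssh_to_job -ssh \"ssh -gL ${port}:localhost:${port}\" ${job_id}"
}

# Print the commands for an example user, job, and port.
login_tunnel_cmd myuser 8765
job_tunnel_cmd 4242.0 8765
```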