Commit c4eab44

Author: Sean Smith
Release 2.5.0 (merge of 2 parents: 8f5359f + da173a8)

262 files changed: +17530 / -11645 lines


.github/ISSUE_TEMPLATE/bug_report.md (+27)

@@ -0,0 +1,27 @@
+---
+name: Bug report
+about: Please create a detailed report by completing the following information
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Environment:**
+ - AWS ParallelCluster / CfnCluster version [e.g. aws-parallelcluster-2.4.1]
+ - OS: [e.g. alinux]
+ - Scheduler: [e.g. SGE]
+ - Master instance type: [e.g. m5.xlarge]
+ - Compute instance type: [e.g. c5.8xlarge]
+
+**Bug description and how to reproduce:**
+A clear and concise description of what the bug is and the steps to reproduce the behavior.
+
+**Additional context:**
+Any other context about the problem. E.g.:
+ - configuration file without any credentials or personal data
+ - pre/post-install scripts, if any
+ - screenshots, if useful
+ - if the cluster fails creation, please re-execute the `create` action using the `--norollback` option and attach the `/var/log/cfn-init.log`, `/var/log/cloud-init.log` and `/var/log/cloud-init-output.log` files from the Master node
+ - if a compute node was terminated due to failure, there will be a directory `/home/logs/compute`; attach one of the `instance-id.tar.gz` files from that directory
+ - if you encounter scaling problems, please attach `/var/log/nodewatcher` from the Compute node and `/var/log/jobwatcher` and `/var/log/sqswatcher` from the Master node
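For example, a failed cluster creation could be reproduced and its logs collected along these lines (the cluster name and master address are placeholders):

.. code-block:: sh

    # Re-create the cluster without rollback so the failed instances
    # stay up for inspection
    pcluster create mycluster --norollback

    # Fetch the bootstrap logs from the Master node to attach to the report
    scp ec2-user@MASTER_IP:/var/log/cfn-init.log .
    scp ec2-user@MASTER_IP:/var/log/cloud-init.log .
    scp ec2-user@MASTER_IP:/var/log/cloud-init-output.log .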

.isort.cfg (+1)

@@ -11,3 +11,4 @@ known_third_party=boto3,botocore,awscli,tabulate,argparse,configparser,pytest,py
 # )
 multi_line_output=3
 include_trailing_comma=true
+skip=pcluster/resources/batch/custom_resources_code/crhelper
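The new ``skip`` entry keeps isort away from the vendored ``crhelper`` code. A quick local check (with the isort 4 CLI current at the time) would be:

.. code-block:: sh

    # Verify import ordering; the directory listed under `skip` is ignored
    isort --check-only --recursive pcluster/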

.travis.yml (+1, -7)

@@ -8,6 +8,7 @@ python:
   - "3.5"
   - "3.6"
   - "3.7"
+  - "3.8"

 matrix:
   include:
@@ -19,13 +20,6 @@ matrix:
       python: 3.6
       stage: linters
       env: TOXENV=cfn-format-check,cfn-lint
-    - name: Docs Checks
-      python: 3.6
-      stage: linters
-      env: TOXENV=docs-linters
-      before_install:
-        # Needed to run docs-linters target in tox.
-        - sudo apt-get update && sudo apt-get install -y enchant

 install:
   - pip install tox-travis
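Dropping the Docs Checks job removes the enchant dependency, while the new interpreter can be exercised locally through tox (assuming ``tox.ini`` defines the usual ``py38`` environment that ``tox-travis`` maps the new Travis job to):

.. code-block:: sh

    # Run the test suite under Python 3.8, mirroring the new Travis job
    pip install tox
    tox -e py38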

CHANGELOG.rst (+66, -3)

@@ -2,15 +2,78 @@
 CHANGELOG
 =========

+2.5.0
+=====
+
+**ENHANCEMENTS**
+
+* Add support for a new OS: Ubuntu 18.04.
+* Add support for the AWS Batch scheduler in the China partition and in ``eu-north-1``.
+* Revamped ``pcluster configure`` command, which now supports automated networking configuration.
+* Add support for NICE DCV on CentOS 7 to set up a graphical remote desktop session on the Master node.
+* Add support for new EFA-enabled instances: ``c5n.metal``, ``m5dn.24xlarge``, ``m5n.24xlarge``, ``r5dn.24xlarge``,
+  ``r5n.24xlarge``.
+* Add support for scheduling with GPU options in Slurm. The following GPU-related options are currently supported:
+  ``-G/--gpus``, ``--gpus-per-task``, ``--gpus-per-node``, ``--gres=gpu`` and ``--cpus-per-gpu``.
+  GPU requirements are integrated into the scaling logic, so the cluster scales automatically to satisfy the GPU/CPU
+  requirements of pending jobs. When submitting GPU jobs, CPU/node/task information is not required, but providing it
+  is preferred in order to avoid ambiguity. If only GPU requirements are specified, the cluster scales up to the
+  minimum number of nodes required to satisfy all GPU requirements (see the sketch after this list).
+* Add a new cluster configuration option to automatically disable Hyperthreading (``disable_hyperthreading = true``).
+* Install Intel Parallel Studio 2019.5 Runtime on CentOS 7 when ``enable_intel_hpc_platform = true``, and share
+  ``/opt/intel`` over NFS.
+* Additional EC2 IAM policies can now be added to the role ParallelCluster automatically creates for cluster nodes by
+  simply specifying ``additional_iam_policies`` in the cluster config (see the sketch after this list).
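The two configuration-driven enhancements above could be enabled with a snippet like the following (a minimal sketch: the section name and policy ARN are placeholders, not taken from this commit):

.. code-block:: sh

    # Append the new 2.5.0 options to a hypothetical cluster section
    cat >> ~/.parallelcluster/config <<'EOF'
    [cluster default]
    # Turn off Hyperthreading on all cluster nodes
    disable_hyperthreading = true
    # Extra managed policy to attach to the node role ParallelCluster creates
    additional_iam_policies = arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
    EOF

The Slurm GPU scheduling support could then be exercised with a submission like this one (assuming a cluster whose compute fleet uses GPU instances):

.. code-block:: sh

    # Only GPU requirements are specified; the cluster scales to the
    # minimum number of nodes that can provide 4 GPUs
    sbatch -G 4 --wrap "srun nvidia-smi"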
+
+**CHANGES**
+
+* Ubuntu 14.04 is no longer supported.
+* Upgrade Intel MPI to version U5.
+* Upgrade EFA Installer to version 1.7.0; this also upgrades Open MPI to 4.0.2.
+* Upgrade NVIDIA driver to Tesla version 418.87.
+* Upgrade CUDA library to version 10.1.
+* Upgrade Slurm to version 19.05.3-2.
+* Install EFA in China AMIs.
+* Increase the default EBS volume size from 17 GB to 25 GB.
+* FSx Lustre now supports the new ``storage_capacity`` options of 1,200 and 2,400 GiB.
+* Enable the ``flock user_xattr noatime`` Lustre mount options by default everywhere, and
+  ``x-systemd.automount x-systemd.requires=lnet.service`` on systemd-based systems (see the sketch after this list).
+* Increase the number of hosts that can be processed by the scaling daemons in a single batch from 50 to 200. This
+  significantly improves scaling time, especially when the ASG launch rate is raised.
+* Change the default sshd config in order to disable X11 forwarding and update the list of supported ciphers.
+* Increase the faulty node termination timeout from 1 minute to 5 in order to give the scheduler some additional
+  time to recover when under heavy load.
+* Extend the ``pcluster createami`` command to specify the VPC and network settings when building the AMI.
+* Support inline comments in the config file.
+* Support Python 3.8 in the pcluster CLI.
+* Deprecate Python 2.6 support.
+* Add a ``ClusterName`` tag to EC2 instances.
+* Search for a new available version only on the ``pcluster create`` action.
+* Enable ``sanity_check`` by default.
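For reference, the new default mount options correspond to a manual mount roughly like the following (the file system DNS name and mount point are placeholders):

.. code-block:: sh

    # Mount an FSx for Lustre file system with the new default options;
    # systemd-based systems additionally get x-systemd.automount and
    # x-systemd.requires=lnet.service via /etc/fstab
    sudo mkdir -p /fsx
    sudo mount -t lustre -o flock,user_xattr,noatime \
        fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com@tcp:/fsx /fsx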
+
+**BUG FIXES**
+
+* Fix the sanity check for custom EC2 roles. Fixes `#1241 <https://github.com/aws/aws-parallelcluster/issues/1241>`_.
+* Fix a bug occurring when the same subnet is used for both master and compute nodes.
+* Fix a bug so that Ganglia URLs are shown when Ganglia is enabled. Fixes `#1322 <https://github.com/aws/aws-parallelcluster/issues/1322>`_.
+* Fix a bug with the ``awsbatch`` scheduler that prevented multi-node jobs from running.
+* Fix jobwatcher behaviour that was marking nodes locked by the nodewatcher as busy even if they had already been
+  removed from the ASG Desired count. In rare circumstances this was causing the cluster to overscale.
+* Fix a bug that was causing failures in sqswatcher when ADD and REMOVE events for the same host were fetched together.
+* Fix a bug that was preventing nodes from mounting partitioned EBS volumes.
+* Implement paginated calls in ``pcluster list``.
+* Fix a bug occurring when creating an ``awsbatch`` cluster with a name longer than 31 characters.
+* Fix a bug that led to ssh not working after ssh'ing into a compute node by IP address.
+
 2.4.1
 =====

 **ENHANCEMENTS**

 * Add support for ap-east-1 region (Hong Kong)
 * Add possibility to specify instance type to use when building custom AMIs with ``pcluster createami``
-* Speed up cluster creation by having compute nodes starting together with master node
-* Enable ASG CloudWatch metrics for the ASG managing compute nodes
+* Speed up cluster creation by having compute nodes start together with the master node. **Note**: this requires one
+  new IAM permission in the `ParallelClusterInstancePolicy <https://docs.aws.amazon.com/en_us/parallelcluster/latest/ug/iam.html#parallelclusterinstancepolicy>`_:
+  ``cloudformation:DescribeStackResource`` (see the sketch after this diff).
+* Enable ASG CloudWatch metrics for the ASG managing compute nodes. **Note**: this requires two new IAM permissions
+  in the `ParallelClusterUserPolicy <https://docs.aws.amazon.com/parallelcluster/latest/ug/iam.html#parallelclusteruserpolicy>`_:
+  ``autoscaling:DisableMetricsCollection`` and ``autoscaling:EnableMetricsCollection``.
 * Install Intel MPI 2019u4 on Amazon Linux, Centos 7 and Ubuntu 1604
 * Upgrade Elastic Fabric Adapter (EFA) to version 1.4.1 that supports Intel MPI
 * Run all node daemons and cookbook recipes in isolated Python virtualenvs. This allows our code to always run with the

@@ -44,7 +107,7 @@ CHANGELOG
 * Make FSx Substack depend on ComputeSecurityGroupIngress to keep FSx from trying to create prior to the SG
   allowing traffic within itself
 * Restore correct value for ``filehandle_limit`` that was getting reset when setting ``memory_limit`` for EFA
-* Torque: fix compute nodes locking mechanism to prevent job scheduling on nodes being terminated
+* Torque: fix compute nodes locking mechanism to prevent job scheduling on nodes being terminated
 * Restore logic that was automatically adding compute nodes identity to SSH ``known_hosts`` file
 * Slurm: fix issue that was causing the ParallelCluster daemons to fail when the cluster is stopped and an empty compute nodes file
   is imported in Slurm config
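The two **Note** items above call out new IAM permissions; the statements below are a hedged sketch of what would be merged into the respective policies (resource scoping is left wide open for brevity and should be narrowed in practice):

.. code-block:: sh

    # Statement to merge into the ParallelClusterInstancePolicy
    cat > instance-policy-statement.json <<'EOF'
    {
      "Effect": "Allow",
      "Action": "cloudformation:DescribeStackResource",
      "Resource": "*"
    }
    EOF

    # Statement to merge into the ParallelClusterUserPolicy
    cat > user-policy-statement.json <<'EOF'
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DisableMetricsCollection",
        "autoscaling:EnableMetricsCollection"
      ],
      "Resource": "*"
    }
    EOF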

README.rst (+58, -24)

@@ -19,7 +19,16 @@ You can build higher level workflows, such as a Genomics portal that automates t

 Quick Start
 -----------
-First, install the library:
+**IMPORTANT**: you will need an **Amazon EC2 Key Pair** to be able to complete the following steps.
+Please see the `Official AWS Guide <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html>`_.
+
+First, make sure you have installed the `AWS Command Line Interface <https://aws.amazon.com/cli/>`_:
+
+.. code-block:: sh
+
+    $ pip install awscli
+
+Then you can install AWS ParallelCluster:

 .. code-block:: sh
@@ -35,34 +44,59 @@ Next, configure your aws credentials and default region:
    Default region name [us-east-1]:
    Default output format [None]:

-Then, run pcluster configure:
+Then, run ``pcluster configure``. A list of valid options will be displayed for each
+configuration parameter. Type an option number and press ``Enter`` to select a specific option,
+or just press ``Enter`` to accept the default option.

 .. code-block:: ini

    $ pcluster configure
-   Cluster Template [default]:
-   Acceptable Values for AWS Region ID:
-       ap-south-1
-       ...
-       us-west-2
+   INFO: Configuration file /dir/conf_file will be written.
+   Press CTRL-C to interrupt the procedure.
+
+   Allowed values for AWS Region ID:
+   1. eu-north-1
+   ...
+   15. us-west-1
+   16. us-west-2
    AWS Region ID [us-east-1]:
-   VPC Name [myvpc]:
-   Acceptable Values for Key Name:
-       keypair1
-       keypair-test
-       production-key
-   Key Name []:
-   Acceptable Values for VPC ID:
-       vpc-1kd24879
-       vpc-blk4982d
-   VPC ID []:
-   Acceptable Values for Master Subnet ID:
-       subnet-9k284a6f
-       subnet-1k01g357
-       subnet-b921nv04
-   Master Subnet ID []:
-
-Now you can create your first cluster;
+   ...
+
+Be sure to select a region containing the EC2 key pair you wish to use. You can also import a public key using
+`these instructions <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#how-to-generate-your-own-key-and-import-it-to-aws>`_.
+
+During the process you will be asked to set up your networking environment. The wizard will offer you the choice of
+using an existing VPC or creating a new one on the fly.
+
+.. code-block:: ini
+
+   Automate VPC creation? (y/n) [n]:
+
+Enter ``n`` if you already have a VPC suitable for the cluster; otherwise you can let ``pcluster configure``
+create a VPC for you. The same choice is given for the subnet configuration: you can select valid subnet IDs for
+both the master and compute nodes, or you can let ``pcluster configure`` set up everything for you.
+In the latter case, just select the configuration you prefer.
+
+.. code-block:: ini
+
+   Automate Subnet creation? (y/n) [y]: y
+   Allowed values for Network Configuration:
+   1. Master in a public subnet and compute fleet in a private subnet
+   2. Master and compute fleet in the same public subnet
+
+At the end of the process a message like this one will be shown:
+
+.. code-block:: ini
+
+   Configuration file written to /dir/conf_file
+   You can edit your configuration file or simply run 'pcluster create -c /dir/conf_file cluster-name' to create your cluster
+
+Now you can create your first cluster:

 .. code-block:: sh

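Following the message printed at the end of ``pcluster configure``, the cluster is then created with:

.. code-block:: sh

    $ pcluster create -c /dir/conf_file cluster-name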