This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

[autoscaler] Documentation instructions for mounting EFS does not work when docker is specified #3

Open
jennakwon06 opened this issue Jan 28, 2021 · 3 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@jennakwon06
Contributor

This concerns the EFS mounting instructions on this documentation page: https://docs.ray.io/en/latest/cluster/aws-tips.html#

The instructions only work when docker is not specified in the cluster config.
When docker is specified, the EFS-related commands in the setup_commands array run inside the Docker container and fail because sudo is not installed there.
I suggest updating the documentation page to state that the instructions only apply when docker is not specified. It would also be great to include a working example for the case where a Docker container is used.

Sample yaml:

cluster_name: jkkwon_ray_test

min_workers: 5

max_workers: 10

upscaling_speed: 1.0

docker: 
    image: "048211272910.dkr.ecr.us-west-2.amazonaws.com/barsecrrepo-1cda8d0d3d9ee1867bae37291b6adc586a3f650c:308796b3-5c89-4a7c-83d0-5ce0abad3094_MiamiMLImage_main"
    container_name: "ray_container"
    pull_before_run: True
    run_options: []

idle_timeout_minutes: 5

provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a,us-west-2b,us-west-2c,us-west-2d
    cache_stopped_nodes: False

auth:
    ssh_user: ubuntu
    ssh_private_key: miami_dev_dask_emr_key_pair.pem

head_node:
    InstanceType: r5.12xlarge
    ImageId: latest_dlami
    SecurityGroupIds:
        - "sg-08ed97f6d08d451f6"
    SubnetIds: [
        "subnet-02876545b671b57b0"
    ]
    # You can provision additional disk space with a conf as follows
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 100
    KeyName: "miami_dev_dask_emr_key_pair"

worker_nodes:
    InstanceType: r5.12xlarge
    ImageId: latest_dlami
    SecurityGroupIds:
        - "sg-08ed97f6d08d451f6"
    SubnetIds: [
        "subnet-0180e9267b994bf97",  # us-west-2a, 8187 IP addresses. 10.0.32.0/19
        "subnet-073e6e0338bf209cb",  # us-west-2b, 8187 IP addresses. 10.0.64.0/19
        "subnet-03caa10b59288efae",  # us-west-2c, 8187 IP addresses. 10.0.96.0/19
        "subnet-06dd6dbb8caf5c310",  # us-west-2d, 8187 IP addresses. 10.0.128.0/19
    ]
    # Run workers on spot by default. Comment this out to use on-demand.
    InstanceMarketOptions:
        MarketType: spot
    KeyName: "miami_dev_dask_emr_key_pair"
    
file_mounts_sync_continuously: False

rsync_exclude:
    - "**/.git"
    - "**/.git/**"

rsync_filter:
    - ".gitignore"

initialization_commands:
    - aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 048211272910.dkr.ecr.us-west-2.amazonaws.com;

# List of shell commands to run to set up nodes.
setup_commands:
      - pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
      - sudo kill -9 `sudo lsof /var/lib/dpkg/lock-frontend | awk '{print $2}' | tail -n 1`;
        sudo pkill -9 apt-get;
        sudo pkill -9 dpkg;
        sudo dpkg --configure -a;
        sudo apt-get -y install binutils;
        cd $HOME;
        git clone https://github.com/aws/efs-utils;
        cd $HOME/efs-utils;
        ./build-deb.sh;
        sudo apt-get -y install ./build/amazon-efs-utils*deb;
        cd $HOME;
        mkdir efs;
        sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-098309a3.efs.us-west-2.amazonaws.com:/ efs;
        sudo chmod 777 efs;    
        

head_setup_commands: []

worker_setup_commands: []

head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

Sample output:



WARNING: You are using pip version 20.3.3; however, version 21.0 is available.
You should consider upgrading via the '/usr/local/bin/python3.7 -m pip install --upgrade pip' command.
Shared connection to 34.220.27.124 closed.
    (1/2) sudo kill -9 `sudo lsof /var/l...
bash: sudo: command not found
bash: sudo: command not found
bash: sudo: command not found
bash: sudo: command not found
bash: sudo: command not found
bash: sudo: command not found
Cloning into 'efs-utils'...
remote: Enumerating objects: 142, done.
remote: Counting objects: 100% (142/142), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 792 (delta 79), reused 100 (delta 51), pack-reused 650
Receiving objects: 100% (792/792), 234.63 KiB | 6.02 MiB/s, done.
Resolving deltas: 100% (462/462), done.
+ pwd
+ BASE_DIR=/root/efs-utils
+ BUILD_ROOT=/root/efs-utils/build/debbuild
+ VERSION=1.28.2
+ RELEASE=1
+ DEB_SYSTEM_RELEASE_PATH=/etc/os-release
+ UBUNTU18_REGEX=Ubuntu 18
+ UBUNTU20_REGEX=Ubuntu 20
+ DEBIAN11_REGEX=Debian GNU/Linux bullseye
+ echo Cleaning deb build workspace
Cleaning deb build workspace
+ rm -rf /root/efs-utils/build/debbuild
+ mkdir -p /root/efs-utils/build/debbuild
+ echo Creating application directories
Creating application directories
+ mkdir -p /root/efs-utils/build/debbuild/etc/amazon/efs
+ mkdir -p /root/efs-utils/build/debbuild/etc/init/
+ mkdir -p /root/efs-utils/build/debbuild/etc/systemd/system
+ mkdir -p /root/efs-utils/build/debbuild/sbin
+ mkdir -p /root/efs-utils/build/debbuild/usr/bin
+ mkdir -p /root/efs-utils/build/debbuild/var/log/amazon/efs
+ mkdir -p /root/efs-utils/build/debbuild/usr/share/man/man8
+ [ -f /etc/os-release ]
+ grep -e Ubuntu 18 -e Debian GNU/Linux bullseye+  -e Ubuntu 20
grep PRETTY_NAME /etc/os-release
+ echo PRETTY_NAME="Ubuntu 18.04.5 LTS"
PRETTY_NAME="Ubuntu 18.04.5 LTS"
+ echo Correcting python executable
Correcting python executable
+ sed -i -e s/python|python2/python3/ dist/amazon-efs-utils.control
+ sed -i -e 1 s/^.*$/\#!\/usr\/bin\/env python3/ src/watchdog/__init__.py
+ sed -i -e 1 s/^.*$/\#!\/usr\/bin\/env python3/ src/mount_efs/__init__.py
+ echo Copying application files
Copying application files
+ install -p -m 644 dist/amazon-efs-mount-watchdog.conf /root/efs-utils/build/debbuild/etc/init
+ install -p -m 644 dist/amazon-efs-mount-watchdog.service /root/efs-utils/build/debbuild/etc/systemd/system
+ install -p -m 444 dist/efs-utils.crt /root/efs-utils/build/debbuild/etc/amazon/efs
+ install -p -m 644 dist/efs-utils.conf /root/efs-utils/build/debbuild/etc/amazon/efs
+ install -p -m 755 src/mount_efs/__init__.py /root/efs-utils/build/debbuild/sbin/mount.efs
+ install -p -m 755 src/watchdog/__init__.py /root/efs-utils/build/debbuild/usr/bin/amazon-efs-mount-watchdog
+ echo Copying install scripts
Copying install scripts
+ install -p -m 755 dist/scriptlets/after-install-upgrade /root/efs-utils/build/debbuild/postinst
+ install -p -m 755 dist/scriptlets/before-remove /root/efs-utils/build/debbuild/prerm
+ install -p -m 755 dist/scriptlets/after-remove /root/efs-utils/build/debbuild/postrm
+ echo Copying control file
Copying control file
+ install -p -m 644 dist/amazon-efs-utils.control /root/efs-utils/build/debbuild/control
+ echo Copying conffiles
Copying conffiles
+ install -p -m 644 dist/amazon-efs-utils.conffiles /root/efs-utils/build/debbuild/conffiles
+ echo Copying manpages
Copying manpages
+ install -p -m 644 man/mount.efs.8 /root/efs-utils/build/debbuild/usr/share/man/man8/mount.efs.8
+ echo Creating deb binary file
Creating deb binary file
+ echo 2.0
+ echo Setting permissions
Setting permissions
+ find /root/efs-utils/build/debbuild -type d
+ xargs chmod 755
+ echo Creating tar
Creating tar
+ cd /root/efs-utils/build/debbuild
+ tar czf control.tar.gz control conffiles postinst prerm postrm --owner=0 --group=0
+ tar czf data.tar.gz etc sbin usr var --owner=0 --group=0
+ cd /root/efs-utils
+ echo Building deb
Building deb
+ DEB=/root/efs-utils/build/debbuild/amazon-efs-utils-1.28.2-1_all.deb
+ ar r /root/efs-utils/build/debbuild/amazon-efs-utils-1.28.2-1_all.deb /root/efs-utils/build/debbuild/debian-binary
ar: creating /root/efs-utils/build/debbuild/amazon-efs-utils-1.28.2-1_all.deb
+ ar r /root/efs-utils/build/debbuild/amazon-efs-utils-1.28.2-1_all.deb /root/efs-utils/build/debbuild/control.tar.gz
+ ar r /root/efs-utils/build/debbuild/amazon-efs-utils-1.28.2-1_all.deb /root/efs-utils/build/debbuild/data.tar.gz
+ echo Copying deb to output directory
Copying deb to output directory
+ cp /root/efs-utils/build/debbuild/amazon-efs-utils-1.28.2-1_all.deb build/
bash: sudo: command not found
bash: sudo: command not found
bash: sudo: command not found
Shared connection to 34.220.27.124 closed.
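
One possible workaround, sketched here under stated assumptions and not confirmed anywhere in this thread: mount EFS on the host VM in initialization_commands, which the autoscaler runs outside the container before Docker is set up, and then bind-mount the host directory into the container through docker run_options. The host path /home/ubuntu/efs (derived from ssh_user: ubuntu) and the container path /root/efs (derived from the /root/efs-utils paths in the log above) are assumptions, as is the availability of apt-get and mountpoint on the host image. The filesystem address is the one from the sample yaml. Only the fields that change are shown.

```yaml
# Sketch only: not tested against the cluster in this issue.
docker:
    # Keep image, container_name, and pull_before_run as in the sample yaml.
    # Assumption: the host home is /home/ubuntu and the container user is root.
    run_options:
        - "-v /home/ubuntu/efs:/root/efs"

# These commands run on the host VM, outside the container, so sudo is available.
# Keep the existing ECR login command in this list as well.
initialization_commands:
    - sudo apt-get -y install nfs-common
    - mkdir -p $HOME/efs
    # Mount only if not already mounted, since these commands can run more than once.
    - mountpoint -q $HOME/efs || sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-098309a3.efs.us-west-2.amazonaws.com:/ $HOME/efs
    - sudo chmod 777 $HOME/efs

# The sudo-based EFS commands are removed from setup_commands,
# because setup_commands run inside the container.
setup_commands:
    - pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
```

With this layout the container never needs sudo or an NFS client; it only sees an ordinary directory that happens to be backed by EFS on the host.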

@jennakwon06 jennakwon06 added the enhancement New feature or request label Jan 28, 2021
@jennakwon06 jennakwon06 changed the title [autoscaler] Mounting EFS does not work when docker is specified [autoscaler] Documentation instructions for mounting EFS does not work when docker is specified Jan 28, 2021
@pdames
Member

pdames commented Jan 28, 2021

@jennakwon06 - I think you may be one of the first people to really experiment with custom EFS mounts together with docker images. If you're able to get it working, would you mind contributing this improvement to https://docs.ray.io/en/latest/cluster/aws-tips.html?

@jennakwon06
Contributor Author

Yeah I actually do have it working in terms of spinning up a Ray cluster! I will take up that contribution and will leave this issue open until I do that.

@pdames pdames added the documentation Improvements or additions to documentation label Feb 23, 2021
rkenmi pushed a commit that referenced this issue Mar 8, 2022
@yang0110

Hi, I encountered the same problem. Could you share how you resolved this issue?
