From 91c6b41ac0321df82451c5717d82d86b23ea5c80 Mon Sep 17 00:00:00 2001 From: "matthew.lanting" Date: Fri, 13 Sep 2019 13:30:51 -0400 Subject: [PATCH 01/12] Added initial draft of multi-machine launch design. Signed-off-by: matthew.lanting --- .../152_roslaunch_multi_machine_launch.md | 281 ++++++++++++++++++ 1 file changed, 281 insertions(+) create mode 100644 articles/152_roslaunch_multi_machine_launch.md diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md new file mode 100644 index 000000000..38e49fa36 --- /dev/null +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -0,0 +1,281 @@ +--- +layout: default +title: ROS 2 Multi-Machine Launching +permalink: articles/roslaunch_mml.html +abstract: + Robotic systems are often distributed across multiple networked machines. + This document describes proposed modifications and enhancements to ROS2's + launch system to facilitate launching, monitoring, and shutting + down systems spread across multiple machines. +author: '[Matt Lanting](https://github.com/mlanting)' +published: false +--- + +- This will become a table of contents (this text will be scraped). +{:toc} + +# {{ page.title }} + +
+{{ page.abstract }} +
+ +Authors: {{ page.author }} + +## Purpose + +Allow a system of ROS nodes to be launched on a hardware architecture that is +spread across multiple networked computers and facilitate introspection and +management of the system from a single machine. + +### Features and Considerations + +Many of these are just extensions of the design goals for the single-machine +version of roslaunch for ROS2, but instead of only dealing with how to group +nodes in terms of processes, we can additionally group processes in terms of +machines. In addition to extending the current ROS2 launch goals for remote +machines, We would also like to consider more advanced features that could be +added such as advanced command line tools, forms of load balancing, +sending/retrieving files (such as configuration data or maps) to/from remote +machines. + + 1. Some nodes may need to be run on specific machines due to hardware architecture. + - Cameras or other sensors being directly connected + - Specialized processing hardware + 2. Other nodes may not care which machine they run on and can be executed on machines with less workload as determined at time of launch. + 3. Need to manage lifecycles of nodes that have them. + 4. Need to monitor node status, and attempt recovery from failures. + - Attempt recovery from crashed nodes by restarting them, possibly on a different machine. + 5. Connect to remote machines securely over SSH. + - How to manage credentials? + - Have users manually set up the accounts and put passwords in launch files? + - Have users set up ssh keys for the computers in the system? + 6. Provide command line tools for managing and monitoring launched systems on remote machines. + 7. Provide mechanisms for locating files and executables across machines, and sending files to machines that need them for certain nodes. + - If we intend to do any kind of load balancing or launching of nodes on non-specified machines (i.e. determined by the launch system during the launch process rather than specified by the user in the launch file), certain types of resources may need to be transferred to other machines. Calibration data, map files, training data, etc. This potentially creates a need to keep track of which machine has the most recent version of such resources (e.g. in the case of training data, or any other resource where a node might save data to be loaded next time it is launched). + 8. Should be able to work with machines running different operating systems on the same network. + + +## Proposed Multi-Machine Launch Command Line Interface + +The multi-machine launching interface is controlled through the `launcher` +command for the `ros2` command-line tool. The existing `launch` command +provides a subset of this functionality that is sufficient for single-machine +launching. + +### Commands + +```bash +$ ros2 launcher +usage: ros2 launcher [-h] Call `ros2 launcher -h` for more detailed usage. ... + +Various launching related sub-commands + +optional arguments: + -h, --help show this help message and exit + +Commands: + launch Run a launch file + list Search for and list running launch systems + attach Attach to a running launch system and wait for it to finish + term Terminate a running launch system + + Call `ros2 launcher -h` for more detailed usage. +``` + +#### `launch` + +The `ros2 launcher launch` is equivalent to `ros2 launch`, which is preserved +for backwards compatibility and ease of use. It is used to run a launch file. 
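For reference, a minimal Python launch file of the kind this command runs, roughly what the `talker_listener.launch.py` used in the examples below contains, might look like the following sketch; the exact `Node` keyword names vary between `launch_ros` releases, so treat them as illustrative rather than authoritative:

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    # Two nodes, each started as a separate local process by the launch system.
    return LaunchDescription([
        Node(package='demo_nodes_cpp', executable='talker', output='screen'),
        Node(package='demo_nodes_cpp', executable='listener', output='screen'),
    ])
```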
+ +```bash +$ ros2 launcher launch -h +usage: ros2 launcher launch [-h] [-d] [-D] [-p | -s] [-a] + package_name [launch_file_name] + [launch_arguments [launch_arguments ...]] ... + +Run a launch file + +positional arguments: + package_name Name of the ROS package which contains the launch file + launch_file_name Name of the launch file + launch_arguments Arguments to the launch file; ':=' (for + duplicates, last one wins) + argv Pass arbitrary arguments to the launch file + +optional arguments: + -h, --help Show this help message and exit. + -d, --debug Put the launch system in debug mode, provides more + verbose output. + -D, --detach Detach from the launch process after it has started. + -p, --print, --print-description + Print the launch description to the console without + launching it. + -s, --show-args, --show-arguments + Show arguments that may be given to the launch file. + -a, --show-all-subprocesses-output + Show all launched subprocesses' output by overriding + their output configuration using the + OVERRIDE_LAUNCH_PROCESS_OUTPUT envvar. +``` + +Example output: + +```bash +$ ros2 launcher launch demo_nodes_cpp talker_listener.launch.py +[INFO] [launch]: All log files can be found below /home/preed/.ros/log/2019-09-11-20-54-30-715383-regulus-2799 +[INFO] [launch]: Default logging verbosity is set to INFO +[INFO] [launch]: Launch System ID is 50bda6fb-d451-4d53-8a2b-e8fcdce8170b +[INFO] [talker-1]: process started with pid [2809] +[INFO] [listener-2]: process started with pid [2810] +[talker-1] [INFO] [talker]: Publishing: 'Hello World: 1' +[listener-2] [INFO] [listener]: I heard: [Hello World: 1] +[talker-1] [INFO] [talker]: Publishing: 'Hello World: 2' +[listener-2] [INFO] [listener]: I heard: [Hello World: 2] +[talker-1] [INFO] [talker]: Publishing: 'Hello World: 3' +[listener-2] [INFO] [listener]: I heard: [Hello World: 3] +[talker-1] [INFO] [talker]: Publishing: 'Hello World: 4' +[listener-2] [INFO] [listener]: I heard: [Hello World: 4] +^C[WARNING] [launch]: user interrupted with ctrl-c (SIGINT) +[listener-2] [INFO] [rclcpp]: signal_handler(signal_value=2) +[INFO] [talker-1]: process has finished cleanly [pid 2809] +[INFO] [listener-2]: process has finished cleanly [pid 2810] +[talker-1] [INFO] [rclcpp]: signal_handler(signal_value=2) +``` + +Note how there is one difference from the old behavior of `ros2 launch`; the +group of nodes is assigned a Launch System ID. This is a unique identifier +that can be used to track all of the nodes launched by a particular command +across a network. + +Additionally, it is possible to detach from a system and let it run in the +background: + +```bash +$ ros2 launcher launch -D demo_nodes_cpp talker_listener.launch.py +[INFO] [launch]: All log files can be found below /home/preed/.ros/log/2019-09-11-20-54-30-715383-regulus-2799 +[INFO] [launch]: Default logging verbosity is set to INFO +[INFO] [launch]: Launch System ID is 50bda6fb-d451-4d53-8a2b-e8fcdce8170b +$ +``` + +#### `list` + +Since it is possible to launch a system of nodes that spans a network and detach +from it, it is necessary to be able to query the network to find which systems +are active. + +```bash +$ ros2 launcher list -h +usage: ros2 launcher list [-h] [-v] [--spin-time SPIN_TIME] + +List running launch systems + + +optional arguments: + -h, --help Show this help message and exit. + -v, --verbose Provides more verbose output. 
+ --spin-time SPIN_TIME + Spin time in seconds to wait for discovery (only + applies when not using an already running daemon) +``` + +Example output: + +```bash +$ ros2 launcher list +ab1e0138-bb22-4ec9-a590-cf377de42d0f +50bda6fb-d451-4d53-8a2b-e8fcdce8170b +5d186778-1f50-4828-9425-64cc2ed1342c +$ +``` + +```bash +$ ros2 launcher list +ab1e0138-bb22-4ec9-a590-cf377de42d0f: 5 nodes, 2 hosts +50bda6fb-d451-4d53-8a2b-e8fcdce8170b: 2 nodes, 1 host +5d186778-1f50-4828-9425-64cc2ed1342c: 16 nodes, 3 hosts +$ +``` + +#### `attach` + +Since it is possible to detach from a launched system, it is useful for +scripting or diagnostic purposes to be able to re-attach to it. + +```bash +$ ros2 launcher attach -h +usage: ros2 launcher attach [-h] [-v] [--spin-time SPIN_TIME] [system_id] + +Blocks until all nodes running under the specified Launch System ID have exited + +positional arguments: + system_id Launch System ID of the nodes to attach to; if less than + a full UUID is specified, it will attach to the first + Launch System it finds whose ID begins with that sub-string + +optional arguments: + -h, --help Show this help message and exit. + -v, --verbose Provides more verbose output. + --spin-time SPIN_TIME + Spin time in seconds to wait for discovery (only + applies when not using an already running daemon) +``` + +Example output: + +```bash +$ ros2 launcher attach 50bda6fb-d451-4d53-8a2b-e8fcdce8170b +Attached to Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b. +(... in another terminal, run `ros2 launcher term 50bda6fb`...) +All nodes in Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b have exited. +$ +``` + +Verbose mode: + +```bash +$ ros2 launcher attach -v 50bda6fb +Attached to Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b. +Waiting for node /launch_ros +Waiting for node /talker +Waiting for node /listener +(... in another terminal, run `ros2 launcher term 50bda6fb`...) +Node /launch_ros has exited +Node /talker has exited +Node /listener has exited +All nodes in Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b have exited. +$ +``` + +#### `term` + +Terminates all nodes that were launched under a specific Launch System ID. + +```bash +$ ros2 launcher term -h +usage: ros2 launcher term [-h] [-v] [--spin-time SPIN_TIME] [system_id] + +Terminates all nodes that were launched under a specific Launch System ID + +positional arguments: + system_id Launch System ID of the nodes to terminate; if less than + a full UUID is specified, it will terminate nodes + belonging to the first Launch System it finds whose ID + begins with that sub-string + +optional arguments: + -h, --help Show this help message and exit. + -v, --verbose Provides more verbose output. + --spin-time SPIN_TIME + Spin time in seconds to wait for discovery (only + applies when not using an already running daemon) +``` + +Example output: + +```bash +$ ros2 launcher term 50bda6fb-d451-4d53-8a2b-e8fcdce8170b +Terminating Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b. +$ +``` From 9f3d7de56c3a8668cafc5cfe500b4d1e0344b65a Mon Sep 17 00:00:00 2001 From: "matthew.lanting" Date: Fri, 13 Sep 2019 14:04:25 -0400 Subject: [PATCH 02/12] Separated features and considerations in multi-machine launch design. 
Signed-off-by: matthew.lanting --- .../152_roslaunch_multi_machine_launch.md | 81 +++++++++++++------ 1 file changed, 55 insertions(+), 26 deletions(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index 38e49fa36..16971eac6 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -28,32 +28,61 @@ Allow a system of ROS nodes to be launched on a hardware architecture that is spread across multiple networked computers and facilitate introspection and management of the system from a single machine. -### Features and Considerations - -Many of these are just extensions of the design goals for the single-machine -version of roslaunch for ROS2, but instead of only dealing with how to group -nodes in terms of processes, we can additionally group processes in terms of -machines. In addition to extending the current ROS2 launch goals for remote -machines, We would also like to consider more advanced features that could be -added such as advanced command line tools, forms of load balancing, -sending/retrieving files (such as configuration data or maps) to/from remote -machines. - - 1. Some nodes may need to be run on specific machines due to hardware architecture. - - Cameras or other sensors being directly connected - - Specialized processing hardware - 2. Other nodes may not care which machine they run on and can be executed on machines with less workload as determined at time of launch. - 3. Need to manage lifecycles of nodes that have them. - 4. Need to monitor node status, and attempt recovery from failures. - - Attempt recovery from crashed nodes by restarting them, possibly on a different machine. - 5. Connect to remote machines securely over SSH. - - How to manage credentials? - - Have users manually set up the accounts and put passwords in launch files? - - Have users set up ssh keys for the computers in the system? - 6. Provide command line tools for managing and monitoring launched systems on remote machines. - 7. Provide mechanisms for locating files and executables across machines, and sending files to machines that need them for certain nodes. - - If we intend to do any kind of load balancing or launching of nodes on non-specified machines (i.e. determined by the launch system during the launch process rather than specified by the user in the launch file), certain types of resources may need to be transferred to other machines. Calibration data, map files, training data, etc. This potentially creates a need to keep track of which machine has the most recent version of such resources (e.g. in the case of training data, or any other resource where a node might save data to be loaded next time it is launched). - 8. Should be able to work with machines running different operating systems on the same network. +## Justification + +Nodes can need to run on different hosts for a variety of reasons. 
Some possible +use cases: + +- A large robot could with hosts located physically near the hardware they are +controlling such as cameras or other sensors +- A robot with hosts with different architectures in order to use specialized +processing hardware +- A robot with a cluster of machines that do distributed processing of data +- A network of multiple virtual hosts for testing purposes +- A swarm of independent drones that can cooperate but do not require +communication with each other +- A client computer is used to launch and monitor a system on a remote host but +is not required for the system to operate + +## Capabilities + +In order to meet the above use cases, a launch system needs to have a number of +capabilities, including: + +- Connecting to a remote host and running nodes on it +- Pushing configuration parameters for nodes to remote hosts +- Monitoring the status and tracking the lifecycle of nodes +- Recovering from failures by optionally restarting nodes +- Gracefully shutting down nodes on every host + +Most of these are just extensions of the design goals for the single-machine +version of roslaunch for ROS2, but in addition to extending those goals to +remote machines, we would also like to consider a few more advanced features, +including: + +- Load balancing nodes on distributed networks +- Command line tools for managing and monitoring systems across machines +- Mechanisms for locating files and executables across machines +- Sharing and synchronizing files across machines + +## Considerations + +There are some outstanding issues that may complicate things: + +- How to group nodes/participants/processes is somewhat of an open issue with potential + implications for this part of ROS2. + - https://github.com/ros2/design/pull/250/files/8ccaac3d60d7a0ded50934ba6416550f8d2af332?short_path=dd776c0#diff-dd776c070ecf252bc4dcc4b86a97c888 + - The number of domain participants is limited per vendor (Connext is 120 per domain). +- No `rosmaster` means there is no central mechanism for controlling modes or + distributing parameters +- Machines may be running different operating systems +- If we intend to do any kind of load balancing, certain types of resources may + need to be transferred to other machines. + - Calibration data, map files, training data, etc. + - Need to keep track of which machine has the most recent version of such + resources +- Security: we'll need to manage credentials across numerous machines both for SSH + and secure DDS. 
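On the credentials question, one option that keeps passwords out of launch files is to have operators provision SSH key pairs on every machine and let the launching host open key-based connections. A rough prototype of that idea, using the third-party `paramiko` library with placeholder host, user, and key path values, is sketched below; it illustrates the approach only and is not part of any proposed launch API.

```python
# Sketch only: start a process on a remote host over SSH using a pre-provisioned
# key pair, so no password ever appears in a launch file.  Host name, user name,
# and key path are placeholders; environment setup and error handling are omitted.
import paramiko


def start_remote_process(host: str, user: str, command: str) -> paramiko.SSHClient:
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user,
                   key_filename='/home/ros/.ssh/id_ed25519')
    client.exec_command(command)
    # The caller keeps the client open so the remote process can be monitored
    # or terminated later.
    return client


ssh_session = start_remote_process('sensor-computer.local', 'ros',
                                   'ros2 run demo_nodes_cpp talker')
```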
## Proposed Multi-Machine Launch Command Line Interface From a4f956c8f2258d979b2d2187a02083899722e520 Mon Sep 17 00:00:00 2001 From: "matthew.lanting" Date: Wed, 18 Sep 2019 11:13:18 -0400 Subject: [PATCH 03/12] Add more details to ros2 launcher list Signed-off-by: matthew.lanting --- articles/152_roslaunch_multi_machine_launch.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index 16971eac6..fcf31cb23 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -220,10 +220,19 @@ $ ``` ```bash -$ ros2 launcher list +$ ros2 launcher list -v ab1e0138-bb22-4ec9-a590-cf377de42d0f: 5 nodes, 2 hosts + Launch host: 192.168.10.5 + Launch time: Fri Sep 13 15:39:45 CDT 2019 + Launch command: ros2 launcher launch package_foo bar.launch.py argument:=value 50bda6fb-d451-4d53-8a2b-e8fcdce8170b: 2 nodes, 1 host + Launch host: 192.168.10.15 + Launch time: Fri Sep 13 12:39:45 CDT 2019 + Launch command: ros2 launcher launch demo_nodes_cpp talker_listener.launch.py 5d186778-1f50-4828-9425-64cc2ed1342c: 16 nodes, 3 hosts + Launch host: 192.168.10.13 + Launch time: Fri Sep 12 10:39:45 CDT 2019 + Launch command: ros2 launcher launch package_foo bar2.launch.py $ ``` From 2e7634b8327f0d70e583567c86c91b9bb52f0087 Mon Sep 17 00:00:00 2001 From: "matthew.lanting" Date: Thu, 19 Sep 2019 14:35:25 -0400 Subject: [PATCH 04/12] Formatting changes based on PR feedback. - Added a 'Context' section to describe the multi-machine launch capabilities of ROS1 and point to remote launch section of the main launch design document. - Changed linebreaks to one sentence per line. - Added a 'Proposed Approach' section where we can start to describe our design(s) from a technical perspective. - Added a 'Goals' section. Not much useful content here yet though. Signed-off-by: matthew.lanting --- .../152_roslaunch_multi_machine_launch.md | 110 ++++++++---------- 1 file changed, 50 insertions(+), 60 deletions(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index fcf31cb23..9acc455b4 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -4,9 +4,7 @@ title: ROS 2 Multi-Machine Launching permalink: articles/roslaunch_mml.html abstract: Robotic systems are often distributed across multiple networked machines. - This document describes proposed modifications and enhancements to ROS2's - launch system to facilitate launching, monitoring, and shutting - down systems spread across multiple machines. + This document describes proposed modifications and enhancements to ROS2's launch system to facilitate launching, monitoring, and shutting down systems spread across multiple machines. author: '[Matt Lanting](https://github.com/mlanting)' published: false --- @@ -22,32 +20,44 @@ published: false Authors: {{ page.author }} -## Purpose +## Context -Allow a system of ROS nodes to be launched on a hardware architecture that is -spread across multiple networked computers and facilitate introspection and -management of the system from a single machine. +Robotic platforms often consist of multiple computers communicating over a network, and users will want to be able to start and stop the software on such systems without needing to manage each machine individually. 
+The launch system in ROS 1 included a tag for launch files that allowed users to include information about networked machines and how to connect so that processes could be started remotely. +We would like to include this feature in the launch system for ROS 2 and even extend its capabilities based on things we've learned from working with multi-machine systems in ROS 1. +This document elaborates on the details of launching remote operating system processes alluded to [here](https://github.com/ros2/design/blob/gh-pages/articles/150_roslaunch.md#remote-operating-system-processes) in the main ROS 2 ros_launch design document. ## Justification -Nodes can need to run on different hosts for a variety of reasons. Some possible -use cases: +Nodes may need to run on different hosts for a variety of reasons. +Some possible use cases include: -- A large robot could with hosts located physically near the hardware they are -controlling such as cameras or other sensors -- A robot with hosts with different architectures in order to use specialized -processing hardware +- A large robot with hosts located physically near the hardware they are controlling such as cameras or other sensors +- A robot with hosts with different architectures in order to use specialized processing hardware - A robot with a cluster of machines that do distributed processing of data - A network of multiple virtual hosts for testing purposes -- A swarm of independent drones that can cooperate but do not require -communication with each other -- A client computer is used to launch and monitor a system on a remote host but -is not required for the system to operate +- A swarm of independent drones that can cooperate but do not require communication with each other +- A client computer is used to launch and monitor a system on a remote host but is not required for the system to operate + +## Purpose + +Allow a system of ROS nodes to be launched on a hardware architecture that is spread across multiple networked computers and facilitate introspection and management of the system from a single machine. + +## Goals + +- Allow ROS2 nodes to be launched remotely over a network. +- Allow users to specify a networked machine on which to run a particular node. +- Allow users to create a list of host machines to feed into the launch system. +- Enable the launch system to use the list of hosts to distribute nodes or systems of nodes among. +- Allow load-balancing of nodes or systems of nodes among machines when nodes do not need to be tied to a specific device. +- Allow users to shutdown a system of ROS nodes remotely +- Provide tools for introspection of ROS systems distributed across multiple machines. +- Provide API to third-party orchestration tools. +- Provide mechanisms for automated recovery of failed nodes in a distributed system. 
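To make the machine-related goals above concrete, one way a user can approximate them today is to keep their own table of hosts and wrap each remotely-assigned node in an `ssh` invocation built from it. The sketch below does this with the existing `ExecuteProcess` action; the `Machine` tuple, the host names, and the `my_robot_pkg` package are placeholders invented for illustration, not an existing or proposed API.

```python
# Sketch of the status quo: a hand-maintained host list plus a per-node machine
# assignment, expressed with today's single-machine launch primitives.  The goals
# above are about making this a first-class, introspectable part of the launch system.
from collections import namedtuple

from launch import LaunchDescription
from launch.actions import ExecuteProcess

Machine = namedtuple('Machine', ['address', 'user'])  # hypothetical helper type

MACHINES = {
    'sensor_host': Machine('sensor-computer.local', 'ros'),  # placeholder hosts
    'planning_host': Machine('planner.local', 'ros'),
}

# Executable -> machine it must run on; nodes without an entry could be load-balanced.
ASSIGNMENTS = {
    'camera_driver_node': MACHINES['sensor_host'],
    'path_planner_node': MACHINES['planning_host'],
}


def generate_launch_description():
    actions = []
    for executable, machine in ASSIGNMENTS.items():
        # Each remote node becomes a local ssh process, so the launch system only
        # sees the ssh client; it cannot monitor or manage the node itself.
        actions.append(ExecuteProcess(
            cmd=['ssh', f'{machine.user}@{machine.address}',
                 f'ros2 run my_robot_pkg {executable}'],
            output='screen',
        ))
    return LaunchDescription(actions)
```

Everything the design below aims to add on top of this (remote monitoring, lifecycle management, graceful shutdown) is information that this workaround loses at the `ssh` boundary.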
## Capabilities -In order to meet the above use cases, a launch system needs to have a number of -capabilities, including: +In order to meet the above use cases, a launch system needs to have a number of capabilities, including: - Connecting to a remote host and running nodes on it - Pushing configuration parameters for nodes to remote hosts @@ -55,10 +65,7 @@ capabilities, including: - Recovering from failures by optionally restarting nodes - Gracefully shutting down nodes on every host -Most of these are just extensions of the design goals for the single-machine -version of roslaunch for ROS2, but in addition to extending those goals to -remote machines, we would also like to consider a few more advanced features, -including: +Most of these are just extensions of the design goals for the single-machine version of roslaunch for ROS2, but in addition to extending those goals to remote machines, we would also like to consider a few more advanced features, including: - Load balancing nodes on distributed networks - Command line tools for managing and monitoring systems across machines @@ -69,28 +76,22 @@ including: There are some outstanding issues that may complicate things: -- How to group nodes/participants/processes is somewhat of an open issue with potential - implications for this part of ROS2. +- How to group nodes/participants/processes is somewhat of an open issue with potential implications for this part of ROS2. - https://github.com/ros2/design/pull/250/files/8ccaac3d60d7a0ded50934ba6416550f8d2af332?short_path=dd776c0#diff-dd776c070ecf252bc4dcc4b86a97c888 - The number of domain participants is limited per vendor (Connext is 120 per domain). -- No `rosmaster` means there is no central mechanism for controlling modes or - distributing parameters +- No `rosmaster` means there is no central mechanism for controlling modes or distributing parameters - Machines may be running different operating systems -- If we intend to do any kind of load balancing, certain types of resources may - need to be transferred to other machines. +- If we intend to do any kind of load balancing, certain types of resources may need to be transferred to other machines. - Calibration data, map files, training data, etc. - - Need to keep track of which machine has the most recent version of such - resources -- Security: we'll need to manage credentials across numerous machines both for SSH - and secure DDS. + - Need to keep track of which machine has the most recent version of such resources +- Security: we'll need to manage credentials across numerous machines both for SSH and secure DDS. +## Proposed Approach ## Proposed Multi-Machine Launch Command Line Interface -The multi-machine launching interface is controlled through the `launcher` -command for the `ros2` command-line tool. The existing `launch` command -provides a subset of this functionality that is sufficient for single-machine -launching. +The multi-machine launching interface is controlled through the `launcher` command for the `ros2` command-line tool. +The existing `launch` command provides a subset of this functionality that is sufficient for single-machine launching. ### Commands @@ -114,8 +115,8 @@ Commands: #### `launch` -The `ros2 launcher launch` is equivalent to `ros2 launch`, which is preserved -for backwards compatibility and ease of use. It is used to run a launch file. +The `ros2 launcher launch` is equivalent to `ros2 launch`, which is preserved for backwards compatibility and ease of use. +It is used to run a launch file. 
```bash $ ros2 launcher launch -h @@ -134,12 +135,10 @@ positional arguments: optional arguments: -h, --help Show this help message and exit. - -d, --debug Put the launch system in debug mode, provides more - verbose output. + -d, --debug Put the launch system in debug mode, provides more verbose output. -D, --detach Detach from the launch process after it has started. - -p, --print, --print-description - Print the launch description to the console without - launching it. + -p, --print, --print-description + Print the launch description to the console without launching it. -s, --show-args, --show-arguments Show arguments that may be given to the launch file. -a, --show-all-subprocesses-output @@ -172,13 +171,10 @@ $ ros2 launcher launch demo_nodes_cpp talker_listener.launch.py [talker-1] [INFO] [rclcpp]: signal_handler(signal_value=2) ``` -Note how there is one difference from the old behavior of `ros2 launch`; the -group of nodes is assigned a Launch System ID. This is a unique identifier -that can be used to track all of the nodes launched by a particular command -across a network. +Note how there is one difference from the old behavior of `ros2 launch`; the group of nodes is assigned a Launch System ID. +This is a unique identifier that can be used to track all of the nodes launched by a particular command across a network. -Additionally, it is possible to detach from a system and let it run in the -background: +Additionally, it is possible to detach from a system and let it run in the background: ```bash $ ros2 launcher launch -D demo_nodes_cpp talker_listener.launch.py @@ -190,9 +186,7 @@ $ #### `list` -Since it is possible to launch a system of nodes that spans a network and detach -from it, it is necessary to be able to query the network to find which systems -are active. +Since it is possible to launch a system of nodes that spans a network and detach from it, it is necessary to be able to query the network to find which systems are active. ```bash $ ros2 launcher list -h @@ -238,8 +232,7 @@ $ #### `attach` -Since it is possible to detach from a launched system, it is useful for -scripting or diagnostic purposes to be able to re-attach to it. +Since it is possible to detach from a launched system, it is useful for scripting or diagnostic purposes to be able to re-attach to it. ```bash $ ros2 launcher attach -h @@ -248,16 +241,13 @@ usage: ros2 launcher attach [-h] [-v] [--spin-time SPIN_TIME] [system_id] Blocks until all nodes running under the specified Launch System ID have exited positional arguments: - system_id Launch System ID of the nodes to attach to; if less than - a full UUID is specified, it will attach to the first - Launch System it finds whose ID begins with that sub-string + system_id Launch System ID of the nodes to attach to; if less than a full UUID is specified, it will attach to the first Launch System it finds whose ID begins with that sub-string optional arguments: -h, --help Show this help message and exit. -v, --verbose Provides more verbose output. 
--spin-time SPIN_TIME - Spin time in seconds to wait for discovery (only - applies when not using an already running daemon) + Spin time in seconds to wait for discovery (only applies when not using an already running daemon) ``` Example output: From 022e04d05890b9743e66ccb78c8874ce16474b8b Mon Sep 17 00:00:00 2001 From: "matthew.lanting" Date: Wed, 25 Sep 2019 12:07:54 -0400 Subject: [PATCH 05/12] Started stubbing out proposed design details - Also did a bit more reformatting/reworking of some of the main sections based on some of the feedback we've gotten so far Signed-off-by: matthew.lanting --- .../152_roslaunch_multi_machine_launch.md | 83 ++++++++++++------- 1 file changed, 53 insertions(+), 30 deletions(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index 9acc455b4..54ed614b1 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -22,55 +22,50 @@ Authors: {{ page.author }} ## Context +This document elaborates on the details of launching remote operating system processes alluded to [here](https://github.com/ros2/design/blob/gh-pages/articles/150_roslaunch.md#remote-operating-system-processes) in the main ROS 2 ros_launch design document. + Robotic platforms often consist of multiple computers communicating over a network, and users will want to be able to start and stop the software on such systems without needing to manage each machine individually. The launch system in ROS 1 included a tag for launch files that allowed users to include information about networked machines and how to connect so that processes could be started remotely. -We would like to include this feature in the launch system for ROS 2 and even extend its capabilities based on things we've learned from working with multi-machine systems in ROS 1. -This document elaborates on the details of launching remote operating system processes alluded to [here](https://github.com/ros2/design/blob/gh-pages/articles/150_roslaunch.md#remote-operating-system-processes) in the main ROS 2 ros_launch design document. +We would like to replicate that capability in the launch system for ROS 2 and extend its capabilities based on lessons learned from working with multi-machine systems in ROS 1. +ROS 2 also has a few notable design differences from ROS 1 that will affect the way multi-machine launching is implemented. -## Justification -Nodes may need to run on different hosts for a variety of reasons. -Some possible use cases include: +### Differences from ROS 1 -- A large robot with hosts located physically near the hardware they are controlling such as cameras or other sensors -- A robot with hosts with different architectures in order to use specialized processing hardware -- A robot with a cluster of machines that do distributed processing of data -- A network of multiple virtual hosts for testing purposes -- A swarm of independent drones that can cooperate but do not require communication with each other -- A client computer is used to launch and monitor a system on a remote host but is not required for the system to operate +One of the most notable differences in ROS 2 is the lack of roscore. +In ROS 1, roscore adds a rosmaster, a parameter server, and a logging node when it is run. +Roslaunch would automatically run roscore if no current instance was already running. 
+As a result, the launch command either had to be run specifically on the computer that roscore was meant to run on, or other steps would need to be taken to launch roscore on a remote machine before running the roslaunch command.
+This could sometimes cause problems for systems running headlessly when the user wanted to use a client machine for launching and monitoring: either the interface machine became a core component of the system, or the user had to ssh into the system's main machine and start it up by hand, which is exactly the work that remote launching exists to do for you.
+In ROS 2, nodes use DDS to connect in a peer-to-peer fashion with no centralized naming and registration services to have to start up.
+[TODO] the next couple lines don't really beling in a "Differences from ROS 1" section, but they come about as consequences of the previous line.
+However, the launch system in ROS 2 currently provides a LaunchService for processing LaunchDescriptions, including setting up event handlers.
+These events will not be visible to event handlers running on other machines without creating additional handlers and subscribers for publishing and receiving events over the wire.
+
+
+
+## Goals
+
+Our primary goal is to eliminate the need for users to connect to multiple machines and manually launch different components of a system on each of them independently.
+[TODO] Extend this section by describing related goals of helping to keep files in sync across machines, facilitating initial setup and configuration, deployment of ROS packages, etc. and discussion about which to consider 'in scope'.
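One practical consequence of losing roscore, noted above, is that there is no central parameter server either: whatever configuration a node needs is carried in the launch description itself and has to reach whichever machine ends up running that node. With the current single-machine `launch_ros` API that looks roughly like the sketch below; the package, executable, and parameter names are placeholders, and keyword argument names differ slightly between `launch_ros` releases.

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        # Parameters are attached to the node action rather than fetched from a
        # central server, so a multi-machine launch system has to deliver them to
        # the remote host along with the instruction to start the node.
        Node(
            package='my_robot_pkg',           # placeholder package
            executable='camera_driver_node',  # placeholder executable
            parameters=[{'frame_rate': 30.0,
                         'calibration_file': '/etc/robot/camera.yaml'}],
            output='screen',
        ),
    ])
```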
## Capabilities -In order to meet the above use cases, a launch system needs to have a number of capabilities, including: +In order to meet the above use goals, we will provide the following capabilities: - Connecting to a remote host and running nodes on it - Pushing configuration parameters for nodes to remote hosts -- Monitoring the status and tracking the lifecycle of nodes -- Recovering from failures by optionally restarting nodes -- Gracefully shutting down nodes on every host - -Most of these are just extensions of the design goals for the single-machine version of roslaunch for ROS2, but in addition to extending those goals to remote machines, we would also like to consider a few more advanced features, including: - -- Load balancing nodes on distributed networks +- Monitoring the status and managing the lifecycles of nodes across hosts +- Gracefully shutting down nodes across hosts - Command line tools for managing and monitoring systems across machines - Mechanisms for locating files and executables across machines +- A grouping mechanism allowing collections of nodes to be stopped/introspected as a unit with the commandline tools + +[TODO] These are capabilities that we might not want to keep in scope for remote launching and instead present them as a different PR +- API to facilitate integration of third party orchestration tools such as Kubernetes or Ansible +- Load balancing nodes on distributed networks (Possibly outsource this capability to the previously mentioned third-party tools) - Sharing and synchronizing files across machines +- Deployment and configuration of packages on remote machines ## Considerations @@ -88,6 +83,34 @@ There are some outstanding issues that may complicate things: ## Proposed Approach +Following are some of the possible design approaches we have started considering. +This section should evolve to describe a complete and homogenous solution as we iterate over time, but at the moment may be a bit piecemeal as we explore ideas. +The point is to capture all of our ideas and approaches to different pieces of the problem, even rejected approaches, and to facilitate discussion and maintain a record of our reasoning. + +### Simple Remote Process Execution + +Create an action in `launch` called `ExecuteRemoteProcess` that extends the `ExecuteProcess` action but includes parameters for the information needed to connect to a remote host and executes the process there. + +### Spawn Remote LaunchServers + +The LaunchServer is the process that, given a LaunchDescription, visits all of the constituent LaunchDescriptionEntities, triggering them to perform their functions. +Since the launch process involves more than simply executing nodes, it is unlikely that simply providing a way to execute nodes remotely will be adequate for starting non-trivial systems. +The LaunchServer is responsible for things such as setting environment variables, registering listeners, emitting events, filling out file and directory paths, declaring arguments, etc. +Remote machines will need to be made aware of any environment changes that are in-scope for nodes that they will be executing, and events may need to be handled across machines. 
+ +[TODO] there is a lot of fleshing out that should be done here, but I want to get the general idea out for your consideration while I iterate further +One approach would be to add logic to the launch system allowing it to group LaunchDescriptionEntities containing the necessary actions and substitutions for successfully executing a node remotely, spawning a LaunchService on the remote machine, serializing the group of entities and sending them to the remote machine to be processed. +This could turn out to be a recursive process depending on how a launch file creator has nested LaunchDescriptionEntities (which can themselves be LaunchDescriptions). +Additional logic will be needed to detect cases where event emission and listener registration cross machine boundaries, and helper objects can be generated to forward events over the wire so handlers on other machines can react appropriately. + +### Integrate an existing Third-Party tool + +[TODO] This is mostly just placehodler text to remind us to talk about it. I don't have my head wrapped around what this would look like well enough to describe it yet. +I don't know exactly how this would look yet since I'm not very familiar with kubernetes, but it offers many of the capabilities we want plus more, and add mechanisms to facilitate its use could be very useful. +That said, it's a rather large dependency to add, and not everything should be run in containers. + +### ??? [TODO] Add any other ideas you have + ## Proposed Multi-Machine Launch Command Line Interface The multi-machine launching interface is controlled through the `launcher` command for the `ros2` command-line tool. From b5ecbf7919c2e1120b7b2b12cdc713d76558a4c4 Mon Sep 17 00:00:00 2001 From: Jacob Hassold Date: Wed, 25 Sep 2019 13:22:26 -0400 Subject: [PATCH 06/12] Fixed spelling error, made TODO more noticeable Distro A, OPSEC Signed-off-by: Jacob Hassold --- articles/152_roslaunch_multi_machine_launch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index 54ed614b1..de38f08ab 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -38,7 +38,7 @@ Roslaunch would automatically run roscore if no current instance was already run As a result, the launch command either had to be run specifically on the computer that roscore was meant to run on, or other steps would need to be taken to launch roscore on a remote machine before running the roslauch command. This could sometimes cause problems with systems running headlessly if the user wanted to interface with a client machine for launching and monitoring the system - the interface machine became a core component of the system, or the user had to ssh into the system's main machine to start it up which is what remote launching exists to do for you. In ROS 2, nodes use DDS to connect in a peer-to-peer fashion with no centralized naming and registration services to have to start up. -[TODO] the next couple lines don't really beling in a "Differences from ROS 1" section, but they come about as consequences of the previous line. +*[TODO] the next couple lines don't really belong in a "Differences from ROS 1" section, but they come about as consequences of the previous line.* However, the launch system in ROS 2 currently provides a LaunchService for processing LaunchDescriptions, including setting up event handlers. 
These events will not be visible to event handlers running on other machines, without creating additional handlers and subsribers for publishing and receiving events over the wire. From 4e45a96254ef035b6410a4dc56e91414a2f4b4da Mon Sep 17 00:00:00 2001 From: Jacob Hassold Date: Wed, 25 Sep 2019 13:31:58 -0400 Subject: [PATCH 07/12] Formatting changes. Make visible again Distro A, OPSEC Signed-off-by: Jacob Hassold --- .../152_roslaunch_multi_machine_launch.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index de38f08ab..301f894c1 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -25,7 +25,7 @@ Authors: {{ page.author }} This document elaborates on the details of launching remote operating system processes alluded to [here](https://github.com/ros2/design/blob/gh-pages/articles/150_roslaunch.md#remote-operating-system-processes) in the main ROS 2 ros_launch design document. Robotic platforms often consist of multiple computers communicating over a network, and users will want to be able to start and stop the software on such systems without needing to manage each machine individually. -The launch system in ROS 1 included a tag for launch files that allowed users to include information about networked machines and how to connect so that processes could be started remotely. +The launch system in ROS 1 included a `` tag for launch files that allowed users to include information about networked machines and how to connect so that processes could be started remotely. We would like to replicate that capability in the launch system for ROS 2 and extend its capabilities based on lessons learned from working with multi-machine systems in ROS 1. ROS 2 also has a few notable design differences from ROS 1 that will affect the way multi-machine launching is implemented. @@ -33,13 +33,13 @@ ROS 2 also has a few notable design differences from ROS 1 that will affect the ### Differences from ROS 1 One of the most notable differences in ROS 2 is the lack of roscore. -In ROS 1, roscore adds a rosmaster, a parameter server, and a logging node when it is run. -Roslaunch would automatically run roscore if no current instance was already running. -As a result, the launch command either had to be run specifically on the computer that roscore was meant to run on, or other steps would need to be taken to launch roscore on a remote machine before running the roslauch command. +In ROS 1, roscore adds a `rosmaster`, a `parameter server`, and a `logging node` when it is run. +Roslaunch would automatically run `roscore` if no current instance was already running. +As a result, the `launch` command either had to be run specifically on the computer that roscore was meant to run on, or other steps would need to be taken to launch `roscore` on a remote machine before running the `roslauch` command. This could sometimes cause problems with systems running headlessly if the user wanted to interface with a client machine for launching and monitoring the system - the interface machine became a core component of the system, or the user had to ssh into the system's main machine to start it up which is what remote launching exists to do for you. In ROS 2, nodes use DDS to connect in a peer-to-peer fashion with no centralized naming and registration services to have to start up. 
*[TODO] the next couple lines don't really belong in a "Differences from ROS 1" section, but they come about as consequences of the previous line.* -However, the launch system in ROS 2 currently provides a LaunchService for processing LaunchDescriptions, including setting up event handlers. +However, the launch system in ROS 2 currently provides a `LaunchService` for processing `LaunchDescriptions`, including setting up event handlers. These events will not be visible to event handlers running on other machines, without creating additional handlers and subsribers for publishing and receiving events over the wire. @@ -47,7 +47,7 @@ These events will not be visible to event handlers running on other machines, wi ## Goals Our primary goal is to eliminate the need for users to connect to multiple machines and manually launch different components of a system on each of them independently. -[TODO] Extend this section by describing related goals of helping to keep files in sync across machines, facilitating initial setup and configuration, deployment of ROS packages, etc. and discussion about which to consider 'in scope'. +*[TODO] Extend this section by describing related goals of helping to keep files in sync across machines, facilitating initial setup and configuration, deployment of ROS packages, etc. and discussion about which to consider 'in scope'.* ## Capabilities @@ -61,7 +61,7 @@ In order to meet the above use goals, we will provide the following capabilities - Mechanisms for locating files and executables across machines - A grouping mechanism allowing collections of nodes to be stopped/introspected as a unit with the commandline tools -[TODO] These are capabilities that we might not want to keep in scope for remote launching and instead present them as a different PR +*[TODO] These are capabilities that we might not want to keep in scope for remote launching and instead present them as a different PR* - API to facilitate integration of third party orchestration tools such as Kubernetes or Ansible - Load balancing nodes on distributed networks (Possibly outsource this capability to the previously mentioned third-party tools) - Sharing and synchronizing files across machines @@ -93,19 +93,19 @@ Create an action in `launch` called `ExecuteRemoteProcess` that extends the `Exe ### Spawn Remote LaunchServers -The LaunchServer is the process that, given a LaunchDescription, visits all of the constituent LaunchDescriptionEntities, triggering them to perform their functions. +The `LaunchServer` is the process that, given a `LaunchDescription`, visits all of the constituent `LaunchDescriptionEntities`, triggering them to perform their functions. Since the launch process involves more than simply executing nodes, it is unlikely that simply providing a way to execute nodes remotely will be adequate for starting non-trivial systems. -The LaunchServer is responsible for things such as setting environment variables, registering listeners, emitting events, filling out file and directory paths, declaring arguments, etc. +The `LaunchServer` is responsible for things such as setting environment variables, registering listeners, emitting events, filling out file and directory paths, declaring arguments, etc. Remote machines will need to be made aware of any environment changes that are in-scope for nodes that they will be executing, and events may need to be handled across machines. 
-[TODO] there is a lot of fleshing out that should be done here, but I want to get the general idea out for your consideration while I iterate further -One approach would be to add logic to the launch system allowing it to group LaunchDescriptionEntities containing the necessary actions and substitutions for successfully executing a node remotely, spawning a LaunchService on the remote machine, serializing the group of entities and sending them to the remote machine to be processed. -This could turn out to be a recursive process depending on how a launch file creator has nested LaunchDescriptionEntities (which can themselves be LaunchDescriptions). +*[TODO] there is a lot of fleshing out that should be done here, but I want to get the general idea out for your consideration while I iterate further.* +One approach would be to add logic to the launch system allowing it to group `LaunchDescriptionEntities` containing the necessary actions and substitutions for successfully executing a node remotely, spawning a LaunchService on the remote machine, serializing the group of entities and sending them to the remote machine to be processed. +This could turn out to be a recursive process depending on how a launch file creator has nested `LaunchDescriptionEntities` (which can themselves be `LaunchDescriptions`). Additional logic will be needed to detect cases where event emission and listener registration cross machine boundaries, and helper objects can be generated to forward events over the wire so handlers on other machines can react appropriately. ### Integrate an existing Third-Party tool -[TODO] This is mostly just placehodler text to remind us to talk about it. I don't have my head wrapped around what this would look like well enough to describe it yet. +*[TODO] This is mostly just placehodler text to remind us to talk about it. I don't have my head wrapped around what this would look like well enough to describe it yet.* I don't know exactly how this would look yet since I'm not very familiar with kubernetes, but it offers many of the capabilities we want plus more, and add mechanisms to facilitate its use could be very useful. That said, it's a rather large dependency to add, and not everything should be run in containers. From b513248895dd2684d40b3314613ed1ac47772253 Mon Sep 17 00:00:00 2001 From: Jacob Hassold Date: Wed, 25 Sep 2019 13:36:14 -0400 Subject: [PATCH 08/12] Let there be lines. Distro A, OPSEC Signed-off-by: Jacob Hassold --- articles/152_roslaunch_multi_machine_launch.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index 301f894c1..6a24282ff 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -81,6 +81,8 @@ There are some outstanding issues that may complicate things: - Need to keep track of which machine has the most recent version of such resources - Security: we'll need to manage credentials across numerous machines both for SSH and secure DDS. +--- + ## Proposed Approach Following are some of the possible design approaches we have started considering. @@ -111,6 +113,8 @@ That said, it's a rather large dependency to add, and not everything should be r ### ??? [TODO] Add any other ideas you have +--- + ## Proposed Multi-Machine Launch Command Line Interface The multi-machine launching interface is controlled through the `launcher` command for the `ros2` command-line tool. 
From bc31412f96b9a8f38a57054e561ea4170396703d Mon Sep 17 00:00:00 2001 From: "P. J. Reed" Date: Fri, 27 Sep 2019 15:16:02 -0500 Subject: [PATCH 09/12] Add info on custom remote execution mechanisms; update launch CLI syntax Distribution Statement A; OPSEC #2893 Signed-off-by: P. J. Reed --- .../152_roslaunch_multi_machine_launch.md | 106 ++++++++---------- 1 file changed, 49 insertions(+), 57 deletions(-) diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md index 6a24282ff..730f8ec12 100644 --- a/articles/152_roslaunch_multi_machine_launch.md +++ b/articles/152_roslaunch_multi_machine_launch.md @@ -53,10 +53,11 @@ Our primary goal is to eliminate the need for users to connect to multiple machi In order to meet the above use goals, we will provide the following capabilities: -- Connecting to a remote host and running nodes on it -- Pushing configuration parameters for nodes to remote hosts -- Monitoring the status and managing the lifecycles of nodes across hosts -- Gracefully shutting down nodes across hosts +- Connect to a remote host and running nodes on it +- Support arbitrary remote execution or orchestration mechanisms (`ssh` by default) +- Push configuration parameters for nodes to remote hosts +- Monitor the status and managing the lifecycles of nodes across hosts +- Gracefully shut down nodes across hosts - Command line tools for managing and monitoring systems across machines - Mechanisms for locating files and executables across machines - A grouping mechanism allowing collections of nodes to be stopped/introspected as a unit with the commandline tools @@ -105,11 +106,18 @@ One approach would be to add logic to the launch system allowing it to group `La This could turn out to be a recursive process depending on how a launch file creator has nested `LaunchDescriptionEntities` (which can themselves be `LaunchDescriptions`). Additional logic will be needed to detect cases where event emission and listener registration cross machine boundaries, and helper objects can be generated to forward events over the wire so handlers on other machines can react appropriately. -### Integrate an existing Third-Party tool +### Define Remote Execution Mechanisms on a Per-Machine Basis -*[TODO] This is mostly just placehodler text to remind us to talk about it. I don't have my head wrapped around what this would look like well enough to describe it yet.* -I don't know exactly how this would look yet since I'm not very familiar with kubernetes, but it offers many of the capabilities we want plus more, and add mechanisms to facilitate its use could be very useful. -That said, it's a rather large dependency to add, and not everything should be run in containers. +Historically, ROS1 launched nodes by using `ssh` to connect to a remote machine and execute processes +on it. This is still a reasonable way of doing it and is the expected remote execution mechanism in most +environments. + +Some hosts or environments may use a different mechanism, such as Windows Remote Shell on Windows hosts +or `kubectl` for Kubernetes clusters. There will be an abstract interface for remote execution mechanisms; +it will be possible to write custom implementations that use arbitrary mechanisms, and the launch system +can be configured to decide which mechanism to use on a per-machine basis. 
When a launch system is run, +information about all of the nodes assigned to a machine will be passed to the remote execution mechanism +implementation so that it can execute them appropriately. ### ??? [TODO] Add any other ideas you have @@ -117,41 +125,18 @@ That said, it's a rather large dependency to add, and not everything should be r ## Proposed Multi-Machine Launch Command Line Interface -The multi-machine launching interface is controlled through the `launcher` command for the `ros2` command-line tool. -The existing `launch` command provides a subset of this functionality that is sufficient for single-machine launching. +Launching is controlled through the `launch` command for the `ros2` command-line tool. ### Commands ```bash -$ ros2 launcher -usage: ros2 launcher [-h] Call `ros2 launcher -h` for more detailed usage. ... - -Various launching related sub-commands - -optional arguments: - -h, --help show this help message and exit - -Commands: - launch Run a launch file - list Search for and list running launch systems - attach Attach to a running launch system and wait for it to finish - term Terminate a running launch system - - Call `ros2 launcher -h` for more detailed usage. -``` +$ ros2 launch +usage: ros2 launch (subcommand | [-h] [-d] [-D] [-p | -s] [-a] + package_name [launch_file_name] + [launch_arguments [launch_arguments ...]]) ... -#### `launch` - -The `ros2 launcher launch` is equivalent to `ros2 launch`, which is preserved for backwards compatibility and ease of use. -It is used to run a launch file. - -```bash -$ ros2 launcher launch -h -usage: ros2 launcher launch [-h] [-d] [-D] [-p | -s] [-a] - package_name [launch_file_name] - [launch_arguments [launch_arguments ...]] ... - -Run a launch file +Without a subcommand, `ros2 launch` will run a launch file. Call +`ros2 launch -h` for more detailed usage. positional arguments: package_name Name of the ROS package which contains the launch file @@ -164,7 +149,7 @@ optional arguments: -h, --help Show this help message and exit. -d, --debug Put the launch system in debug mode, provides more verbose output. -D, --detach Detach from the launch process after it has started. - -p, --print, --print-description + -p, --print, --print-description Print the launch description to the console without launching it. -s, --show-args, --show-arguments Show arguments that may be given to the launch file. @@ -172,12 +157,19 @@ optional arguments: Show all launched subprocesses' output by overriding their output configuration using the OVERRIDE_LAUNCH_PROCESS_OUTPUT envvar. + +Subcommands: + list Search for and list running launch systems + attach Attach to a running launch system and wait for it to finish + term Terminate a running launch system + + Call `ros2 launch -h` for more detailed usage. 
``` Example output: ```bash -$ ros2 launcher launch demo_nodes_cpp talker_listener.launch.py +$ ros2 launch demo_nodes_cpp talker_listener.launch.py [INFO] [launch]: All log files can be found below /home/preed/.ros/log/2019-09-11-20-54-30-715383-regulus-2799 [INFO] [launch]: Default logging verbosity is set to INFO [INFO] [launch]: Launch System ID is 50bda6fb-d451-4d53-8a2b-e8fcdce8170b @@ -204,7 +196,7 @@ This is a unique identifier that can be used to track all of the nodes launched Additionally, it is possible to detach from a system and let it run in the background: ```bash -$ ros2 launcher launch -D demo_nodes_cpp talker_listener.launch.py +$ ros2 launch -D demo_nodes_cpp talker_listener.launch.py [INFO] [launch]: All log files can be found below /home/preed/.ros/log/2019-09-11-20-54-30-715383-regulus-2799 [INFO] [launch]: Default logging verbosity is set to INFO [INFO] [launch]: Launch System ID is 50bda6fb-d451-4d53-8a2b-e8fcdce8170b @@ -216,8 +208,8 @@ $ Since it is possible to launch a system of nodes that spans a network and detach from it, it is necessary to be able to query the network to find which systems are active. ```bash -$ ros2 launcher list -h -usage: ros2 launcher list [-h] [-v] [--spin-time SPIN_TIME] +$ ros2 launch list -h +usage: ros2 launch list [-h] [-v] [--spin-time SPIN_TIME] List running launch systems @@ -233,7 +225,7 @@ optional arguments: Example output: ```bash -$ ros2 launcher list +$ ros2 launch list ab1e0138-bb22-4ec9-a590-cf377de42d0f 50bda6fb-d451-4d53-8a2b-e8fcdce8170b 5d186778-1f50-4828-9425-64cc2ed1342c @@ -241,19 +233,19 @@ $ ``` ```bash -$ ros2 launcher list -v +$ ros2 launch list -v ab1e0138-bb22-4ec9-a590-cf377de42d0f: 5 nodes, 2 hosts Launch host: 192.168.10.5 Launch time: Fri Sep 13 15:39:45 CDT 2019 - Launch command: ros2 launcher launch package_foo bar.launch.py argument:=value + Launch command: ros2 launch package_foo bar.launch.py argument:=value 50bda6fb-d451-4d53-8a2b-e8fcdce8170b: 2 nodes, 1 host Launch host: 192.168.10.15 Launch time: Fri Sep 13 12:39:45 CDT 2019 - Launch command: ros2 launcher launch demo_nodes_cpp talker_listener.launch.py + Launch command: ros2 launch demo_nodes_cpp talker_listener.launch.py 5d186778-1f50-4828-9425-64cc2ed1342c: 16 nodes, 3 hosts Launch host: 192.168.10.13 Launch time: Fri Sep 12 10:39:45 CDT 2019 - Launch command: ros2 launcher launch package_foo bar2.launch.py + Launch command: ros2 launch package_foo bar2.launch.py $ ``` @@ -262,8 +254,8 @@ $ Since it is possible to detach from a launched system, it is useful for scripting or diagnostic purposes to be able to re-attach to it. ```bash -$ ros2 launcher attach -h -usage: ros2 launcher attach [-h] [-v] [--spin-time SPIN_TIME] [system_id] +$ ros2 launch attach -h +usage: ros2 launch attach [-h] [-v] [--spin-time SPIN_TIME] [system_id] Blocks until all nodes running under the specified Launch System ID have exited @@ -280,9 +272,9 @@ optional arguments: Example output: ```bash -$ ros2 launcher attach 50bda6fb-d451-4d53-8a2b-e8fcdce8170b +$ ros2 launch attach 50bda6fb-d451-4d53-8a2b-e8fcdce8170b Attached to Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b. -(... in another terminal, run `ros2 launcher term 50bda6fb`...) +(... in another terminal, run `ros2 launch term 50bda6fb`...) All nodes in Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b have exited. $ ``` @@ -290,12 +282,12 @@ $ Verbose mode: ```bash -$ ros2 launcher attach -v 50bda6fb +$ ros2 launch attach -v 50bda6fb Attached to Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b. 
Waiting for node /launch_ros
 Waiting for node /talker
 Waiting for node /listener
-(... in another terminal, run `ros2 launcher term 50bda6fb`...)
+(... in another terminal, run `ros2 launch term 50bda6fb`...)
 Node /launch_ros has exited
 Node /talker has exited
 Node /listener has exited
@@ -308,8 +300,8 @@ $
 Terminates all nodes that were launched under a specific Launch System ID.

 ```bash
-$ ros2 launcher term -h
-usage: ros2 launcher term [-h] [-v] [--spin-time SPIN_TIME] [system_id]
+$ ros2 launch term -h
+usage: ros2 launch term [-h] [-v] [--spin-time SPIN_TIME] [system_id]

 Terminates all nodes that were launched under a specific Launch System ID

@@ -330,7 +322,7 @@ optional arguments:
 Example output:

 ```bash
-$ ros2 launcher term 50bda6fb-d451-4d53-8a2b-e8fcdce8170b
+$ ros2 launch term 50bda6fb-d451-4d53-8a2b-e8fcdce8170b
 Terminating Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b.
 $
 ```

From 4ff0bed2d585c5a1736fe3ea58a673281e73f92f Mon Sep 17 00:00:00 2001
From: "matthew.lanting"
Date: Tue, 1 Oct 2019 10:19:50 -0400
Subject: [PATCH 10/12] Re-worked Context and Goals sections.

Distribution Statement A; OPSEC #2893

Signed-off-by: matthew.lanting
---
 .../152_roslaunch_multi_machine_launch.md | 44 +++++++------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md
index 730f8ec12..600897d8f 100644
--- a/articles/152_roslaunch_multi_machine_launch.md
+++ b/articles/152_roslaunch_multi_machine_launch.md
@@ -24,30 +24,24 @@ Authors: {{ page.author }}
 This document elaborates on the details of launching remote operating system processes alluded to [here](https://github.com/ros2/design/blob/gh-pages/articles/150_roslaunch.md#remote-operating-system-processes) in the main ROS 2 ros_launch design document.
-Robotic platforms often consist of multiple computers communicating over a network, and users will want to be able to start and stop the software on such systems without needing to manage each machine individually.
-The launch system in ROS 1 included a `<machine>` tag for launch files that allowed users to include information about networked machines and how to connect so that processes could be started remotely.
-We would like to replicate that capability in the launch system for ROS 2 and extend its capabilities based on lessons learned from working with multi-machine systems in ROS 1.
-ROS 2 also has a few notable design differences from ROS 1 that will affect the way multi-machine launching is implemented.
-
+## Goals

-### Differences from ROS 1
+Our primary goal is to eliminate the need for users to connect to multiple machines and manually launch different components of a system on each of them independently.
+The launch system in ROS 1 included a `<machine>` tag for launch files that allowed users to include information about networked machines and how to connect so that processes could be started remotely.
+We would like to replicate that capability in the launch system for ROS 2.

-One of the most notable differences in ROS 2 is the lack of roscore.
-In ROS 1, roscore adds a `rosmaster`, a `parameter server`, and a `logging node` when it is run.
-Roslaunch would automatically run `roscore` if no current instance was already running.
-As a result, the `launch` command either had to be run specifically on the computer that roscore was meant to run on, or other steps would need to be taken to launch `roscore` on a remote machine before running the `roslauch` command.
-This could sometimes cause problems with systems running headlessly if the user wanted to interface with a client machine for launching and monitoring the system - the interface machine became a core component of the system, or the user had to ssh into the system's main machine to start it up which is what remote launching exists to do for you.
+We would like the launch system for ROS 2 to avoid becoming a single point of failure, while still having the capability to shut down the system as a whole on command.
+In ROS 1, communication among nodes was facilitated by roscore, which roslaunch would start automatically if no instance was already running.
+As a result, the machine that roslaunch was run from became a core part of the system and the entire system would go down if it crashed or became disconnected.
+This has been problematic on occasion when working with machines running headlessly and interfacing with a laptop.
+The `launch` command either had to be run specifically on the computer that roscore was meant to run on, or other steps would need to be taken to launch `roscore` on a remote machine before running the `roslaunch` command.
 In ROS 2, nodes use DDS to connect in a peer-to-peer fashion with no centralized naming and registration services to have to start up.
-*[TODO] the next couple lines don't really belong in a "Differences from ROS 1" section, but they come about as consequences of the previous line.*
-However, the launch system in ROS 2 currently provides a `LaunchService` for processing `LaunchDescriptions`, including setting up event handlers.
-These events will not be visible to event handlers running on other machines, without creating additional handlers and subsribers for publishing and receiving events over the wire.
-
-
-## Goals
-
-Our primary goal is to eliminate the need for users to connect to multiple machines and manually launch different components of a system on each of them independently.
-*[TODO] Extend this section by describing related goals of helping to keep files in sync across machines, facilitating initial setup and configuration, deployment of ROS packages, etc. and discussion about which to consider 'in scope'.*
+Other issues that we've dealt with on multi-machine systems include ensuring all the machines are properly configured and set up and keeping files and packages synchronized and up to date across machines.
+These issues, while related to working with multiple machines, are a bit outside the scope of roslaunch.
+There are a number of third-party orchestration tools, such as Kubernetes, that could be leveraged to get some of this extra functionality in addition to using them to facilitate execution of nodes, but we felt that would be too large a dependency to require of people.
+Resource-constrained projects in particular don't need to be burdened with additional third-party tools, and some hardware architectures do not have strong Docker support.
+It might, however, make sense to consider including an optional API to facilitate such third-party tools, or at the very least be mindful of them so we can avoid doing anything to make integrating them later too much more difficult.
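
To make the primary goal concrete, here is a purely illustrative sketch of the kind of launch file this document is working toward. The `Machine` description, the `machine=` argument, and the driver package names are hypothetical; nothing like them exists in the ROS 2 launch API today, and they only stand in for whatever per-machine assignment mechanism is eventually designed, much as the `<machine>` tag did in ROS 1.

```python
# Hypothetical example only: `Machine` and the `machine=` argument are NOT part
# of the current ROS 2 launch API. They stand in for the per-machine assignment
# capability this document proposes.
from launch import LaunchDescription
from launch_ros.actions import Node

from launch.machines import Machine  # hypothetical module and class


def generate_launch_description():
    # Hosts that make up the robot; connection details would live here.
    sensor_host = Machine(name='sensor_host', address='192.168.10.5', user='ros')
    compute_host = Machine(name='compute_host', address='192.168.10.13', user='ros')

    return LaunchDescription([
        # A driver that must run on the host its sensor is plugged into.
        Node(package='my_camera_driver', executable='camera_node', machine=sensor_host),
        # Heavier processing pinned to a more capable host.
        Node(package='my_image_pipeline', executable='rectifier', machine=compute_host),
        # No machine specified: the launch system is free to place this node.
        Node(package='demo_nodes_cpp', executable='listener'),
    ])
```

Whatever form the final API takes, the important property is the one the ROS 1 `<machine>` tag provided: the mapping from nodes to machines lives in the launch description itself, so a single invocation can bring up the whole system.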
## Capabilities @@ -59,10 +53,9 @@ In order to meet the above use goals, we will provide the following capabilities - Monitor the status and managing the lifecycles of nodes across hosts - Gracefully shut down nodes across hosts - Command line tools for managing and monitoring systems across machines -- Mechanisms for locating files and executables across machines - A grouping mechanism allowing collections of nodes to be stopped/introspected as a unit with the commandline tools -*[TODO] These are capabilities that we might not want to keep in scope for remote launching and instead present them as a different PR* +### Stretch-goals - API to facilitate integration of third party orchestration tools such as Kubernetes or Ansible - Load balancing nodes on distributed networks (Possibly outsource this capability to the previously mentioned third-party tools) - Sharing and synchronizing files across machines @@ -82,8 +75,6 @@ There are some outstanding issues that may complicate things: - Need to keep track of which machine has the most recent version of such resources - Security: we'll need to manage credentials across numerous machines both for SSH and secure DDS. ---- - ## Proposed Approach Following are some of the possible design approaches we have started considering. @@ -101,7 +92,6 @@ Since the launch process involves more than simply executing nodes, it is unlike The `LaunchServer` is responsible for things such as setting environment variables, registering listeners, emitting events, filling out file and directory paths, declaring arguments, etc. Remote machines will need to be made aware of any environment changes that are in-scope for nodes that they will be executing, and events may need to be handled across machines. -*[TODO] there is a lot of fleshing out that should be done here, but I want to get the general idea out for your consideration while I iterate further.* One approach would be to add logic to the launch system allowing it to group `LaunchDescriptionEntities` containing the necessary actions and substitutions for successfully executing a node remotely, spawning a LaunchService on the remote machine, serializing the group of entities and sending them to the remote machine to be processed. This could turn out to be a recursive process depending on how a launch file creator has nested `LaunchDescriptionEntities` (which can themselves be `LaunchDescriptions`). Additional logic will be needed to detect cases where event emission and listener registration cross machine boundaries, and helper objects can be generated to forward events over the wire so handlers on other machines can react appropriately. @@ -119,10 +109,6 @@ can be configured to decide which mechanism to use on a per-machine basis. When information about all of the nodes assigned to a machine will be passed to the remote execution mechanism implementation so that it can execute them appropriately. -### ??? [TODO] Add any other ideas you have - ---- - ## Proposed Multi-Machine Launch Command Line Interface Launching is controlled through the `launch` command for the `ros2` command-line tool. From 63b2a8c77b21e21ac7c3ccaac663e0715fb4e3a4 Mon Sep 17 00:00:00 2001 From: "P. J. Reed" Date: Wed, 2 Oct 2019 11:27:26 -0500 Subject: [PATCH 11/12] Add justification for attaching / detaching Distribution Statement A; OPSEC #2893 Signed-off-by: P. J. 
Reed
---
 articles/152_roslaunch_multi_machine_launch.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md
index 600897d8f..13053ed75 100644
--- a/articles/152_roslaunch_multi_machine_launch.md
+++ b/articles/152_roslaunch_multi_machine_launch.md
@@ -189,6 +189,12 @@ $ ros2 launch -D demo_nodes_cpp talker_listener.launch.py
 $
 ```

+A crucial difference here is that with ROS 1, the launch process was tied to the life of the system. If the process exited, that would also terminate all of the nodes it had launched. With ROS 2, the launch process can exit and leave the system running.
+
+One reason for this change is that the ROS 1 behavior was at odds with ROS 2's decentralized design paradigm. Nodes do not need a `rosmaster` to communicate and can operate on physical networks that disconnect from or reconnect to each other. Requiring a single `roslaunch` process that terminates the entire system when it exits also introduces a single point of failure that had previously been avoided.
+
+A more practical reason is that in a multi-machine environment, it is often the case that the host doing the launching is not a critical part of the system and the rest of the system should not depend on it. A common use case is a vehicle with several headless hosts for running ROS nodes and a separate laptop for monitoring or controlling those hosts; you will want to be able to launch your system from that laptop, but the system should not terminate just because your laptop goes to sleep or disconnects from the network. The ROS 1 `roslaunch` system would require that `roslaunch` run on one of the vehicle hosts, and to do that you would need to either get remote shell access to one of them or write a custom set of services and launch scripts; in ROS 2, being able to detach from and reattach to a system makes that possible by design.
+
 #### `list`

 Since it is possible to launch a system of nodes that spans a network and detach from it, it is necessary to be able to query the network to find which systems are active.
@@ -208,7 +214,7 @@ optional arguments:
                         applies when not using an already running daemon)
 ```

-Example output:
+Here is a simple list that may be useful programmatically, but not so much for an end user:

 ```bash
 $ ros2 launch list
@@ -218,6 +224,8 @@ ab1e0138-bb22-4ec9-a590-cf377de42d0f
 $
 ```

+Here is a more verbose list that contains information a user can use to identify a system:
+
 ```bash
 $ ros2 launch list -v
 ab1e0138-bb22-4ec9-a590-cf377de42d0f: 5 nodes, 2 hosts

From 183d6502519c5a34d45c6b5a021a9729ab6d8295 Mon Sep 17 00:00:00 2001
From: "matthew.lanting"
Date: Wed, 2 Oct 2019 14:27:15 -0400
Subject: [PATCH 12/12] Formatting fixes.

Distribution Statement A; OPSEC #2893

Signed-off-by: matthew.lanting
---
 .../152_roslaunch_multi_machine_launch.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/articles/152_roslaunch_multi_machine_launch.md b/articles/152_roslaunch_multi_machine_launch.md
index 13053ed75..d716ef2a9 100644
--- a/articles/152_roslaunch_multi_machine_launch.md
+++ b/articles/152_roslaunch_multi_machine_launch.md
@@ -93,21 +93,20 @@ The `LaunchServer` is responsible for things such as setting environment variabl
 Remote machines will need to be made aware of any environment changes that are in-scope for nodes that they will be executing, and events may need to be handled across machines.
One approach would be to add logic to the launch system allowing it to group `LaunchDescriptionEntities` containing the necessary actions and substitutions for successfully executing a node remotely, spawning a LaunchService on the remote machine, serializing the group of entities and sending them to the remote machine to be processed.
-This could turn out to be a recursive process depending on how a launch file creator has nested `LaunchDescriptionEntities` (which can themselves be `LaunchDescriptions`).
+This could turn out to be a recursive process depending on how `LaunchDescriptionEntity` objects are nested.
 Additional logic will be needed to detect cases where event emission and listener registration cross machine boundaries, and helper objects can be generated to forward events over the wire so handlers on other machines can react appropriately.

+LaunchServers would be the components with which the command line tools interact; they will need channels for exposing information about the processes they have started and for receiving user commands.
+
 ### Define Remote Execution Mechanisms on a Per-Machine Basis

-Historically, ROS 1 launched nodes by using `ssh` to connect to a remote machine and execute processes on it.
-This is still a reasonable way of doing it and is the expected remote execution mechanism in most environments.
+Historically, ROS 1 launched nodes by using `ssh` to connect to a remote machine and execute processes on it.
+This is still a reasonable way of doing it and is the expected remote execution mechanism in most environments.

-Some hosts or environments may use a different mechanism, such as Windows Remote Shell on Windows hosts or `kubectl` for Kubernetes clusters.
+Some hosts or environments may use a different mechanism, such as Windows Remote Shell on Windows hosts or `kubectl` for Kubernetes clusters.
+There will be an abstract interface for remote execution mechanisms;
+it will be possible to write custom implementations that use arbitrary mechanisms, and the launch system can be configured to decide which mechanism to use on a per-machine basis.
+When a launch system is run, information about all of the nodes assigned to a machine will be passed to the remote execution mechanism implementation so that it can execute them appropriately.

 ## Proposed Multi-Machine Launch Command Line Interface
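
To illustrate the per-machine remote execution mechanism described in the section above, here is a minimal sketch of what such an abstract interface could look like. None of these class or method names exist in the ROS 2 launch codebase; they are assumptions made purely to show how `ssh` could remain the default while other mechanisms are substituted for individual machines.

```python
# Illustrative sketch only: these classes are not part of ROS 2 launch; they
# show one way the per-machine remote execution mechanism could be modeled.
import subprocess
from abc import ABC, abstractmethod
from typing import Dict, List


class RemoteExecutionMechanism(ABC):
    """Strategy used by the launch system to start processes on one machine."""

    @abstractmethod
    def execute(self, host: str, command: List[str]) -> None:
        """Start a single process (e.g. a node) on the given host."""


class SshExecutionMechanism(RemoteExecutionMechanism):
    """Default mechanism: run the command over ssh, as ROS 1 roslaunch did."""

    def __init__(self, user: str) -> None:
        self.user = user

    def execute(self, host: str, command: List[str]) -> None:
        subprocess.Popen(['ssh', f'{self.user}@{host}'] + command)


class KubectlExecutionMechanism(RemoteExecutionMechanism):
    """Example alternative for hosts that are actually Kubernetes workers."""

    def execute(self, host: str, command: List[str]) -> None:
        # Assumes a long-running launch agent pod per host; details elided.
        subprocess.Popen(['kubectl', 'exec', f'launch-agent-{host}', '--'] + command)


# Per-machine configuration: before launching, the launch system looks up the
# mechanism for each machine and hands it the nodes assigned to that machine.
mechanisms: Dict[str, RemoteExecutionMechanism] = {
    'sensor_host': SshExecutionMechanism(user='ros'),
    'k8s_worker_1': KubectlExecutionMechanism(),
}
```

Under an arrangement like this, the Windows Remote Shell case mentioned earlier would simply be another implementation, and any machine without an explicit entry could fall back to the `ssh` default.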