Traffic schedule node functionality and its fail over #91

JGUO006 · 2021-08-23T10:54:48Z

Hi,

Following the earlier discussion "Highly available open-rmf and adapters #78", I followed the instructions mentioned in the merged feature to support seamless failover for the traffic schedule node.

In a redundancy test, I killed the primary traffic schedule node and saw from the console output that the monitor detected and handled the death of the primary node. In RViz, the traffic schedule continued to be updated, particularly at the end of loops when a new route was posted.

I then went further and killed the monitor node as well. The robots kept running, and tasks continued until the last loop was completed. The only difference was that the green planned trajectory path was missing in RViz. I then sent 2 new tasks and found that they could start running even without any traffic schedule node. Finally, I manually started a traffic schedule node with a ros2 run command in a terminal; the planned trajectory path came back in RViz and the tasks completed without any issue.

This test made me curious about the role of the traffic schedule node. What does this node do in RMF? Why can RMF still handle tasks without a traffic schedule node in this test? Which node is handling the tasks and planning the paths? Thank you!

mxgrey (Maintainer) · 2021-08-23T12:40:57Z

Everything you've described is exactly what I would expect to happen based on the design of RMF. So I'll just go ahead and clear up why you saw this behavior:

Task management and robot control run independently from the traffic schedule. That means if a fleet has some tasks to perform, the fleet adapter can go ahead and command robots to perform those tasks regardless of whether the traffic schedule is running.

The roles of the traffic schedule node are:

Track all the robots' predictions of what their trajectories will be
Redistribute those predictions to everyone who is listening (e.g. RViz or other robots)
Identify when conflicts come up between the predictions of different robots (e.g. two robots will collide if they follow their predictions)
Notify robots when their schedules have conflicts with other robots

When the schedule node is down, you only lose the features mentioned above. That means the robots can keep performing their tasks, but they won't know about what other robots are doing, so they can't plan around each other. If they run into a traffic conflict, then they won't know how to resolve it because they won't know who they're conflicting with or why. But if they get lucky and there are no traffic conflicts, then they can just keep operating as normal.
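To make the four roles concrete, here is a minimal toy sketch in Python. It is not the RMF implementation or its API: the class `ToySchedule`, the waypoint format `(time, x, y)`, the fixed-step sampling, and the 1 m separation threshold are all assumptions made up for illustration. It only shows the general idea of storing predicted trajectories, forwarding them to listeners, checking pairs for conflicts, and notifying whoever is listening.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Waypoint = Tuple[float, float, float]  # hypothetical format: (time in s, x in m, y in m)
Trajectory = List[Waypoint]


def position_at(traj: Trajectory, t: float) -> Optional[Tuple[float, float]]:
    """Linearly interpolate a position; returns None outside the trajectory's time span."""
    for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            r = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
            return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))
    return None


class ToySchedule:
    """Illustrative stand-in for a traffic schedule (all names are hypothetical)."""

    def __init__(self, min_separation: float = 1.0, step: float = 0.5):
        self.min_separation = min_separation  # assumed conflict threshold in meters
        self.step = step                      # assumed sampling interval in seconds
        self.itineraries: Dict[str, Trajectory] = {}
        self.itinerary_listeners: List[Callable[[str, Trajectory], None]] = []
        self.conflict_listeners: List[Callable[[str, str, float], None]] = []

    def submit(self, robot: str, traj: Trajectory) -> List[Tuple[str, float]]:
        self.itineraries[robot] = traj            # role 1: track the prediction
        for callback in self.itinerary_listeners:  # role 2: redistribute it
            callback(robot, traj)
        conflicts = self.find_conflicts(robot)     # role 3: identify conflicts
        for other, t in conflicts:                 # role 4: notify about them
            for callback in self.conflict_listeners:
                callback(robot, other, t)
        return conflicts

    def find_conflicts(self, robot: str) -> List[Tuple[str, float]]:
        """Return (other_robot, first_conflict_time) for each conflicting robot."""
        mine = self.itineraries[robot]
        found = []
        for other, theirs in self.itineraries.items():
            if other == robot:
                continue
            start = max(mine[0][0], theirs[0][0])
            end = min(mine[-1][0], theirs[-1][0])
            if end < start:
                continue  # the two trajectories never overlap in time
            for i in range(int((end - start) / self.step) + 1):
                t = start + i * self.step
                a, b = position_at(mine, t), position_at(theirs, t)
                if a is not None and b is not None and math.dist(a, b) < self.min_separation:
                    found.append((other, t))
                    break
        return found


schedule = ToySchedule()
alerts = []
schedule.conflict_listeners.append(lambda robot, other, t: alerts.append((robot, other, t)))

# robot_a drives along the x axis; robot_b crosses its path at (5, 0).
first = schedule.submit("robot_a", [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)])
second = schedule.submit("robot_b", [(0.0, 5.0, -5.0), (10.0, 5.0, 5.0)])
```

Without an object like this running, nothing fills `alerts`, which mirrors the point above: the robots can still move, but nobody tells them that their predicted paths cross.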

The motive for this design is so that the robots don't immediately come to a grinding halt just because the traffic schedule goes down or experiences latency. The robots should just keep operating, and when a monitor node activates or a new schedule node is spun up, everything will continue working seamlessly as if no problem occurred at all.
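The decoupling can be pictured with another toy sketch. Again, this is not RMF code: `ToyFleetAdapter`, the plain `dict` standing in for a schedule, and the choice to resend the latest itinerary on reconnect are assumptions for illustration only. The point is that executing a task never waits on the schedule being present.

```python
from typing import Dict, List, Optional, Tuple

Itinerary = List[Tuple[float, float, float]]  # hypothetical (time, x, y) waypoints


class ToyFleetAdapter:
    """Illustrative adapter that keeps working whether or not a schedule exists."""

    def __init__(self, robot: str):
        self.robot = robot
        self.schedule: Optional[Dict[str, Itinerary]] = None  # None means "schedule is down"
        self.latest_itinerary: Optional[Itinerary] = None
        self.completed: List[str] = []

    def connect(self, schedule: Optional[Dict[str, Itinerary]]) -> None:
        """Attach to a (new) schedule and resend the latest itinerary (assumed behavior)."""
        self.schedule = schedule
        if schedule is not None and self.latest_itinerary is not None:
            schedule[self.robot] = self.latest_itinerary

    def run_task(self, task: str, itinerary: Itinerary) -> None:
        self.latest_itinerary = itinerary
        if self.schedule is not None:
            self.schedule[self.robot] = itinerary  # share the prediction if possible
        self.completed.append(task)                # task execution does not depend on it


adapter = ToyFleetAdapter("robot_a")
adapter.run_task("loop_1", [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)])  # schedule is down

new_schedule: Dict[str, Itinerary] = {}
adapter.connect(new_schedule)  # a replacement schedule is spun up later
```

After `connect`, the replacement schedule holds the robot's itinerary again, which is roughly what the planned path reappearing in RViz looks like from the outside.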

As for spinning up a new schedule node in the middle of a run, this should usually work fine, especially if you're spinning up the new node on the same computer that the previous one was running on. With the current implementation there is some risk of "losing" participants when spinning up a new schedule node if the previous schedule node crashed immediately after registering a new participant. However, if you can anticipate all participants ahead of time and fill them into your .rmf_schedule_node.yaml, then there's no risk of this issue, and you can always hot-start the traffic schedule node. We also have some plans to improve the registration system so that you can always safely hot-start a traffic schedule node, but that's for future development.
