Instrument client/service for end-to-end request/response tracking #145

christophebedard · 2024-11-17T23:00:59Z

Resolves #143

This adds support for instrumenting clients and services for end-to-end tracking of requests/responses. This means:

- Associating a request sent by a client to the same request received by a server
- Associating a response sent by a server to the original request sent by a client
- Associating a response sent by a server to the same response received by a client
  - There is a slight limitation here, see below

The above is achieved with a combination of multiple instrumentation points, mostly in the rmw implementations:

- rmw_client_init: On client creation, collect the client's GID
- rmw_send_request: On request publication, collect the client (indirectly corresponding to the client's GID) and the request's sequence number
- rmw_take_request: On request take, collect the request's client GID and sequence number, which can be matched to the published request
- rmw_send_response: On response publication, collect the client GID and the sequence number of the request that the response is for, which can be matched to the published/taken request
- rmw_send_response: On response publication, collect the source timestamp of the response
- rmw_take_response: On response take, collect the client GID and the sequence number of the request that the response is for and the source timestamp of the response, which can be matched to the published/taken request and published response
  - There is a slight limitation here, see below
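
As a rough illustration of how the request-side data could be joined in an analysis, here is a minimal sketch. It assumes each trace event has already been parsed into a dict; the field names (`client_handle`, `gid`, `client_gid`, `sequence_number`) and the function name are hypothetical and are not the actual tracepoint field names.

```python
def match_requests(client_inits, sent_requests, taken_requests):
    """Pair each taken request with the sent request it corresponds to.

    All field names are hypothetical placeholders for this sketch.
    """
    # rmw_client_init: map the client (handle) to its GID.
    gid_by_client = {e["client_handle"]: e["gid"] for e in client_inits}

    # rmw_send_request: index sent requests by (client GID, sequence number).
    sent_by_key = {}
    for event in sent_requests:
        gid = gid_by_client.get(event["client_handle"])
        if gid is not None:
            sent_by_key[(gid, event["sequence_number"])] = event

    # rmw_take_request: look up the matching sent request with the same key.
    pairs = []
    for event in taken_requests:
        sent = sent_by_key.get((event["client_gid"], event["sequence_number"]))
        if sent is not None:
            pairs.append((sent, event))
    return pairs
```

The same (client GID, sequence number) key would then be used to attach `rmw_send_response` and `rmw_take_response` events to the request.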

Since clients (and services) are an rmw concept and not a DDS concept, selecting the "right" GID to use as the client GID and having it match when read from the client side and the service side was a bit tricky. Refer to the rmw implementation PRs (which need to be merged at the same time as this PR):

This updates some tracing tests in test_tracetools and adds a new test:

- test_client: basic validation of client tracepoints
- test_service: basic validation of service tracepoints
- test_service_req_resp: validate associations of requests/responses

This also updates the design document.

Limitation:

A specific request is, by definition, only sent by a single client and can be received by N>=1 services, but it's still the same request, so there is no issue there. However, there can be multiple responses to the same request, made by multiple services. While having multiple services with the same name is "generally discouraged," it is possible: https://github.com/ros2/rmw/blob/33118c9d4dc2adec838962554f0e09ab5c15d1e0/rmw/include/rmw/rmw.h#L1970-L1972.

To differentiate two service responses to the same request, and therefore know which response the client ends up using, we collect the response's source timestamp when the response is sent and when it is taken. Depending on the clock resolution, we could have collisions here if two services send a response for the same request at the exact same time, but it's unlikely. This is similar to what we do to match a message being published to the same message being received (see #74 and related PRs/issues). We would ideally instead collect the service GID + response sequence number on both sides. However, this is not available for service responses, and DDS-based GIDs are known to be problematic for some rmw implementations: ros2/rmw_cyclonedds#377. This is why we simply use source timestamps for pub/sub messages.
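
To make the disambiguation concrete, here is a minimal sketch of how a taken response could be matched to one of several sent responses for the same request. The event dicts and their field names (`client_gid`, `sequence_number`, `source_timestamp`, `service`) are hypothetical, chosen only for illustration.

```python
def find_matching_responses(sent_responses, taken_response):
    """Return the sent responses that match a taken response.

    Field names are hypothetical. More than one result means a source
    timestamp collision between services, which is unlikely but possible.
    """
    def key(event):
        return (
            event["client_gid"],
            event["sequence_number"],
            event["source_timestamp"],
        )

    wanted = key(taken_response)
    return [event for event in sent_responses if key(event) == wanted]
```

Without the source timestamp in the key, both responses to the same request would match and the one the client actually used could not be identified.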

action-ros-ci-repos-override: https://gist.githubusercontent.com/christophebedard/cf4d163feeeabd2cd38cf694413127a0/raw/1fdc769133ff4c412b8283eebcca729f944f755c/ros2.repos

christophebedard · 2024-11-18T22:07:31Z

All of this to say: would it be worth also collecting source timestamps of sent/taken responses to differentiate multiple responses to the same request?

I thought about it more and asked some people. I think that, even if it's "discouraged" or rare, because it can happen, it would be good to be able to differentiate multiple responses to the same request.

christophebedard · 2024-11-23T18:47:06Z

I added the source timestamp to the response send/take instrumentation.

This is now ready for review.

This adds support for the new client/service instrumentation in ROS 2, see ros2/ros2_tracing#145.

1. In the objects analysis, create client and service objects
2. In the messages analysis, create the following instances:
   1. Request publication
   2. Request take and callback
   3. Response publication
   4. Response take
   5. Message transport for requests and responses
3. In the messages dataprovider, display the above instances

Signed-off-by: Christophe Bedard <[email protected]>

This adds support for the new client/service (i.e., RPC) instrumentation in ROS 2, see ros2/ros2_tracing#145.

1. In the objects analysis, create client and service objects
2. In the messages analysis, create the following instances:
   1. Request publication
   2. Request take and callback
   3. Response publication
   4. Response take
   5. Message transport for requests and responses
3. In the messages dataprovider, display the above instances

There is one limitation. Normal message publications and message takes have instrumentation that provides a "start time." For example, for message publications, the `ros2:rclcpp_publish` tracepoint is the start and the `ros2:rmw_publish` tracepoint is the end of a message publication. This allows us to attribute a duration to the publication and therefore display a time graph state. However, we only have a single tracepoint for client/service-related publication/take instances, so we do not have any duration data. For now, just hardcode a 5000 ns duration so that time graph states are visible enough.

Signed-off-by: Christophe Bedard <[email protected]>
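
The fallback described above can be sketched roughly as follows. This is not the Trace Compass code: the function name is hypothetical, and whether the placeholder interval starts or ends at the tracepoint timestamp is an assumption (here it starts at the timestamp).

```python
# Placeholder duration from the commit message, in nanoseconds.
PLACEHOLDER_DURATION_NS = 5000


def to_interval(timestamp_ns, start_ns=None):
    """Return a (start, end) interval for a time graph state.

    With a start tracepoint, the real duration is used. With a single
    tracepoint, a fixed placeholder duration is applied (assumption:
    the interval begins at the tracepoint timestamp).
    """
    if start_ns is not None:
        return (start_ns, timestamp_ns)
    return (timestamp_ns, timestamp_ns + PLACEHOLDER_DURATION_NS)
```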

christophebedard · 2024-11-24T21:29:23Z

If anyone wants to see how this instrumentation can be used, see this PR that adds support for this new instrumentation in Eclipse Trace Compass (see the included screenshot): eclipse-tracecompass-incubator/org.eclipse.tracecompass.incubator#127.

christophebedard · 2024-11-27T18:41:25Z

@mjcarroll would you be willing to review this?

christophebedard · 2024-12-05T15:34:59Z

Or maybe @clalancette or @fujitatomoya?

fujitatomoya · 2024-12-05T18:34:18Z

@christophebedard i think i can give it a shot.

test_tracetools/test/test_client.py

tracetools/include/tracetools/tp_call.h

tracetools/include/tracetools/tracetools.h

fujitatomoya · 2024-12-08T20:48:38Z

@christophebedard overall looks good to me, i got several comments though.

fujitatomoya

lgtm with green CI.

christophebedard · 2024-12-12T15:57:38Z

Pulls: #145, ros2/rmw_cyclonedds#521, ros2/rmw_fastrtps#787, ros2/rmw_connextdds#163
Gist: https://gist.githubusercontent.com/christophebedard/6de7664f9c4b0103faca7eff9ae7a757/raw/f3665ac1140f461336f4c2444f38bacfcc8f44d8/ros2.repos
BUILD args: --packages-above-and-dependencies test_tracetools tracetools tracetools_trace rmw_cyclonedds_cpp rmw_fastrtps_cpp rmw_fastrtps_dynamic_cpp rmw_fastrtps_shared_cpp rmw_connextdds_common
TEST args: --packages-above test_tracetools tracetools tracetools_trace rmw_cyclonedds_cpp rmw_fastrtps_cpp rmw_fastrtps_dynamic_cpp rmw_fastrtps_shared_cpp rmw_connextdds_common
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14947

Linux
Linux-aarch64
Linux-rhel
Windows

christophebedard · 2024-12-12T16:12:43Z

I had to manually trigger ci_launcher because CI_USE_FASTRTPS_DYNAMIC needs to be enabled, and Connext DDS is not available on ci_linux-aarch64:

Linux
Linux-aarch64
Linux-rhel
Windows

christophebedard · 2024-12-14T20:11:02Z

Alright, CI looks good and all PRs are approved. Merging all PRs.

ros-discourse · 2024-12-18T16:24:01Z

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-visualization-tools-for-architecture/41170/2

christophebedard added the enhancement (New feature or request) label Nov 17, 2024

christophebedard self-assigned this Nov 17, 2024

christophebedard force-pushed the christophebedard/instrument-services-end-to-end branch from 41fd9b1 to 19f30ef November 17, 2024 23:11

christophebedard force-pushed the christophebedard/instrument-services-end-to-end branch 2 times, most recently from c9240ff to 0d89ee1 November 23, 2024 18:38

christophebedard marked this pull request as ready for review November 23, 2024 18:46

christophebedard mentioned this pull request Nov 24, 2024

ros2: support client/service instrumentation eclipse-tracecompass-incubator/org.eclipse.tracecompass.incubator#127

Merged

fujitatomoya reviewed Dec 8, 2024

Instrument client/service for end-to-end request/response tracking

394d7e0

Signed-off-by: Christophe Bedard <[email protected]>

christophebedard force-pushed the christophebedard/instrument-services-end-to-end branch from 0d89ee1 to 394d7e0 December 9, 2024 22:35

fujitatomoya approved these changes Dec 10, 2024

christophebedard merged commit 699f572 into rolling Dec 14, 2024
8 of 9 checks passed

christophebedard deleted the christophebedard/instrument-services-end-to-end branch December 14, 2024 20:11

christophebedard mentioned this pull request Dec 16, 2024

Add tracing instrumentation using tracetools ros2/rmw_zenoh#294

Merged

christophebedard mentioned this pull request Dec 21, 2024

Do not include service/client tracepoints for Jazzy ros2/rmw_zenoh#355

Closed
