-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instrument client/service for end-to-end request/response tracking #145
Instrument client/service for end-to-end request/response tracking #145
Conversation
41fd9b1
to
19f30ef
Compare
I thought about it more and asked some people. I think that, even if it's "discouraged" or rare, because it can happen, it would be good to be able to differentiate multiple responses to the same request. |
c9240ff
to
0d89ee1
Compare
I added the source timestamp to the response send/take instrumentation. This is now ready for review. |
This adds support for the new client/service instrumentation in ROS 2, see ros2/ros2_tracing#145. 1. In the objects analysis, create client and service objects 2. In the messages analysis, create the following instances: 1. Request publication 2. Request take and callback 3. Response publication 4. Response take 5. Message transport for requests and responses 3. In the messages dataprovider, display the above instances Signed-off-by: Christophe Bedard <[email protected]>
This adds support for the new client/service (i.e., RPC) instrumentation in ROS 2, see ros2/ros2_tracing#145. 1. In the objects analysis, create client and service objects 2. In the messages analysis, create the following instances: 1. Request publication 2. Request take and callback 3. Response publication 4. Response take 5. Message transport for requests and responses 3. In the messages dataprovider, display the above instances There is one limitation. Normal message publications and message takes have instrumentation that provides a "start time." For example, for message publications, the `ros2:rclcpp_publish` tracepoint is the start and the `ros2:rmw_publish` tracepoint is the end of a message publication. This allows us to attribute a duration to the publication and therefore display a time graph state. However, we only have a single tracepoint for client/service-related publication/take instances, so we do not have any duration data. For now, just hardcode a 5000 ns duration so that time graph states are visible enough. Signed-off-by: Christophe Bedard <[email protected]>
If anyone wants to see how this instrumentation can be used, see this PR that adds support for this new instrumentation in Eclipse Trace Compass (see the included screenshot): eclipse-tracecompass-incubator/org.eclipse.tracecompass.incubator#127. |
@mjcarroll would you be willing to review this? |
Or maybe @clalancette or @fujitatomoya? |
@christophebedard i think i can give it a shot. |
@christophebedard overall looks good to me, i got several comments though. |
Signed-off-by: Christophe Bedard <[email protected]>
0d89ee1
to
394d7e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with green CI.
Pulls: #145, ros2/rmw_cyclonedds#521, ros2/rmw_fastrtps#787, ros2/rmw_connextdds#163 |
Alright, CI looks good and all PRs are approved. Merging all PRs. |
This adds support for the new client/service (i.e., RPC) instrumentation in ROS 2, see ros2/ros2_tracing#145. 1. In the objects analysis, create client and service objects 2. In the messages analysis, create the following instances: 1. Request publication 2. Request take and callback 3. Response publication 4. Response take 5. Message transport for requests and responses 3. In the messages dataprovider, display the above instances There is one limitation. Normal message publications and message takes have instrumentation that provides a "start time." For example, for message publications, the `ros2:rclcpp_publish` tracepoint is the start and the `ros2:rmw_publish` tracepoint is the end of a message publication. This allows us to attribute a duration to the publication and therefore display a time graph state. However, we only have a single tracepoint for client/service-related publication/take instances, so we do not have any duration data. For now, just hardcode a 5000 ns duration so that time graph states are visible enough. Signed-off-by: Christophe Bedard <[email protected]>
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-visualization-tools-for-architecture/41170/2 |
Resolves #143
This adds support for instrumenting clients and services for end-to-end tracking of requests/responses. This means:
The above is achieved with a combination of multiple instrumentation points, mostly in the
rmw
implementations:rmw_client_init
: On client creation, collect the client's GIDrmw_send_request
: On request publication, collect the client (indirectly corresponding to the client's GID) and the request's sequence numberrmw_take_request
: On request take, collect the request's client GID and sequence number, which can be matched to the published requestrmw_send_response
: On response publication, collect the client GID and the sequence number of the request that the response is for, which can be matched to the published/taken requestrmw_send_response
: On response publication, collect the source timestamp of the responsermw_take_response
: On response take, collect the client GID and the sequence number of the request that the response is for and the source timestamp of the response, which can be matched to the published/taken request and published responseSince clients (and services) are an
rmw
concept and not a DDS concept, selecting the "right" GID to use as the client GID and having it match when read from the client side and the service side was a bit tricky. Refer to thermw
implementation PRs (which need to be merged at the same time as this PR):This updates some tracing tests in
test_tracetools
and adds a new test:test_client
: basic validation of client tracepointstest_service
: basic validation of service tracepointstest_service_req_resp
: validate associations of requests/responsesThis also updates the design document.
Limitation:
A specific request is, by definition, only sent by a single client and can be received by N>=1 services, but it's still the same request, so there is no issue there. However, there can be multiple responses to the same request, made by multiple services. While having multiple services with the same name is "generally discouraged," it is possible: https://github.com/ros2/rmw/blob/33118c9d4dc2adec838962554f0e09ab5c15d1e0/rmw/include/rmw/rmw.h#L1970-L1972.
To differentiate two service responses to the same request, and therefore know which response the client ends up using, we collect the response's source timestamp when the response is sent and when it is taken. Depending on the clock resolution, we could have collisions here if two services send a response for the same request at the exact same time, but it's unlikely. This is similar to what we do to match a message being published to the same message being received (see #74 and related PRs/issues). We would ideally instead collect the service GID + response sequence number on both sides. However, this is not available for service responses, and DDS-based GIDs are known to be problematic for some
rmw
implementations: ros2/rmw_cyclonedds#377. This is why we simply use source timestamps for pub/sub messages.action-ros-ci-repos-override: https://gist.githubusercontent.com/christophebedard/cf4d163feeeabd2cd38cf694413127a0/raw/1fdc769133ff4c412b8283eebcca729f944f755c/ros2.repos