Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

borrow_loaned_message() returns RETCODE_OUT_OF_RESOURCES when using Zero-copy #810

Open
djusten opened this issue Mar 5, 2025 · 5 comments
Assignees

Comments

@djusten
Copy link

djusten commented Mar 5, 2025

I'm configuring my system to use Zero-copy and shared memory, also it publishes a custom msg that has old plain type, that means all fields with fixed size.
My test consist of one publisher and four subscribers. The fourth subscriber I keep stopping it, that means shutdown() is called and after few seconds it starts again. After few times, the publisher started to return a error when calling borrow_loaned_message(). One observation I had is the publisher "Crashes" when the subscriber is starting and not when it stopped.

Could be the subscriber somehow isn't releasing some loan? What could be causing the borrow_loaned_message() running out of resource if the callback is doing nothing, neither doing any copy or holding the data.

Running ROS2 humble.

Publisher

rclcpp::init(argc, argv);
rclcpp::executors::MultiThreadedExecutor exec{};
rclcpp::NodeOptions options = rclcpp::NodeOptions{};
[...]
const auto qos = rclcpp::QoS(10).history(rclcpp::HistoryPolicy::KeepLast).keep_last(10).reliable();
[...]
try {
  auto loanedMsg = publisher_->borrow_loaned_message();
  publisher_->publish(std::move(loanedMsg));
}
catch(const std::exception &e)
{
            std::cerr << "---------->" << e.what() << '\n';

}
[...]
exec.spin();
rclcpp::shutdown();

Subscriber

[...]
auto qosSub = rclcpp::QoS(10).history(rclcpp::HistoryPolicy::KeepLast).keep_last(10).reliable();
[...]
rclcpp::init(argc, argv);
rclcpp::executors::MultiThreadedExecutor exec{};
rclcpp::NodeOptions options = rclcpp::NodeOptions{};
[...]
void cb(const typename ros2_interface::msg::Image8m::UniquePtr msg)
{
//Doing nothing here;
}
[...]
exec.spin();
rclcpp::shutdown();

The environment variables are:

FASTRTPS_DEFAULT_PROFILES_FILE=/etc/ros_shm_profile.xml
RMW_IMPLEMENTATION=rmw_fastrtps_cpp
RMW_FASTRTPS_USE_QOS_FROM_XML=1
ROS_DISABLE_LOANED_MESSAGES=0
ROS_LOCALHOST_ONLY=1

XML profile:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <!-- Create a descriptor for the new transport -->
        <transport_descriptor>
            <transport_id>shm_transport_only</transport_id>
            <type>SHM</type>
            <segment_size>6912512</segment_size>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="SHMParticipant" is_default_profile="true">
        <rtps>
            <!-- Link the Transport Layer to the Participant -->
            <userTransports>
                <transport_id>shm_transport_only</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>

Actually free_payloads_ is getting empty, but why if the subscriber is doing nothing with the data and why it isn't being released?

@fujitatomoya
Copy link
Collaborator

IMO,

1st of all, you are not using Data Sharing delivery nor ROS 2 LoanedMessage.
for humble, we need to explicitly enable data sharing to enable LoandedMessage, see https://github.com/ros2/rmw_fastrtps?tab=readme-ov-file#enable-zero-copy-data-sharing.

borrow_loaned_message() returns RETCODE_OUT_OF_RESOURCES when using Zero-copy

i do not understand what this means.
borrow_loaned_message() returns rclcpp::LoanedMessage<ROSMessageType, AllocatorT>, what method actually returns RETCODE_OUT_OF_RESOURCES?

this problem seems to be related to Shared Memory Transport, can you provide self-contained/complete example to reproduce the issue?

CC: @MiguelCompany

@djusten
Copy link
Author

djusten commented Mar 6, 2025

Hi @fujitatomoya, thank you for your very quick reply.
I can't agree with what you mentioned above

1st of all, you are not using Data Sharing delivery nor ROS 2 LoanedMessage.

As I mentioned above, my data type is Plain Old Data, the variables required FASTRTPS_DEFAULT_PROFILES_FILE, RMW_FASTRTPS_USE_QOS_FROM_XML=1 and ROS_DISABLE_LOANED_MESSAGES=0 are set as in the documentation. When using my main application (not this test app), I see the CPU load dropping down drastically when setting those variables, that means, it proofs the usage of zero-copy.
In case Zero-copy can't be used e.g. when NOT using Plain Old Data (e.g. sensor_msgs::msg::Image), the publisher generate a warning [INFO] [1741251000.580629312] [rclcpp]: Currently used middleware can't loan messages. Local allocator will be used., this is NOT happening, that means zero-copy is being used in my test application.
Also using the following XML profile, I see the same behavior:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <!-- Block gotten from https://fast-dds.docs.eprosima.com/en/v2.6.7/fastdds/transport/shared_memory/shared_memory.html#transport-sharedmemory-example -->
    <transport_descriptors>
        <!-- Create a descriptor for the new transport -->
        <transport_descriptor>
            <transport_id>shm_transport_only</transport_id>
            <type>SHM</type>
            <segment_size>6912512</segment_size>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="SHMParticipant">
        <rtps>
            <!-- Link the Transport Layer to the Participant -->
            <userTransports>
                <transport_id>shm_transport_only</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>

    <!-- Block gotten from https://github.com/ros2/rmw_fastrtps/tree/humble?tab=readme-ov-file#enable-zero-copy-data-sharing -->
    <!-- Default publisher profile -->
    <publisher profile_name="default publisher profile" is_default_profile="true">
        <qos>
            <data_sharing>
                <kind>AUTOMATIC</kind>
            </data_sharing>
        </qos>
        <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy>
    </publisher>

    <!-- Default subscriber profile -->
    <subscriber profile_name="default subscriber profile" is_default_profile="true">
        <qos>
            <data_sharing>
                <kind>AUTOMATIC</kind>
            </data_sharing>
        </qos>
        <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy>
    </subscriber>
</profiles>

i do not understand what this means.
borrow_loaned_message() returns rclcpp::LoanedMessage<ROSMessageType, AllocatorT>, what method actually returns RETCODE_OUT_OF_RESOURCES?

In case borrow_loaned_message(); is inside a try{} catch (const std::exception &e), and borrow_loaned_message() fails it will throw ---------->error not set, that doesn't help much, right? I went a bit forward and could see loan_sample() is returning OUT_OF_RESOURCE here https://github.com/eProsima/Fast-DDS/blob/2.6.7/src/cpp/fastdds/publisher/DataWriterImpl.cpp#L440.

My test consist in running 1 publisher and 4 subscriber. The forth subscriber I keep stopping (with CTRL+C) and running it again. After few times, the publisher stops sending data due no resource for loan.
Publisher

#include <chrono>
#include <functional>
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "ros2_interface/msg/image8m.hpp"
#include "ros2_interface/msg/string.hpp"
#include <sensor_msgs/msg/image.hpp>
#include "std_msgs/msg/u_int8.hpp"

using namespace std::chrono_literals;
#define QUEUE_MSG_SIZE 10

class ImagePublisher : public rclcpp::Node
{
public:
    using pubType = ros2_interface::msg::Image8m;
    ImagePublisher(std::string nodeName, std::string topic) : Node(nodeName), count_(0)
    {
        const auto qos = rclcpp::QoS(QUEUE_MSG_SIZE).history(rclcpp::HistoryPolicy::KeepLast).keep_last(QUEUE_MSG_SIZE).reliable();
        publisher_ = this->create_publisher<pubType>(topic, qos);

        timer_ = this->create_wall_timer(
            std::chrono::milliseconds(33), // ~30 FPS
            std::bind(&ImagePublisher::timer_callback, this));
    }

private:
    void timer_callback()
    {
        try
        {
            auto loanedMsg = publisher_->borrow_loaned_message();

            static int count = 0;
            loanedMsg.get().header.sequence_count = count++;
            std::cout << "publishing " << count << std::endl;
            publisher_->publish(std::move(loanedMsg));
        }
        catch (const std::exception &e)
        {
            std::cerr << "---------->" << e.what() << '\n';
            static int count = 0;
            if (count++ == 10)
            {
                std::cerr << "----------> EXIT" << '\n';
                rclcpp::shutdown();
                exit(1);
            }
        }
    }
    rclcpp::TimerBase::SharedPtr timer_;
    rclcpp::Publisher<pubType>::SharedPtr publisher_;
    size_t count_;
};

int main(int argc, char *argv[])
{
    if (argc < 3)
    {
        std::cout << "./main node_name topic_name" << std::endl;
        return -1;
    }

    rclcpp::init(argc, argv);
    rclcpp::spin(std::make_shared<ImagePublisher>(argv[1], argv[2]));
    std::cout << "Calling shutdown" << std::endl;
    rclcpp::shutdown();
    return 0;
}

Subscriber

#include <memory>
#include <rclcpp/rclcpp.hpp>
#include <rclcpp/serialization.hpp>
#include <cstring>
#include <memory>
#include <sensor_msgs/image_encodings.hpp>
#include <sensor_msgs/msg/image.hpp>
#include "ros2_interface/msg/image8m.hpp"
#include "ros2_interface/msg/header.hpp"

using namespace std::chrono_literals;

#define QUEUE_MSG_SIZE 10
class ImageSubscriber : public rclcpp::Node
{
public:
    using Topic = ros2_interface::msg::Image8m;

    explicit ImageSubscriber(rclcpp::NodeOptions const &options, std::string node, std::string topic)
        : rclcpp::Node{node, options}
    {
        auto qosSub = rclcpp::QoS(QUEUE_MSG_SIZE).history(rclcpp::HistoryPolicy::KeepLast).keep_last(QUEUE_MSG_SIZE).reliable();
        img_sub_ = this->create_subscription<ros2_interface::msg::Image8m>(
            topic,
            qosSub,
            std::bind(&ImageSubscriber::cb, this, std::placeholders::_1));
    }

    ~ImageSubscriber()
    {
    }

private:
    void cb(const typename ros2_interface::msg::Image8m::UniquePtr msg)
    {
        static uint64_t last_seq;
        std::cout << "New msg sequence_count=" << msg->header.sequence_count << '\n';
        if ((last_seq + 1) != (uint64_t)msg->header.sequence_count)
        {
            std::cerr << "----------------------------ERROR wrong sequence count: " << last_seq << " != " << msg->header.sequence_count << std::endl;
        }
        last_seq = msg->header.sequence_count;
    }

    rclcpp::Subscription<ros2_interface::msg::Image8m>::SharedPtr img_sub_;

};

int main(int argc, char **argv)
{
    if (argc < 3)
    {
        std::cout << "./main node_name topic" << std::endl;
        return -1;
    }
    rclcpp::init(argc, argv);

    rclcpp::executors::MultiThreadedExecutor exec{};
    rclcpp::NodeOptions options = rclcpp::NodeOptions{};

    // auto imageSubscriber = std::make_shared<Test>(options, argv[1]);
    auto imageSubscriber = std::make_shared<ImageSubscriber>(options, argv[1], argv[2]);
    exec.add_node(imageSubscriber);

    exec.spin();

    rclcpp::shutdown();

    std::cerr << ">>>>>>>>>>>>>>>>>>>>>>>>> shutdown success" << std::endl;
    imageSubscriber = nullptr;

    return 0;
}

Hope it could clarify a bit more.
Bests,
Diogo

@fujitatomoya
Copy link
Collaborator

As I mentioned above, my data type is Plain Old Data, the variables required FASTRTPS_DEFAULT_PROFILES_FILE, RMW_FASTRTPS_USE_QOS_FROM_XML=1 and ROS_DISABLE_LOANED_MESSAGES=0 are set as in the documentation.

you are right, all these are required condition for zero copy.

Also using the following XML profile, I see the same behavior:

important configuration for data sharing delivery in Fast-DDS with ROS 2 humble is the following.
this part was missing in the original issue post, so i expected that ROS2 LoanedMessage cannot be enabled.

            <data_sharing>
                <kind>AUTOMATIC</kind>
            </data_sharing>

I went a bit forward and could see loan_sample() is returning OUT_OF_RESOURCE here https://github.com/eProsima/Fast-DDS/blob/2.6.7/src/cpp/fastdds/publisher/DataWriterImpl.cpp#L440.

I see. and thanks for providing the samples, i am interested in this. i will try to take a look.

thanks,

@djusten
Copy link
Author

djusten commented Mar 21, 2025

Hi @fujitatomoya, were you able to reproduce the issue? Is there anything else I can help on setting up the environment to reproduce it? Thanks.

@fujitatomoya
Copy link
Collaborator

@djusten i could not allocate time, i will try in this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants