ZMQ server bug? #301

Open
rawmean opened this issue Jan 12, 2025 · 5 comments

@rawmean

rawmean commented Jan 12, 2025

Roughly once every 30 minutes, fleet-telemetry crashes with one of the following errors:

Bad address (src/tcp.cpp:253)
double free or corruption (top)
free(): double free detected in tcache 2

CPU and memory usage are both around 15% on my machine.
This problem started when I added more vehicles.

@agbpatro
Collaborator

@jordan-bonecutter would you have any guidance on how to debug this issue?

@rawmean
Author

rawmean commented Jan 16, 2025

It seems to be a race condition that happens only at scale (i.e., when the number of vehicles streaming data exceeds a certain level). ZMQ was working fine until I ramped up the number of vehicles.

The error is not related to the ZMQ client, because the crash happens even when the client is stopped and no one is listening on the ZMQ port.

I also monitored memory consumption, and I don't think it's a memory leak either, because memory consumption is stable.

I finally gave up and switched to PubSub and it's working fine.

@jordan-bonecutter
Contributor

Memory bugs can be weird, so I wouldn't rule out ZMQ per se. I looked at the given line of tcp.cpp in the source, and it isn't terribly interesting:

```cpp
#if !defined(TARGET_OS_IPHONE) || !TARGET_OS_IPHONE
        errno_assert (errno != EACCES && errno != EBADF && errno != EDESTADDRREQ
                      && errno != EFAULT && errno != EISCONN
                      && errno != EMSGSIZE && errno != ENOMEM
                      && errno != ENOTSOCK && errno != EOPNOTSUPP);
```

which doesn't seem to be freeing or deleting anything. I'll spend some time on this over the weekend, but I haven't run into this myself. @rawmean, are you using the Dockerfile to build? I wonder if you're using a different version of ZMQ where this line is more interesting.

@rawmean
Author

rawmean commented Jan 16, 2025

I didn't use Docker.
How many vehicles did you test it with? To reproduce the crash, I think you need at least 2,000 vehicles.

Is the code you shared from tcp.cpp? It's unlikely that tcp.cpp has a bug, because it's used so extensively. I'm surprised that it refers to an iPhone target.

@jordan-bonecutter
Contributor

Yeah, the bug won’t be in this line, but we’ll be able to see what’s being freed, and that could be super useful. I no longer work for the company that was using the API, but we had roughly 100 vehicles.


4 participants