

@dgottlieb (Member) commented Nov 21, 2025

Starting a conversation via a PR. Apologies if this isn't the best place.

We got a report of a crash after upgrading to v101/v102. It was a pretty clear OOM on the first access of motion planning on a lite6 arm (read: while building the cache). I think this was on one of the Nanos, which have 8GB of RAM shared with the GPU. I believe a CV model was also loaded.

The increased consumption comes from the finer-grained jog creating more cache entries. Here is the memory used by the cache (measured after forcing a GC) across versions (Alloc is memory actively in use):

v0.102.0
Alloc = 2861 MiB	TotalAlloc = 10051 MiB	Sys = 5543 MiB	NumGC = 23

v0.101.0
Alloc = 2379 MiB	TotalAlloc = 7350 MiB	Sys = 4714 MiB	NumGC = 20

v0.100.0
Alloc = 253 MiB	TotalAlloc = 1756 MiB	Sys = 619 MiB	NumGC = 19

On main (which removed the unused pose), memory was cut by ~35% from v101/v102, but it is still far above v100:

Main:
Alloc = 1648 MiB	TotalAlloc = 5170 MiB	Sys = 3785 MiB	NumGC = 13

This patch brings memory back in line with v100:

arm6JogRatios   = []float64{90, 32, 8, 8, 4, 2}
Alloc = 227 MiB	TotalAlloc = 1376 MiB	Sys = 562 MiB	NumGC = 15
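The Alloc / TotalAlloc / Sys / NumGC lines above have the shape of a common Go idiom: force a GC, then read `runtime.MemStats`. The PR does not show the harness that produced these numbers, so the following is only a minimal sketch of that kind of measurement; `memUsageLine` and `bToMb` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"runtime"
)

// bToMb converts a byte count to whole MiB (integer division, so it rounds down).
func bToMb(b uint64) uint64 {
	return b / 1024 / 1024
}

// memUsageLine forces a garbage collection, then formats the runtime.MemStats
// fields quoted in this PR:
//   - Alloc:      bytes of allocated heap objects
//   - TotalAlloc: cumulative bytes allocated over the process lifetime
//   - Sys:        total bytes obtained from the OS
//   - NumGC:      number of completed GC cycles
func memUsageLine() string {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("Alloc = %v MiB\tTotalAlloc = %v MiB\tSys = %v MiB\tNumGC = %v",
		bToMb(m.Alloc), bToMb(m.TotalAlloc), bToMb(m.Sys), m.NumGC)
}

func main() {
	// Hypothetical: populate the motion-planning cache here before measuring.
	fmt.Println(memUsageLine())
}
```

Forcing the GC first is what makes Alloc a reasonable proxy for what the cache keeps alive, whereas TotalAlloc also counts garbage produced while building it.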

@viambot added the "safe to test" label Nov 21, 2025 (this pull request is marked safe to test from a trusted zone)

```diff
 var (
-	arm6JogRatios = []float64{360, 32, 8, 8, 4, 2}
+	arm6JogRatios = []float64{90, 16, 8, 8, 4, 2}
```
Member Author (@dgottlieb):

An "enhanced" version of this patch would query how much memory the machine has, so that we're more conservative on low-memory RPis while getting better precision on bigger boxes.
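A minimal sketch of that "enhanced" idea, assuming the total RAM is already known in bytes. The 8 GiB cutoff, `chooseJogRatios`, and the variable names are all hypothetical. The coarse set uses the `{90, 32, 8, 8, 4, 2}` values from the description (the ones with measured memory numbers); the diff shows `{90, 16, ...}`, so substitute whichever values the PR settles on.

```go
package main

import "fmt"

// Hypothetical cutoff: machines with at most this much total RAM get the coarser ratios.
const lowMemoryThresholdBytes uint64 = 8 << 30 // 8 GiB, chosen only for illustration

var (
	// Ratios shipped in v101/v102 (finer jog, larger cache).
	fineJogRatios = []float64{360, 32, 8, 8, 4, 2}
	// Coarser ratios measured in this PR's description (~227 MiB Alloc).
	coarseJogRatios = []float64{90, 32, 8, 8, 4, 2}
)

// chooseJogRatios picks jog ratios based on the machine's total memory in bytes.
// It returns a copy so callers cannot mutate the package-level defaults.
func chooseJogRatios(totalBytes uint64) []float64 {
	src := fineJogRatios
	if totalBytes <= lowMemoryThresholdBytes {
		src = coarseJogRatios
	}
	out := make([]float64, len(src))
	copy(out, src)
	return out
}

func main() {
	fmt.Println(chooseJogRatios(4 << 30))  // e.g. a 4 GiB Raspberry Pi: [90 32 8 8 4 2]
	fmt.Println(chooseJogRatios(32 << 30)) // e.g. a 32 GiB workstation: [360 32 8 8 4 2]
}
```

A single threshold keeps behavior predictable; a smoother scheme (several tiers, or scaling the first ratio with RAM) is possible but harder to test across hardware.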

Happy to move this PR in that direction, or any other direction. I just wanted to get product feedback and a path forward for users who are unable to upgrade.

Member:

Who can't upgrade?
This will make a lot of things worse and isn't the direction I think we should go.
Perhaps we can just skip caching entirely if the system has less than x amount of RAM.
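One way to sketch the "skip caching below x RAM" suggestion, assuming the cache can be modeled as a lookup that falls back to computing on demand. `jogCache`, `newJogCacheIfAffordable`, and `getOrCompute` are hypothetical names, not the RDK's actual cache API, and the threshold is passed in rather than fixed because the comment leaves "x" unspecified.

```go
package main

import "fmt"

// jogCache is a hypothetical stand-in for the motion-planning cache.
type jogCache struct {
	entries map[string][]float64
}

// newJogCacheIfAffordable returns nil (caching disabled) when the machine has
// less than minBytes of total RAM; otherwise it returns an empty cache.
func newJogCacheIfAffordable(totalBytes, minBytes uint64) *jogCache {
	if totalBytes < minBytes {
		return nil
	}
	return &jogCache{entries: make(map[string][]float64)}
}

// getOrCompute returns a cached value when caching is enabled. On a nil
// receiver it always calls compute, so callers never need to branch.
func (c *jogCache) getOrCompute(key string, compute func() []float64) []float64 {
	if c == nil {
		return compute()
	}
	if v, ok := c.entries[key]; ok {
		return v
	}
	v := compute()
	c.entries[key] = v
	return v
}

func main() {
	const minBytes = 4 << 30 // illustrative stand-in for "x", not a recommended value
	calls := 0
	compute := func() []float64 {
		calls++
		return []float64{1, 2, 3}
	}

	small := newJogCacheIfAffordable(2<<30, minBytes) // below threshold: no caching
	big := newJogCacheIfAffordable(16<<30, minBytes)  // above threshold: caching enabled
	for i := 0; i < 2; i++ {
		small.getOrCompute("pose-a", compute)
		big.getOrCompute("pose-a", compute)
	}
	fmt.Println(small == nil, calls) // true 3
}
```

The nil-receiver method means the planner code path stays identical on small and large machines; only planning latency differs on low-memory boxes.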

Member Author (@dgottlieb):

Member:

Lame. Is it easy enough to check available memory?

@dgottlieb requested a review from erh November 21, 2025 15:13
@github-actions (Contributor):

Availability

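| Scene # | viamrobotics:main | dgottlieb:RSDK-12719 | Percent Improvement | Health |
|---|---|---|---|---|
| 1 | 100% | 100% | 0% | |
| 2 | 100% | 100% | 0% | |
| 3 | 100% | 100% | 0% | |
| 4 | 100% | 100% | 0% | |
| 5 | 100% | 100% | 0% | |
| 6 | 100% | 100% | 0% | |
| 7 | 100% | 100% | 0% | |
| 8 | 100% | 100% | 0% | |
| 9 | 100% | 100% | 0% | |
| 10 | 100% | 100% | 0% | |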

Quality

| Scene # | viamrobotics:main | dgottlieb:RSDK-12719 | Percent Improvement | Probability of Improvement | Health |
|---|---|---|---|---|---|
| 1 | 1.31±0.00 | 1.31±0.00 | 0% | 63% | |
| 2 | 0.90±0.00 | 0.90±0.00 | -0% | 50% | |
| 3 | 5.86±1.18 | 5.14±0.00 | 12% | 73% | |
| 4 | 3.13±0.40 | 3.23±0.42 | -3% | 43% | |
| 5 | 10.26±3.30 | 9.47±3.34 | 8% | 57% | |
| 6 | 9.67±3.46 | 10.51±4.32 | -9% | 44% | |
| 7 | 6.71±2.36 | 4.89±2.29 | 27% | 71% | |
| 8 | 0.90±0.00 | 0.90±0.00 | -0% | 50% | |
| 9 | 4.15±0.00 | 4.20±0.14 | -1% | 37% | |
| 10 | 12.84±0.41 | 12.84±0.41 | -0% | 50% | |

Performance

| Scene # | viamrobotics:main | dgottlieb:RSDK-12719 | Percent Improvement | Probability of Improvement | Health |
|---|---|---|---|---|---|
| 1 | 0.03±0.01 | 0.03±0.01 | 3% | 54% | |
| 2 | 0.04±0.01 | 0.03±0.01 | 20% | 82% | |
| 3 | 0.09±0.03 | 0.03±0.00 | 67% | 97% | |
| 4 | 1.19±0.04 | 1.17±0.05 | 2% | 62% | |
| 5 | 1.70±0.38 | 1.62±0.35 | 5% | 56% | |
| 6 | 1.81±0.57 | 1.93±0.61 | -7% | 44% | |
| 7 | 2.18±0.75 | 2.07±0.75 | 5% | 54% | |
| 8 | 0.03±0.00 | 0.03±0.00 | 11% | 78% | |
| 9 | 2.04±0.14 | 1.94±0.07 | 5% | 73% | |
| 10 | 3.36±0.54 | 3.34±0.56 | 0% | 51% | |

The above data was generated by running scenes defined in the motion-testing repository.
The SHA1 for viamrobotics:main is: b39abd91b2d7c568703e39f08cf352167db9a609
The SHA1 for dgottlieb:RSDK-12719 is: b39abd91b2d7c568703e39f08cf352167db9a609

  • 10 samples were taken for each scene


3 participants