Potential memory leak in packetparser's interfaceLockMap #1236

Closed
nddq opened this issue Jan 17, 2025 · 0 comments · Fixed by #1249
Labels
area/plugins priority/1 P1 type/bug Something isn't working

nddq commented Jan 17, 2025

Currently, packetparser creates an entry in interfaceLockMap for each new interface that comes up, but never removes the entry when the interface goes down. In the handler below, the EndpointDeleted branch deletes the interface's entry from tcMap and leaves its mutex in interfaceLockMap. In environments with high pod counts and frequent churn, the map can therefore grow without bound, resulting in a memory leak.

	ifaceKey := ifaceToKey(iface)
	lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
	mu := lockMapVal.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()

	switch event.Type {
	case endpoint.EndpointCreated:
		p.l.Debug("Endpoint created", zap.String("name", iface.Name))
		p.createQdiscAndAttach(iface, Veth)
	case endpoint.EndpointDeleted:
		p.l.Debug("Endpoint deleted", zap.String("name", iface.Name))
		// Clean.
		if value, ok := p.tcMap.Load(ifaceKey); ok {
			v := value.(*tcValue)
			p.clean(v.tc, v.qdisc)
			// Delete from map.
			p.tcMap.Delete(ifaceKey)
		}
	default:
		// Unknown.
		p.l.Debug("Unknown event", zap.String("type", event.Type.String()))
	}
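
The growth can be reproduced in isolation. The following is a minimal sketch, not Retina code: the `parser` type, its `handle` method, and the `size` and `churn` helpers are hypothetical stand-ins, and a placeholder value replaces the real `*tcValue`. It keeps only the locking pattern from the handler above and counts what is left in each map after interfaces are created and then deleted.

```go
package main

import (
	"fmt"
	"sync"
)

// parser is a hypothetical stand-in for the plugin; only the two maps matter here.
type parser struct {
	interfaceLockMap sync.Map // interface key -> *sync.Mutex
	tcMap            sync.Map // interface key -> placeholder for the attached state
}

// handle mirrors the locking pattern of the handler above: it takes the
// per-interface mutex, then stores or deletes the tcMap entry. The entry in
// interfaceLockMap is never removed.
func (p *parser) handle(key string, deleted bool) {
	v, _ := p.interfaceLockMap.LoadOrStore(key, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()

	if deleted {
		p.tcMap.Delete(key)
		return
	}
	p.tcMap.Store(key, struct{}{})
}

// size counts the entries in a sync.Map, which has no Len method.
func size(m *sync.Map) int {
	n := 0
	m.Range(func(_, _ any) bool {
		n++
		return true
	})
	return n
}

// churn creates and then deletes n distinct interfaces.
func churn(p *parser, n int) {
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("veth%d", i)
		p.handle(key, false)
		p.handle(key, true)
	}
}

func main() {
	p := &parser{}
	churn(p, 1000)
	fmt.Println("tcMap entries:", size(&p.tcMap))
	fmt.Println("interfaceLockMap entries:", size(&p.interfaceLockMap))
}
```

After 1000 create/delete pairs, `tcMap` is empty while `interfaceLockMap` still holds 1000 mutexes, one per interface that ever existed.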
nddq added the area/plugins, priority/1 (P1), and type/bug (Something isn't working) labels on Jan 17, 2025
byte-msft linked a pull request on Jan 21, 2025 that will close this issue
ibezrukavyi added this to the 1.0 milestone on Jan 22, 2025
github-merge-queue bot pushed a commit that referenced this issue on Feb 5, 2025:

# Description

The packetParser was creating entries in interfaceLockMap for each new
interface but failing to remove them when interfaces were deleted. In
environments with high pod counts and frequent churn, this caused a
memory leak as the map grew indefinitely.

## Related Issue

[Potential memory leak in packetparser's interfaceLockMap #1236](#1236)

## Checklist

- [X] I have read the [contributing documentation](https://retina.sh/docs/contributing).
- [X] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [X] I have correctly attributed the author(s) of the code.
- [X] I have tested the changes locally.
- [X] I have followed the project's style guidelines.
- [ ] I have updated the documentation, if necessary.
- [X] I have added tests, if applicable.

## Additional Notes

### Solution
- Added cleanup of interfaceLockMap entries in the EndpointDeleted case
- Improved mutex handling logic to prevent resource leaks
- Updated test cases to verify proper cleanup of both tcMap and interfaceLockMap
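
As an illustration of the first bullet, here is a minimal sketch of what removing the lock entry in the deleted case could look like. It is not the code from the merged change: the `plugin` type, the `eventType` constants, and the `handle` method are hypothetical, and a placeholder value stands in for the real tc state.

```go
package main

import (
	"fmt"
	"sync"
)

// eventType is a hypothetical stand-in for the endpoint event types.
type eventType int

const (
	created eventType = iota
	deleted
)

// plugin is a hypothetical stand-in holding only the two maps discussed here.
type plugin struct {
	interfaceLockMap sync.Map // interface key -> *sync.Mutex
	tcMap            sync.Map // interface key -> placeholder for the attached state
}

// handle takes the per-interface mutex and, on deletion, removes the
// interface's entry from both maps.
func (p *plugin) handle(key string, t eventType) {
	v, _ := p.interfaceLockMap.LoadOrStore(key, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()

	switch t {
	case created:
		p.tcMap.Store(key, struct{}{})
	case deleted:
		p.tcMap.Delete(key)
		// Assumed cleanup step: drop the mutex so the lock map does not
		// keep an entry for an interface that no longer exists.
		p.interfaceLockMap.Delete(key)
	}
}

func main() {
	p := &plugin{}
	p.handle("veth0", created)
	_, locked := p.interfaceLockMap.Load("veth0")
	fmt.Println("lock entry after create:", locked)

	p.handle("veth0", deleted)
	_, locked = p.interfaceLockMap.Load("veth0")
	fmt.Println("lock entry after delete:", locked)
}
```

One design point to weigh with this approach: the entry is deleted while its mutex is still held, so a goroutine that loaded the old mutex just before the delete can still be waiting on it while a later event for the same key creates a fresh mutex. Whether that matters depends on how events for a single interface are ordered.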

### Testing
- Added comprehensive test coverage for interface deletion scenario
- Verified cleanup of both maps in test cases
- Tested with high pod churn scenarios
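
A churn check along these lines could be sketched as follows. This is an assumption about the shape of such a test, not the test added in the change: `plugin`, `lockFor`, `onCreate`, `onDelete`, `count`, and `runChurn` are hypothetical names. Each worker goroutine uses its own keys, so the final state is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a hypothetical stand-in holding only the two maps under test.
type plugin struct {
	interfaceLockMap sync.Map // interface key -> *sync.Mutex
	tcMap            sync.Map // interface key -> placeholder for the attached state
}

// lockFor returns the mutex for key, creating it on first use.
func (p *plugin) lockFor(key string) *sync.Mutex {
	v, _ := p.interfaceLockMap.LoadOrStore(key, &sync.Mutex{})
	return v.(*sync.Mutex)
}

// onCreate records the interface under its per-interface lock.
func (p *plugin) onCreate(key string) {
	mu := p.lockFor(key)
	mu.Lock()
	defer mu.Unlock()
	p.tcMap.Store(key, struct{}{})
}

// onDelete removes the interface from both maps (the assumed cleanup).
func (p *plugin) onDelete(key string) {
	mu := p.lockFor(key)
	mu.Lock()
	defer mu.Unlock()
	p.tcMap.Delete(key)
	p.interfaceLockMap.Delete(key)
}

// count returns the number of entries in a sync.Map.
func count(m *sync.Map) int {
	n := 0
	m.Range(func(_, _ any) bool {
		n++
		return true
	})
	return n
}

// runChurn starts workers goroutines; each creates and deletes rounds
// distinct interfaces. It returns the entries left in each map.
func runChurn(workers, rounds int) (locks, tcs int) {
	p := &plugin{}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				key := fmt.Sprintf("veth-%d-%d", w, r)
				p.onCreate(key)
				p.onDelete(key)
			}
		}(w)
	}
	wg.Wait()
	return count(&p.interfaceLockMap), count(&p.tcMap)
}

func main() {
	locks, tcs := runChurn(8, 1000)
	fmt.Println("interfaceLockMap entries:", locks)
	fmt.Println("tcMap entries:", tcs)
}
```

Both counts should be zero once every worker has finished; a nonzero `interfaceLockMap` count would indicate the leak described in the issue.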

### Impact
This fix prevents memory leaks in environments with frequent pod
creation/deletion, improving the overall stability and resource usage
of the system.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.

---------

Signed-off-by: Yerlan Baiturinov <[email protected]>

3 participants