LRUCache.__copy__ is unsound #3852
fwiw the original trace was generated via this diff:

```diff
$ diff -u {venvnew,.venv}/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py
--- venvnew/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py 2024-12-04 17:17:44
+++ .venv/lib/python3.12/site-packages/sentry_sdk/_lru_cache.py 2024-12-04 17:16:18
@@ -62,6 +62,7 @@
 """
+from uuid import uuid4
 from copy import copy
 SENTINEL = object()
@@ -74,8 +75,16 @@
 VALUE = 3
+
 class LRUCache:
-    def __init__(self, max_size):
+    def _log(self, m):
+        with open('lru_cache.log', 'a+') as f:
+            f.write(f'{self._u}: {m}\n')
+
+    def __init__(self, max_size, _log=True):
+        self._u = uuid4()
+        if _log:
+            self._log('__init__')
         assert max_size > 0
         self.max_size = max_size
@@ -92,13 +101,15 @@
         self.hits = self.misses = 0
     def __copy__(self):
-        cache = LRUCache(self.max_size)
+        cache = LRUCache(self.max_size, _log=False)
+        self._log(f'__copy__ -> {cache._u}')
         cache.full = self.full
         cache.cache = copy(self.cache)
         cache.root = copy(self.root)
         return cache
     def set(self, key, value):
+        self._log(f'set {key} {value}')
         link = self.cache.get(key, SENTINEL)
         if link is not SENTINEL:
@@ -129,7 +140,11 @@
             self.root[KEY] = self.root[VALUE] = None
-            del self.cache[old_key]
+            try:
+                del self.cache[old_key]
+            except KeyError:
+                self._log('crash!!!')
+                raise
             self.cache[key] = old_root
@@ -141,6 +156,7 @@
         self.full = len(self.cache) >= self.max_size
     def get(self, key, default=None):
+        self._log(f'get {key}')
         link = self.cache.get(key, SENTINEL)
         if link is SENTINEL:
```

and then running the first 504 tests of shard 7 in the getsentry/sentry testsuite
assuming python3.6+ only, you may be able to utilize the builtin dict ordering:

```python
_SENTINEL = object()


class LRUCache:
    def __init__(self, max_size: int):
        if max_size <= 0:
            raise AssertionError(f'invalid max_size: {max_size}')
        self.max_size = max_size
        self._data = {}
        self.hits = self.misses = 0
        self.full = False

    def __copy__(self):
        new = LRUCache(max_size=self.max_size)
        new.hits = self.hits
        new.misses = self.misses
        new.full = self.full
        new._data = self._data.copy()
        return new

    def set(self, key, value):
        current = self._data.pop(key, _SENTINEL)
        if current is not _SENTINEL:
            self._data[key] = value
        elif self.full:
            self._data.pop(next(iter(self._data)))
            self._data[key] = value
        else:
            self._data[key] = value
            self.full = len(self._data) >= self.max_size

    def get(self, key, default=None):
        try:
            ret = self._data.pop(key)
        except KeyError:
            self.misses += 1
            ret = default
        else:
            self.hits += 1
            self._data[key] = ret
        return ret

    def get_all(self):
        return list(self._data.items())
```

(this passes the current set of tests -- though I'd argue that the […])
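For illustration (not part of the original comment), a hypothetical usage of the proposed class above -- exercising both the eviction order and the independence of a copy -- might look like this:

```python
from copy import copy

# Assumes the LRUCache class proposed above is in scope.
cache = LRUCache(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")           # touching "a" makes "b" the least recently used
cache.set("c", 3)        # cache is full, so "b" is evicted
assert cache.get_all() == [("a", 1), ("c", 3)]

clone = copy(cache)      # goes through __copy__, which copies _data
clone.set("d", 4)        # evicts "a" inside the clone only
assert cache.get_all() == [("a", 1), ("c", 3)]   # original is unaffected
assert clone.get_all() == [("c", 3), ("d", 4)]
```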
This is related to #3862. I will make a patch release with the other fix because this feature is going into open beta today. Then we can look at the suggested solution above and maybe implement it as the long-term fix for the LRU cache.
Thanks for finding this @asottile-sentry. Sorry I didn't see it sooner. It was lost in my GitHub notifications.
Fixed by: #3861
I think there's probably still another bug of a similar flavor since […]
yeah, in fact the reproduction in this issue still happens
@asottile-sentry Ah, sorry, I was on a work trip last week and didn't see this. I've removed all the bogus copy dunder methods and I'm just deepcopying the buffer on scope fork. I really can't remember why I used the dunder methods in the first place. Thanks for re-opening and testing this. I've included your reproduction steps as part of the test coverage, so this should hopefully be resolved. Of course, let me know if you find anything else and I will do my best to fix it. As for your new LRU proposal, I think it looks great and I'm in favor of the change. But I'm not the maintainer of this repo and, considering my track record on this bug, I'm probably not the right person for the job.
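For reference, here is a tiny hypothetical example (the buffer shape and names are made up for illustration, this is not the actual scope code) of why deepcopying on fork avoids shared state between parent and fork:

```python
from copy import deepcopy

# Hypothetical flag buffer: a dict of mutable entries.
parent_flags = {"feature-x": {"value": True, "ttl": 60}}

forked_flags = deepcopy(parent_flags)   # the fork gets its own nested objects

forked_flags["feature-x"]["value"] = False
print(parent_flags["feature-x"]["value"])  # True -- the parent is unaffected
```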
I don't think deepcopying the cache/root is the way to go here, since we don't really want to copy the keys/values, for example. Copying is also what caused #3862, because the root attribute and the root object referenced by the circular list are no longer the same object after copying. Instead I propose to just walk the list and create new root / cache structures along the way -- see my PR (a rough sketch of the idea is shown below). This also makes the previously introduced cap on iterations inside […]. The issue inside the sentry CI also seems to go away.
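A minimal sketch of the list-walking idea (not the actual PR; it assumes the original module's PREV/NEXT/KEY/VALUE link indices and that `self.root[NEXT]` points at the oldest entry) could look like this:

```python
def __copy__(self):
    # Build a fresh cache and replay the entries from oldest to newest.
    new = LRUCache(self.max_size)
    link = self.root[NEXT]
    while link is not self.root:
        # set() creates brand-new link objects and dict entries, so the copy
        # shares no mutable state with the original circular list.
        new.set(link[KEY], link[VALUE])
        link = link[NEXT]
    new.hits, new.misses = self.hits, self.misses
    return new
```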
Hey @ffelixg, thanks for that PR. I have added some tests and also changed the […]
@asottile-sentry and @cmanallen, the fix from @ffelixg looks sound to me. As this is tricky code, I would love to have your reviews as well; please have a look at this: #3883
why are we so resistant to using a much simpler and likely much more performant lru implementation?
@asottile-sentry PRs welcome. @antonpirker @ffelixg Thank you, Felix, for the contribution! I've left some review comments. tl;dr I think it's the wrong approach, but the changes I suggested are small. I have a bias toward my own PR, but I believe I was objective in my review.
@asottile-sentry "PRs welcome" might come across as snarky 🙈 -- my bad. I support your approach. If you decide to open a PR, I'll review it for you!
@asottile-sentry I also think your implementation is better. I just found the issue in the original implementation interesting after watching your YouTube video. 🙂 I'm not sure if this lru_cache is used externally or just internally. If it's internal only, and internal usage is fine with a deep instead of a shallow copy, then my PR is a bit pointless, since deepcopy seems to work out of the box.
Mostly because it is a comment and not a PR 😅
But you are right, your solution is way simpler. I have plugged it into @ffelixg's PR and will merge it. Thanks everyone for your time.
A simpler and better LRU cache implementation that prevents data leaking between copied caches.

Fixes #3852

Co-authored-by: Anton Pirker <[email protected]>
- Removes manual overrides of copy behavior and leaves it up to the caller.
  - E.g. a future use case may require a non-deepcopy. If we override copy, they would have to remove the dunder copy and update every implementation which relies on copy before finally creating their own copy implementation.
- Deepcopies the flag buffer.
  - Though we do not cache mutable references yet, we may soon, so this foot gun should be removed from possibility.
- Removes "copy" test coverage from `test_lru_cache.py`. We're no longer assuming copy usage and leave it up to the caller.
  - The existing test in `tests/test_scope.py` covers the cache pollution case [originally mentioned here](#3852).
  - The mutable cache pollution case is not covered because we do not currently cache mutable objects.

In general, a generic class should assume as few implementation details as possible. If we leave the existing copy method, someone may assume copy semantics and rely on it in a way that is inappropriate.

Closes: #3886

Co-authored-by: Anton Pirker <[email protected]>
How do you use Sentry?
Sentry Saas (sentry.io)
Version
latest git revision
Steps to Reproduce
unfortunately I haven't completely narrowed down a "minimal" reproduction, but I originally found this from reading the implementation and tracing the output -- attached is a script and data which produce a crash due to a particular ordering of copies and sets.
the root cause of the problem is that `__copy__` is incorrect as written -- it shallow copies the internal data structures, which incorrectly pollutes newly created caches.

in the ~best case this produces incorrect values poisoning the parent copy; at worst it causes crashes (see attached data)
lru_cache.log
cc @cmanallen -- introduced in dd1117d
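To make the failure mode concrete, here is a small standalone illustration (hypothetical data, not the sentry_sdk source) of how shallow-copying a dict of mutable link lists lets a write through one cache poison the other:

```python
from copy import copy

# Link layout assumed to mirror the original implementation.
PREV, NEXT, KEY, VALUE = 0, 1, 2, 3

# The cache's internal dict maps keys to mutable link lists.
original_links = {"user-id": [None, None, "user-id", "alice"]}

copied_links = copy(original_links)     # new dict, but the link lists are shared

# Writing through the "copy" mutates the very same link object...
copied_links["user-id"][VALUE] = "bob"

# ...so the original now sees the copy's value.
print(original_links["user-id"][VALUE])  # bob
```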
Expected Result
should not crash, should not have internal data integrity problems
Actual Result
the script eventually crashes with: