-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GC support #138
base: main
Are you sure you want to change the base?
GC support #138
Conversation
1. Added Gc class to sync.py 2. Mounted to ExpirableDictStorage and PersistentDictStorage 3. dict interface changed
@@ -463,6 +463,7 @@ def dict( | |||
storage_class = fsync.PersistentDictStorage | |||
else: | |||
storage_class = fsync.ExpirableDictStorage | |||
storage_class.maxsize = maxsize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks like it changes the maxsize
value of the storage class. It will cause side effect for other calls with different value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed
if storage_class is None:
means that we're in a 'default' mode so we're in the somewhat private zone... what about _maxsize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is not about the naming rule. The next code will make problem.
# maxsize of f is 512 here because fsync.PersistentDictStorage.maxsize is 512
@ring.dict(maxsize=512)
def f():
...
# maxsize of both f and g are 128 now becasue fsync.PersistentDictStorage is 128
@ring.dict(maxsize=128)
def g():
...
We shouldn't set any kind of class variable for decorator parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally missed that. What a shame. I'll figure out a better way.
ring/func/sync.py
Outdated
def __init__(self, backend, target_size, expire_f=None): | ||
assert round(target_size) > 0, 'target_size has to be at least 1' | ||
assert expire_f is None or callable(expire_f), 'expire_f has to be function or None' | ||
assert isinstance(backend, type({})), 'backend has to be dict-like' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abc.MutableMapping
will be better than type({})
. https://docs.python.org/3/library/collections.abc.html#collections.abc.MutableMapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it!
ring/func/sync.py
Outdated
|
||
self._mutex.acquire() | ||
if (len(self._backend) > self._target_size): | ||
WorkThread(self).start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we take any advantage by using threads? I think gc for dict is cpu-bouded job and there is no benefits by threading. Tell me if I missed something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. The cache
itself is in-memory so won't be any heavy I/O issue. However, If the dict
entries are big enough, like 1000000, set_value
(or in some other methods) can be blocked quite an amount of time. At least we can context switch if we go multithreading.
This is just my opinion and have no issues removing the threading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove it. If the size matters, I think there are 2 reasons to regard threading is overkill.
- If the backend requires IO job, it makes sense. But for dict, even 1000000 iteration is a microsecond problem. In most of cases with reasonable functions, it will be less than 1000000.
- If we correctly add locking to it, gc blocks any kind of adding and removing from the dict - which mean the caller still should wait for gc job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it!
ring/func/sync.py
Outdated
|
||
class ExpirableDictStorage(fbase.CommonMixinStorage, fbase.StorageMixin): | ||
|
||
maxsize = 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value should not be bound to the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I can't 100% follow your thoughts with maxsize
.
My guess is you're worrying that this value can be easily accessed & modified and my understanding is ExpirableDictStorage
& PersistentDictStorage
is our implement for default calling so we can control this value.
ring/func/sync.py
Outdated
@@ -229,6 +301,11 @@ def set_value(self, key, value, expire): | |||
expired_time = _now + expire | |||
self.backend[key] = expired_time, value | |||
|
|||
if self.maxsize < len(self.backend): | |||
if self._gc == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may check this part after solving threading design of gc
Thanks for the contribution!
The CI failure is about formatting. You can check the result from https://travis-ci.org/youknowone/ring/builds/546264007?utm_source=github_status&utm_medium=notification |
Thanks again for your kind & careful advice.
CI Failure was a fail of my own test. I'll correct it at next PR. |
Trimmer can be the name too. I don't know the proper name for this kind of job. So if you have any suggestion or find a correct name for it, please use your suggestion to the code. GC was just an expression in issues to make me sure not to be confused later. |
@youknowone |
I'm working on |
@@ -638,7 +639,7 @@ class RingRope(RopeCore): | |||
def __init__(self, *args, **kwargs): | |||
super(RingRope, self).__init__(*args, **kwargs) | |||
self.user_interface = self.user_interface_class(self) | |||
self.storage = self.storage_class(self, storage_backend) | |||
self.storage = self.storage_class(self, storage_backend, maxsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does redis_hash failure related to this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, __init__
is overridden in RedisHashStorage
. I'm looking at right now and seems there are some workarounds to get back to work but not sure which one will be the best one.
The problem is that BaseStorage
has a init and ExpirableDictStorage
& PersistentDictStorage
adheres it but RedisHashStorage
overrides it.
Can you tell me your opinion?
@youknowone |
Sorry, i was busy for last week. i will check your recent change soon |
Gc
class to sync.pyExpirableDictStorage
andPersistentDictStorage
dict
interface changedSolves #115 issue.
DEBUG
variable in Gc class is for debug log purpose. If you know the better way, please, let me know.PriorityQueue
to remove expired ones first. Build time would be betweenO(n)
orO(n*log(n))
so it won't be that harmful.