forked from deterministic-arts/DartsPyLRU
-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathREADME.txt
More file actions
359 lines (279 loc) · 9.87 KB
/
README.txt
File metadata and controls
359 lines (279 loc) · 9.87 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
LRU Dictionaries
=================
>>> from darts.lib.utils.lru import LRUDict
An `LRUDict` is basically a simple dictionary, which has a defined
maximum capacity, that may be supplied at construction time, or modified
at run-time via the `capacity` property::
>>> cache = LRUDict(1)
>>> cache.capacity
1
The minimum capacity value is 1, and LRU dicts will complain, if someone
attempts to use a value smaller than that::
>>> cache.capacity = -1 #doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: -1 is not a valid capacity
>>> LRUDict(-1) #doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: -1 is not a valid capacity
LRU dictionaries can never contain more elements than their capacity value
indicates, so::
>>> cache[1] = "First"
>>> cache[2] = "Second"
>>> len(cache)
1
In order to ensure this behaviour, the dictionary will evict entries if
it needs to make room for new ones. So::
>>> 1 in cache
False
>>> 2 in cache
True
The capacity can be adjusted at run-time. Growing the capacity does not
affect the number of elements present in an LRU dictionary::
>>> cache.capacity = 3
>>> len(cache)
1
>>> cache[1] = "First"
>>> cache[3] = "Third"
>>> len(cache)
3
but shrinking does::
>>> cache.capacity = 2
>>> len(cache)
2
>>> sorted(list(cache.iterkeys()))
[1, 3]
Note, that the entry with key `2` was evicted, because it was the oldest
entry at the time of the modification of `capacity`. The new oldest entry
is the one with key `1`, which can be seen, when we try to add another
entry to the dict::
>>> cache[4] = "Fourth"
>>> sorted(list(cache.iterkeys()))
[3, 4]
The following operations affect an entry's priority::
- `get`
- `__getitem__`
- `__setitem__`
- `__contains__`
Calling any of these operations on an existing key will boost the key's
priority, making it more unlikely to get evicted, when the dictionary needs
to make room for new entries. There is a special `peek` operation, which
returns the current value associated to a key without boosting the priority
of the entry::
>>> cache.peek(3)
'Third'
>>> cache[5] = "Fifth"
>>> sorted(list(cache.iterkeys()))
[4, 5]
As you can see, even though we accessed the entry with key `3` as the last
one, the entry is now gone, because it did not get a priority boost from
the call to `peek`.
The class `LRUDict` supports a subset of the standard Python `dict`
interface. In particular, we can iterate over the key, values, and
items of an LRU dict::
>>> sorted([k for k in cache.iterkeys()])
[4, 5]
>>> sorted([v for v in cache.itervalues()])
['Fifth', 'Fourth']
>>> sorted([p for p in cache.iteritems()])
[(4, 'Fourth'), (5, 'Fifth')]
>>> sorted(list(cache))
[4, 5]
Note, that there is no guaranteed order; in particular, the elements are
not generated in priority order or somesuch. Similar to regular `dict`s,
an LRU dict's `__iter__` is actually any alias for `iterkeys`.
Furthermore, we can remove all elements from the dict:
>>> cache.clear()
>>> sorted(list(cache.iterkeys()))
[]
Thread-safety
--------------
Instances of class `LRUDict` are not thread safe. Worse: even concurrent
read-only access is not thread-safe and has to be synchronized by the
client application.
There is, however, the class `SynchronizedLRUDict`, which exposes the
same interface as plain `LRUDict`, but fully thread-safe. The following
session contains exactly the steps, we already tried with a plain `LRUDict`,
but now using the synchronized version::
>>> from darts.lib.utils.lru import SynchronizedLRUDict
>>> cache = SynchronizedLRUDict(1)
>>> cache.capacity
1
>>> cache.capacity = -1 #doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: -1 is not a valid capacity
>>> LRUDict(-1) #doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: -1 is not a valid capacity
>>> cache[1] = "First"
>>> cache[2] = "Second"
>>> len(cache)
1
>>> 1 in cache
False
>>> 2 in cache
True
>>> cache.capacity = 3
>>> len(cache)
1
>>> cache[1] = "First"
>>> cache[3] = "Third"
>>> len(cache)
3
>>> cache.capacity = 2
>>> len(cache)
2
>>> sorted(list(cache.iterkeys()))
[1, 3]
>>> cache[4] = "Fourth"
>>> sorted(list(cache.iterkeys()))
[3, 4]
>>> cache.peek(3)
'Third'
>>> cache[5] = "Fifth"
>>> sorted(list(cache.iterkeys()))
[4, 5]
>>> sorted([k for k in cache.iterkeys()])
[4, 5]
>>> sorted([v for v in cache.itervalues()])
['Fifth', 'Fourth']
>>> sorted([p for p in cache.iteritems()])
[(4, 'Fourth'), (5, 'Fifth')]
>>> sorted(list(cache))
[4, 5]
>>> cache.clear()
>>> sorted(list(cache.iterkeys()))
[]
Auto-loading Caches
====================
Having some kind of dictionary which is capable of cleaning itself
up is nice, but in order to implement caching, there is still something
missing: the mechanism, which actually loads something into our dict.
This part of the story is implemented by the `AutoLRUCache`::
>>> from darts.lib.utils.lru import AutoLRUCache
Let's first define a load function::
>>> def load_resource(key):
... if key < 10:
... print "Loading %r" % (key,)
... return "R(%s)" % (key,)
and a cache::
>>> cache = AutoLRUCache(load_resource, capacity=3)
>>> cache.load(1)
Loading 1
'R(1)'
>>> cache.load(1)
'R(1)'
As you can see, the first time, an actual element is loaded, the load
function provided to the constructor is called, in order to provide the
actual resource value. On subsequent calls to `load`, the cached value
is returned.
Internally, the `AutoLRUCache` class uses an `LRUDict` to cache values,
so::
>>> cache.load(2)
Loading 2
'R(2)'
>>> cache.load(3)
Loading 3
'R(3)'
>>> cache.load(4)
Loading 4
'R(4)'
>>> cache.load(1)
Loading 1
'R(1)'
Note the "Loading 1" line in the last example. The cache has been initialized
with a capacity of 3, so the value of key `1` had to be evicted when the one
for key `4` was loaded. When we tried to obtain `1` again, the cache had to
properly reload it, calling the loader function.
If there is actually no resource for a given key value, the loader function
must return `None`. It follows, that `None` is never a valid resource value
to be associated with some key in an `AutoLRUCache`.
>>> cache.load(11, 'Oops')
'Oops'
Thread-safety
--------------
Instances of class `AutoLRUCache` are fully thread safe. Be warned, though,
that the loader function is called outside of any synchronization scope the
class may internally use, and has to provide its own synchronization if
required.
The cache class actually tries to minimize the number of invocations of the
loader by making sure, that no two concurrent threads will try to load the
same key value (though any number of concurrent threads might be busy loading
the resources associated with different keys).
Caching and stale entries
==========================
There is another `AutoLRUCache`-like class provided by the LRU module,
which gives more control over timing out of entries than `AutoLRUCache` does.
>>> from darts.lib.utils.lru import DecayingLRUCache
>>> current_time = 0
>>> def tick():
... global current_time
... current_time += 1
Here, we defined a simple "clock". We could have used the system clock,
but roling our own here gives us more control over the notion of "time".
Now, let's define a simple cache entry:
>>> from collections import namedtuple
>>> Entry = namedtuple("Entry", "timestamp payload")
>>> def make_entry(payload):
... return Entry(current_time, payload)
and a loader function
>>> def load(full_key):
... print "Loading", full_key
... return make_entry(u"Entry for %r" % (full_key,))
For the following parts, we consider an entry to be "too old", if it has
been created more then two "ticks" ago:
>>> def is_still_current(entry):
... return current_time - entry.timestamp <= 2
Finally, we create another cache thingy
>>> cache = DecayingLRUCache(load, tester=is_still_current, capacity=3)
The `DecayingLRUCache` shows much of the same behaviour of the `AutoLRUCache`,
namely:
>>> cache.load(1)
Loading 1
Entry(timestamp=0, payload=u'Entry for 1')
>>> cache.load(2)
Loading 2
Entry(timestamp=0, payload=u'Entry for 2')
>>> cache.load(3)
Loading 3
Entry(timestamp=0, payload=u'Entry for 3')
>>> cache.load(4)
Loading 4
Entry(timestamp=0, payload=u'Entry for 4')
>>> cache.load(1)
Loading 1
Entry(timestamp=0, payload=u'Entry for 1')
The entry with key `1` had to be reloaded, since the cache has a capacity of
3, and the old entry for `1` was evicted when the entry for `4` was loaded
and we needed to make room.
>>> cache.load(3)
Entry(timestamp=0, payload=u'Entry for 3')
Now, let's advance time
>>> tick()
>>> cache.load(3)
Entry(timestamp=0, payload=u'Entry for 3')
The entry is still available.
>>> tick()
>>> cache.load(3)
Entry(timestamp=0, payload=u'Entry for 3')
>>> tick()
>>> cache.load(3)
Loading 3
Entry(timestamp=3, payload=u'Entry for 3')
Note, that eviction is still based on LRU, not on the age test.
Change Log
==========
Version 0.5
------------
Added a "from __future__ import with_statement" for Python 2.5 compatibility.
Note, that supporting py2.5 is not a real goal, and I did not test the code
using that version.
Version 0.4
------------
Added class `DecayingLRUCache`
Version 0.3
------------
Added class `SynchronizedLRUDict` as thread-safe counterpart for `LRUDict`.