The concurrency module provides a set of concurrently scalable data structures and patterns for Python. This module is designed to support high-performance, scalable programming with Free Threaded Python.
A concurrently accessible dictionary.
__init__(scaling=17)
: Initializes a new ConcurrentDict with the specified number of concurrent structures. This relates to the number of threads it supports with good scaling. For optimal performance, this value should be close to the number of cores on the machine. However, under or over estimating this value by a factor of 2 or even more does not have a huge impact on performance.as_dict()
: Creates a dict from the key value pairs in this ConcurrentDict. This is not thread consistent; it is safe to call whilst the ConcurrentDict is being updated, however, which key/value pairs will be copied over is not defined.
ConcurrentDict also supports the following access methods:
d[key]
: Returns the value associated with the specified key.d[key] = value
: Sets the value associated with the specified key.del d[key]
: Deletes the key-value pair associated with the specified key.key in d
: ReturnsTrue
if the dictionary holds the specified key,False
otherwise..
ConcurrentDict does not support all the API methods of a built-in dict. It is designed for basic key-value store operations in a concurrent environment.
from ft_utils.concurrency import ConcurrentDict
d = ConcurrentDict()
d['key'] = 'value'
print(d['key']) # prints 'value'
print('key' in d)) # prints True
del d['key']
print('key' in d)) # prints False
A 64-bit integer that can be updated atomically.
__init__(value=0)
: Initializes a new AtomicInt64 with the specified value.get()
: Returns the current value.set(value)
: Sets the value.incr()
: Increments the value and returns the new value.decr()
: Decrements the value and returns the new value.
In addition the following numeric methods are implemented.
__sub__(self, other)
: Returns the result of subtractingother
fromself
.__mul__(self, other)
: Returns the result of multiplyingself
byother
.__floordiv__(self, other)
: Returns the largest whole number less than or equal to the result of dividingself
byother
.__or__(self, other)
: Returns the bitwise OR ofself
andother
.__xor__(self, other)
: Returns the bitwise XOR ofself
andother
.__and__(self, other)
: Returns the bitwise AND ofself
andother
.__neg__(self)
: Returns the negative ofself
.__pos__(self)
: Returnsself
unchanged.__abs__(self)
: Returns the absolute value ofself
.__invert__(self)
: Returns the bitwise NOT ofself
.__bool__(self)
: ReturnsTrue
ifself
is non-zero,False
otherwise.__iadd__(self, other)
: Addsother
toself
in-place and returnsself
.__isub__(self, other)
: Subtractsother
fromself
in-place and returnsself
.__imul__(self, other)
: Multipliesself
byother
in-place and returnsself
.__ifloordiv__(self, other)
: Dividesself
byother
in-place using floor division and returnsself
.__ior__(self, other)
: Performs a bitwise OR ofself
andother
in-place and returnsself
.__ixor__(self, other)
: Performs a bitwise XOR ofself
andother
in-place and returnsself
.__iand__(self, other)
: Performs a bitwise AND ofself
andother
in-place and returnsself
.__int__(self)
: Returns an integer representation ofself
.__eq__(self, other)
: ReturnsTrue
ifself
is equal toother
,False
otherwise.__ne__(self, other)
: ReturnsTrue
ifself
is not equal toother
,False
otherwise.__lt__(self, other)
: ReturnsTrue
ifself
is less thanother
,False
otherwise.__le__(self, other)
: ReturnsTrue
ifself
is less than or equal toother
,False
otherwise.__gt__(self, other)
: ReturnsTrue
ifself
is greater thanother
,False
otherwise.__ge__(self, other)
: ReturnsTrue
ifself
is greater than or equal toother
,False
otherwise.
from ft_utils.concurrency import AtomicInt64
i = AtomicInt64(10)
print(i.get()) # prints 10
i.incr()
print(i.get()) # prints 11
i.add(5)
print(i.get()) # prints 16
A reference that can be updated atomically.
__init__(obj=None)
: Initializes a new AtomicReference with the specified object.get()
: Returns the current object.set(obj)
: Sets the object.exchange(obj)
: Exchanges the current object with the specified object and returns the previous object.compare_exchange(expected, obj)
: Compares the current object with the expected object and exchanges it with the specified object if they match. ReturnsTrue
if the exchange happened,False
otherwise.
The compare_exchange
method can be used in a loop to atomically update the reference, similar to using the CAS instruction in native programming.
from ft_utils.concurrency import AtomicReference
r = AtomicReference(0)
def increment(r):
while True:
current = r.get()
new_value = current + 1
if r.compare_exchange(current, new_value):
break
increment(r)
print(r.get()) # prints 1
In this example, the increment
function uses a loop to atomically increment the value of the AtomicReference. The compare_exchange
method is used to check if the current value is still the same as the expected value, and if so, updates the value to the new value. If another thread has updated the value in the meantime, the compare_exchange
method will return False
and the loop will retry.
Here are the documents for the new classes:
A boolean flag that can be updated atomically.
__init__(value)
: Initializes a new AtomicFlag with the specified value.set(value)
: Sets the value of the flag.__bool__()
: Returns the current value of the flag.
from ft_utils.concurrency import AtomicFlag
flag = AtomicFlag(True)
print(flag) # prints True
flag.set(False)
print(flag) # prints False
A concurrent iterator that gathers values from multiple threads and yields them in order.
__init__(scaling)
: Initializes a new ConcurrentGatheringIterator with the specified scaling factor.insert(key, value)
: Inserts a key-value pair into the iterator.iterator(max_key, clear)
: Returns an iterator that reads and deletes key-value pairs from the iterator in order.
- The iterator uses a ConcurrentDict to store the key-value pairs.
- The
insert
method is thread-safe and can be called from multiple threads. - The
iterator
method returns an iterator that yields the values in order, blocking if the next value is not available. - max_key passed to iterator tells the iterator at what point to stop iteration; i.e. all expected values have been gathered..
- If an exception occurs during insertion, the iterator will fail with a RuntimeError.
- scaling passed to the init function governs the number of threads the iterator supports with good scaling.
from ft_utils.concurrency import ConcurrentGatheringIterator
iterator = ConcurrentGatheringIterator()
iterator.insert(0, 'value0')
iterator.insert(1, 'value1')
iterator.insert(2, 'value2')
for value in iterator.iterator(2):
print(value) # prints 'value0', 'value1', 'value2'
A more complex example:
from ft_utils.concurrency import ConcurrentGatheringIterator, AtomicInt64
from concurrent.futures import ThreadPoolExecutor
def insert_value(iterator, atomic_index, value):
index = atomic_index.incr()
iterator.insert(index, value)
def test_concurrent_gathering_iterator():
iterator = ConcurrentGatheringIterator()
atomic_index = AtomicInt64(-1)
with ThreadPoolExecutor(max_workers=10) as executor:
futures = []
for i in range(100):
futures.append(executor.submit(insert_value, iterator, atomic_index, i))
for future in futures:
future.result()
results = list(iterator.iterator(99))
assert results == list(range(100))
test_concurrent_gathering_iterator()
In this example, we use a ThreadPoolExecutor to insert values into the ConcurrentGatheringIterator from multiple threads. We use an AtomicInt64 to generate the indices in order. After inserting all the values, we retrieve the results from the iterator and check that they are in the correct order.
Note that the insert_value
function is a helper function that inserts a value into the iterator at the next available index. The test_concurrent_gathering_iterator
function is the main test function that creates the iterator, inserts values from multiple threads, and checks the results.
This example demonstrates that the ConcurrentGatheringIterator can handle concurrent inserts from multiple threads and still produce the correct results in order.
A concurrent queue that allows multiple threads to push and pop values.
__init__(scaling=None, lock_free=False)
: Initializes a new ConcurrentQueue with the specified scaling factor. Iflock_free
is True, the queue will use a lock-free implementation, which can improve performance in certain scenarios.push(value)
: Pushes a value onto the queue. This method is thread-safe and can be called from multiple threads.pop(timeout=None)
: Pops a value from the queue. The method will block until a value is available. Iftimeout
is specified, the method will raise an Empty exception if no value is available within the specified time.pop_local(timeout=None)
: Returns a LocalWrapper object containing the popped value. The behavior is otherwise identical topop(timeout)
.shutdown(immediate=False)
: Initiates shutdown of the queue. Ifimmediate
is True, the queue will shut down immediately, otherwise it will wait for any pending operations to complete.size()
: Returns the number of elements currently in the queue.empty()
: Returns True if the queue is empty, False otherwise.
ShutDown
raised to indicate the ConcurrentQueue is shutdown. In Python 3.13 and abovequeue.ShutDown
is a type and this Exception will be an aliase for it. In earlier versions of Python concurrent defines its own ShutDown type to allow backward compatibility.- 'Empty' raised to indicate a pop/get operation timed out. This is the same as queue.Empty.
- The queue uses a ConcurrentDict to store the values.
- If an exception occurs during push, the queue will fail with a RuntimeError.
- The
scaling
parameter passed to the__init__
function governs the number of threads the queue supports with good scaling. - Setting
lock_free
to True can improve performance in scenarios with a large number of readers and writers, as it avoids overloading the kernel with too many locks. However, this comes at the cost of increased CPU usage.
The lock-free implementation of the queue uses a combination of atomic operations and careful synchronization to ensure thread safety without the need for locks. This approach can provide better performance and scalability in certain scenarios, particularly those with a large number of readers and writers. It will tend to consume more CPU in lightly loaded contitions than using the lock based approach.
In general, the lock-free implementation is recommended for scenarios where:
- There are a large number of readers and writers.
- The queue is very large and needs to support a high throughput.
- Low latency is critical.
- Profiling shows poor scaling hand high load on the kernel.
On the other hand, the lock-based implementation is recommended for scenarios where:
- There are a small number of readers and writers.
- The queue is relatively small and does not require high throughput.
from ft_utils.concurrency import ConcurrentQueue
queue = ConcurrentQueue()
queue.push('value0')
queue.push('value1')
queue.push('value2')
queue.push('value3')
queue.push('value4')
print(queue.pop()) # prints 'value0'
print(queue.pop(timeout=0.1))
print(queue.pop_local()) # returns a LocalWrapper object
queue.shutdown()
queue.pop()
# Raises ShutDown
queue.pop()
# Raises ShutDown
queue.push('value5')
This follows the same API as queue.Queue. For simple applications StdConcurrentQueue will work as a drop in replacement for queue.Queue. However, there are subtle differences:
- StdConcurrentQueue will use a very small amount of CPU time even when not processing elements.
- This implementation has weaker FIFO guaratees than queue.Queue, which might cause subtle issues in some applications.
- StdConcurrentQueue will use and release memory in a different pattern than queue.Queue.
- The maxsize is not as strictly guaranteed. If maxsize is set and a large number of threads attempt to fill the queue beyond maxsize then a small overfill might occur due to the lack of a lock to prevent this race condition.
Therefore, in complex applications it may be a better approach to mindfully replace highly contended queue.Queue instances with StdConcurrentQueue. In this case it is also better to use the simpler ConcurrentQueue where possible.