-
conbine
represents union for a set or a map and concatenation for a sequence. -
combine
must be efficient i.e. O(log n + log m) (n,m being the size of the 2 input collections)
cannot be efficiently concatenated.
Sets have have efficient lookup, insertion and deletion. The running time depends on the implementation:
- Hash tables, expected O(1)
- Balanced trees: O(log n)
- Linked lists: O(n)
However, most implementations do not have efficient union operation.
Complexities depend on the implementation:
- Mutable linked lists: O(1) append and prepend (and concatenaton), O(n) insertion
- functional (cons) lists: O(1) prepend, everything else (concatenaton) O(n)
- Array Lists: amortized O(1) append, O(1) random acces, everything else (concatenaton) O(n)
To avoid these unefficient methods, most data structures can be constructed using two-phase construction.
This technique relies on an intermediate data structures that:
- has an efficient ( O(log n + logm) or better)
combine
- has an efficient
+=
- can be converted to the resulting data structures in O(n/P)
The two phases are implemented in result
and they are:
- partition the indices into subintervals
- initialize array in parallel
class ArrayCombiner[T <: AnyRef: ClassTag](val parallelism: Int) {
private var numElems = 0
private val buffers = new ArrayBuffer[ArrayBuffer[T]]
buffers += new ArrayBuffer[T]
// O(1) amortized, low constat factors ~ arraybuffer
def +=(x: T) {
buffers.last += x
numElems += 1
this
}
//O(P) if buffers contains less than O(P) nested arraybuffers
def combine(that: ArrayCombiner[T]) = {
buffers ++= that.buffers
numElems += that.numElems
this
}
def result: Array[T] = {
val array = new Array[T](numElems)
val step = math.max(1, numElems / parallelism)
val starts = (0 until numElems by step) :+ numElems
val chunks = starts.zip(starts.tail)
val tasks = for ((from, end) <- chunks) yield task {
copyTo(array, from, end)
}
task.foreach(_.join())
array
}
}
- Partition the hash codes into buckets
- allocate the table, map hash codes from different buckets into different regions
- Partition the elements into non-overlapping intervals according to their ordering
- Construct search trees in parallel and link non-overlapping trees.
- Spatially partition the elements
- Construct non non-overlapping subsets and link them
Two phase construction is one of the three main strategies:
- Two-phase construction: The combiner uses an intermediate data structure with an efficient
combine
method to partition the elements. Thenresult
is called to construct the final data structure in parallel from the intermediate data structure. - An efficient concatenation or union operation (algorithm): the preferred way when the resulting data structure allows it
- Concurrent data structure: different combiners share the same underlying data structure and rely on synchronization to correctly update the data structure when
+=
is called.