Li Wang edited this page Aug 29, 2014 · 25 revisions

CLAIMS is a parallel in-memory database prototype that runs on clusters of commodity servers and provides fast data analysis on relational datasets.

##Highlights

  1. Massively parallel execution engine. CLAIMS relies on a highly parallel query processing engine to dramatically accelerate data analysis. Query evaluation is distributed across the cluster and executed in parallel. Furthermore, query evaluation on each node is multi-threaded to leverage the computing power of multi-core hardware.
  2. Smart intra-node parallelism. Pipelining query execution across the cluster can effectively improve query response time, but its benefits are diminished if the workloads of the execution fragments are imbalanced because of improperly chosen intra-node parallelism. To tackle this problem, CLAIMS proposes a novel elastic pipelining technique that automatically adapts the intra-node parallelism of each query to the runtime workload. With elastic pipelining, execution fragments detected to be the performance bottleneck of the whole query are given more parallelism to accelerate data processing, while execution fragments detected to be over-producing have their parallelism reduced to avoid allocating unnecessary computation resources.
  3. Efficient in-memory data processing. CLAIMS employs a large set of optimization techniques to achieve efficient in-memory data processing, including batch-at-a-time processing, cache-sensitive operators, SIMD-based optimization, code generation, and lock-free and concurrent data structures. These optimizations enable CLAIMS to process gigabytes of data per second in a single thread.
  4. Network communication optimization. Distributed query processing inevitably involves network communication, which is usually the performance bottleneck in parallel in-memory databases because network bandwidth is relatively low compared with in-memory data processing throughput. When compiling user SQL into an execution plan, the CLAIMS query optimizer considers a large set of candidate query plans and outputs the one with the minimum network data transmission cost. Furthermore, CLAIMS is equipped with an efficient data exchange implementation that offers scalable and skew-resilient data communication among CLAIMS instances. These optimizations greatly improve the response time of queries that involve a large amount of data communication.
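
The elastic pipelining idea in item 2 can be illustrated with a toy model. The sketch below is not the CLAIMS algorithm: the `Fragment` class, the `rate_per_thread` field, and the greedy `rebalance` loop are all hypothetical names and simplifications chosen for illustration. It assumes each fragment's capacity scales linearly with its thread count and that the pipeline runs at the speed of its slowest fragment. Threads are moved from the fragment with the most spare capacity to the bottleneck for as long as the move raises overall throughput.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical execution fragment in a pipeline (illustrative only)."""
    name: str
    threads: int
    rate_per_thread: float  # tuples/s one thread can process (assumed constant)

    def capacity(self) -> float:
        # Assumption: capacity scales linearly with the number of threads.
        return self.threads * self.rate_per_thread


def pipeline_throughput(fragments):
    # A pipeline can only run as fast as its slowest fragment.
    return min(f.capacity() for f in fragments)


def rebalance_step(fragments):
    """Move one thread from the most over-producing fragment to the bottleneck.

    Returns True if the move improved throughput, False if it was reverted
    or no donor was available.
    """
    bottleneck = min(fragments, key=Fragment.capacity)
    donors = [f for f in fragments if f is not bottleneck and f.threads > 1]
    if not donors:
        return False
    donor = max(donors, key=Fragment.capacity)

    before = pipeline_throughput(fragments)
    donor.threads -= 1
    bottleneck.threads += 1
    if pipeline_throughput(fragments) <= before:
        # The move did not help, so undo it.
        donor.threads += 1
        bottleneck.threads -= 1
        return False
    return True


def rebalance(fragments):
    # Greedily shift threads until no single move improves throughput.
    while rebalance_step(fragments):
        pass
    return fragments


scan = Fragment("scan", threads=4, rate_per_thread=100.0)
join = Fragment("join", threads=4, rate_per_thread=25.0)
rebalance([scan, join])
print(scan.threads, join.threads, pipeline_throughput([scan, join]))
```

In this toy run the scan fragment starts out over-producing (capacity 400 against the join's 100), so the loop shifts threads to the join until a further move would make the scan the new bottleneck. A real system would measure rates at runtime rather than assume fixed per-thread rates.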
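
To make the network-cost reasoning in item 4 concrete, here is a minimal sketch of choosing among candidate join plans by estimated bytes sent over the network. It is not the CLAIMS optimizer: the function names and the cost formulas are assumptions for illustration. The model assumes both relations start out evenly spread over the nodes, neither is already partitioned on the join key, and hashing is uniform.

```python
def broadcast_cost(relation_bytes, nodes):
    # Every node must end up with a full copy of the broadcast relation,
    # so each node receives the parts held by the other (nodes - 1) nodes.
    return relation_bytes * (nodes - 1)


def repartition_cost(left_bytes, right_bytes, nodes):
    # With uniform hashing, a fraction (nodes - 1) / nodes of each relation
    # hashes to a different node and has to be sent over the network.
    return (left_bytes + right_bytes) * (nodes - 1) / nodes


def choose_join_plan(left_bytes, right_bytes, nodes):
    """Return (plan_name, estimated_bytes) with the lowest estimated transfer."""
    candidates = {
        "broadcast_left": broadcast_cost(left_bytes, nodes),
        "broadcast_right": broadcast_cost(right_bytes, nodes),
        "repartition": repartition_cost(left_bytes, right_bytes, nodes),
    }
    plan = min(candidates, key=candidates.get)
    return plan, candidates[plan]


# A tiny dimension table joined with a large fact table on 4 nodes.
print(choose_join_plan(10_000, 1_000_000_000, nodes=4))
# Two equally large tables on 4 nodes.
print(choose_join_plan(1_000_000_000, 1_000_000_000, nodes=4))
```

Under these assumptions, broadcasting wins when one side is much smaller than the other, and repartitioning wins when both sides are of similar size. A production optimizer would also account for existing partitioning, skew, and intermediate result sizes.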