Skip to content

giannis_2_28_2017

Ioannis Paraskevakos edited this page Mar 1, 2017 · 1 revision
  1. k-dimensional trees create a binary search tree of hyperplanes.It partitions the k-dimensional space by selecting a different axis at every node. Axis selection is a modulo operation between the tree's depth and number of dimensions. The median value from the elected axis is used to perform a split.

    • Construction: O(knlogn) -> nlogn to presort as every dimension and for every dimension.
    • Insert: O(logn)
    • Remove: O(logn)
    • Query: (m + n^(1-1/k))
  2. A ball tree is a variant off-d trees. Their difference lies in the way data points is partitioned. Instead of hypercubes, ball trees use hypersphere.

In both cases, the tree can be queried to report the neighbors of a points based on a radius. When all points are queried an adjacency list is reported, which can be used to create a graph.

  1. Until now I have not been able to find a way to use 3D meshes to find the leaflets.

  2. I/O: Most papers that I found try to exploit as much as possible the local hard drives of the nodes. Klein et al. [1] at a letter in Oxford Bioinformatics, propose a new file format, called SFile. This file is used through HDFS to increase the read rate. Based on the letter, using this filetype they were able to sustain an 6 ~ 8GB/s when 128 HDDs were used.

[1]Max Klein, Rati Sharma, Chris H. Bohrer, Cameron M. Avelis, Elijah Roberts; Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark. Bioinformatics 2017; 33 (2): 303-305. doi: 10.1093/bioinformatics/btw614

Clone this wiki locally