Sandipan Dey explains what a kd-tree is and how it works:

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of

andrange search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the entire plane [(−∞, −∞), (+∞, +∞ )]; the left and right children of the root correspond to the two rectangles split by thenearest-neighbor searchx-coordinate of the point at the root; and so forth.

To find all points contained in a given query rectangle, start at the root and recursively search for points in**Range search**:*both*subtrees using the following*pruning rule*: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a subtree only if it might contain a point contained in the query rectangle.To find a closest point to a given query point, start at the root and recursively search in**Nearest-neighbor search**:*both*subtrees using the following*pruning rule*: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a node only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize the recursive method so that when there are two possible subtrees to go down, you choose first*the subtree that is on the same side of the splitting line as the query point*; the closest point found while exploring the first subtree may enable pruning of the second subtree.**k-nearest neighbors search**: This method returns the*k*points that are closest to the query point (in any order); return all*n*points in the data structure if*n*≤*k*. It must do this in an efficient manner, i.e. using the technique from kd-tree nearest neighbor search, not from brute force.

Sandipan implements a fairly classic problem in this space: the behavior of a group of flocking birds.

Kevin Feasel

2017-09-13

Data Science, Machine Learning