Project Nearest Neighbors within a new area of Geometric Data Science based on papers
ICML 2023
TopoInVis 2022
- Nearest Neighbors of points in a metric space are used for computing isometry invariants of discrete point sets.
- The wider area of Cloud Isometry Spaces develops continuous parametrisations for finite clouds of unlabeled points.
- The even wider area of Geometric Data Science studies moduli spaces of any data objects up to practical equivalences.
- The applied area of Computational Materials Science explores practical applications of geometric invariants and metrics.
- The latest developments are discussed in the MIF++ seminar and at the annual conference MACSMIN since 2020.
Neighbor search by a new Compressed Cover Tree
- Abstract. Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies an important problem of finding k-nearest neighbors of every point q∈Q in the set R in a near-linear time. In the paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree on R and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. This paper fills a substantial gap in the past proofs of time complexity by defining a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in time O(m(k+log n)log k) with a hidden dimensionality factor depending on point distributions of the given sets R,Q but not on their sizes.
@inproceedings{elkin2023new,
  title={A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree},
  author={Elkin, Yury and Kurlin, Vitaliy},
  booktitle={International Conference on Machine Learning (ICML)},
  pages={9267--9311},
  year={2023}
}
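To make the problem statement in the abstract concrete, here is a minimal brute-force baseline in Python. It is not the compressed cover tree from the paper. It only spells out the input (a reference set R, a query set Q, a number k) and the expected output (the k nearest neighbors in R of every q in Q). The function name `brute_force_knn`, the helper `euclidean`, and the choice of the Euclidean metric are assumptions made for illustration. Any metric on the points could be passed instead.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance, used here only as an example metric."""
    return math.dist(p, q)


def brute_force_knn(
    R: Sequence[Point],
    Q: Sequence[Point],
    k: int,
    metric: Callable[[Point, Point], float] = euclidean,
) -> List[List[Tuple[float, int]]]:
    """For every q in Q, return its k nearest neighbors in R.

    Each neighbor is a pair (distance, index in R), sorted by increasing
    distance, with ties broken by the smaller index.
    Hypothetical baseline: it scans all of R for each query, so it takes
    O(m * n * log k) time for n = len(R) and m = len(Q).
    """
    if not 1 <= k <= len(R):
        raise ValueError("k must satisfy 1 <= k <= len(R)")
    results = []
    for q in Q:
        # Distances from q to every reference point, paired with indices.
        candidates = ((metric(q, r), i) for i, r in enumerate(R))
        # nsmallest keeps a heap of size k and returns a sorted list.
        results.append(heapq.nsmallest(k, candidates))
    return results


if __name__ == "__main__":
    R = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
    Q = [(0.0, 0.1), (4.0, 4.0)]
    for q, neighbors in zip(Q, brute_force_knn(R, Q, k=2)):
        print(q, [index for _, index in neighbors])
```

A baseline of this kind is mainly useful as a correctness check: the output of any faster tree-based search on the same R, Q, and k should agree with it up to ties in distance.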
Gaps in the time complexity for cover trees
- DOI: 10.1109/TopoInVis57755.2022.00008
- Abstract. This paper is motivated by the k-nearest neighbors search: given an arbitrary metric space and its finite subsets (a reference set R and a query set Q), design a fast algorithm to find all k-nearest neighbors in R for every point q in Q. In 2006, Beygelzimer, Kakade, and Langford introduced cover trees to justify a near-linear time complexity for the neighbor search in the sizes of Q,R. Section 5.3 of Curtin's PhD (2015) pointed out that the proof of this result was wrong. The key step in the original proof attempted to show that the number of iterations can be estimated by multiplying the length of the longest root-to-leaf path in a cover tree by a constant factor. However, this estimate can miss many potential nodes in several branches of a cover tree that should be considered during the neighbor search. The same argument was unfortunately repeated in several subsequent papers using cover trees from 2006. We explicitly constructed challenging datasets that provide counterexamples to the past proofs of time complexity for the cover tree construction, the k-nearest neighbor search presented at ICML 2006, and the dual-tree search algorithm published at NIPS 2009. The corrected near-linear time complexities with extra parameters are proved in another forthcoming paper by using a new compressed cover tree simplifying the original tree structure.
@inproceedings{elkin2022counterexamples,
  title={Counterexamples expose gaps in the proof of time complexity for cover trees introduced in 2006},
  author={Elkin, Yury and Kurlin, Vitaliy},
  booktitle={Topological Data Analysis and Visualization (TopoInVis)},
  pages={9--17},
  year={2022}
}