Project Nearest Neighbors within a new area of Geometric Data Science based on papers
ICML 2023
TopoInVis 2022
 Nearest Neighbors of points in a metric space are used for computing isometry invariants of discrete point sets.
 The wider area of Cloud Isometry Spaces develops continuous parametrisations for finite clouds of unlabeled points.
 The even wider area of Geometric Data Science studies moduli spaces of any data objects up to practical equivalences.
 The applied area of Computational Materials Science explores practical applications of geometric invariants and metrics.
 The latest developments are discussed in the MIF++ seminar and at the annual conference MACSMIN since 2020.
Neighbor search by a new Compressed Cover Tree

 Abstract. Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies an important problem of finding k-nearest neighbors of every point q∈Q in the set R in a near-linear time. In the paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree on R and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. This paper fills a substantial gap in the past proofs of time complexity by defining a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in time O(m(k+log n)log k) with a hidden dimensionality factor depending on point distributions of the given sets R,Q but not on their sizes.
@inproceedings{elkin2023new,
  title={A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree},
  author={Elkin, Yury and Kurlin, Vitaliy},
  booktitle={International Conference on Machine Learning (ICML)},
  pages={9267--9311},
  year={2023}
}
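To make the problem statement concrete, here is a minimal brute-force baseline in Python. It is not the compressed cover tree algorithm from the paper, only a sketch of the input and output of the k-nearest neighbor search. The names `brute_force_knn` and `euclidean` are hypothetical, and the Euclidean metric is an assumption; any metric function can be passed instead.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance, used here as an example metric (an assumption)."""
    return math.dist(p, q)


def brute_force_knn(
    R: Sequence[Point],
    Q: Sequence[Point],
    k: int,
    metric: Callable[[Point, Point], float] = euclidean,
) -> List[List[Tuple[float, int]]]:
    """For every q in Q, return its k nearest points of R as
    (distance, index in R) pairs sorted by increasing distance.

    Each query scans all n reference points, so the total cost is
    about O(m * n * log k): quadratic in the input sizes rather than
    near-linear.
    """
    result = []
    for q in Q:
        candidates = ((metric(q, r), i) for i, r in enumerate(R))
        result.append(heapq.nsmallest(k, candidates))
    return result


# Hypothetical toy data: a reference set R and a query set Q in the plane.
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
Q = [(0.1, 0.0), (4.0, 4.0)]
neighbors = brute_force_knn(R, Q, k=2)
```

This baseline computes all m·n distances. The abstract above states a search time of O(m(k+log n)log k) with a hidden dimensionality factor once a compressed cover tree on R has been built, which is the gain that motivates the tree structure.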
Gaps in the time complexity for cover trees

 DOI: 10.1109/TopoInVis57755.2022.00008
 Abstract. This paper is motivated by the k-nearest neighbors search: given an arbitrary metric space, and its finite subsets (a reference set R and a query set Q), design a fast algorithm to find all k-nearest neighbors in R for every point q in Q. In 2006, Beygelzimer, Kakade, and Langford introduced cover trees to justify a near-linear time complexity for the neighbor search in the sizes of Q,R. Section 5.3 of Curtin's PhD (2015) pointed out that the proof of this result was wrong. The key step in the original proof attempted to show that the number of iterations can be estimated by multiplying the length of the longest root-to-leaf path in a cover tree by a constant factor. However, this estimate can miss many potential nodes in several branches of a cover tree that should be considered during the neighbor search. The same argument was unfortunately repeated in several subsequent papers using cover trees from 2006. We explicitly constructed challenging datasets that provide counterexamples to the past proofs of time complexity for the cover tree construction, the k-nearest neighbor search presented at ICML 2006, and the dual-tree search algorithm published in NIPS 2009. The corrected near-linear time complexities with extra parameters are proved in another forthcoming paper by using a new compressed cover tree simplifying the original tree structure.
@inproceedings{elkin2022counterexamples,
  title={Counterexamples expose gaps in the proof of time complexity for cover trees introduced in 2006},
  author={Elkin, Yury and Kurlin, Vitaliy},
  booktitle={Topological Data Analysis and Visualization (TopoInVis)},
  pages={9--17},
  year={2022}
}