This post motivates the new research area of Topological Computer Vision.
- Why don’t we have self-driving cars yet?
- Stability under noise is still a problem
- First key results of Topological Computer Vision
Why don’t we have self-driving cars yet?
Here is the response by Dr. Andreas Wendel (Google) from his invited talk Self-Driving Cars at the CVPR 2015 workshop Computer Vision in Vehicle Technology: “We can’t predict all possible road accidents. In the weirdest case our car stopped and waited for an old lady in a wheelchair chasing a duck with a broomstick … in the middle of a road!”
A Google car makes about 200 decisions per second. If any of these 200 decisions is wrong, there could be a fatal accident. Self-driving cars will appear on the market only when the error rate drops below 0.01%. My collaborator Andrew Fitzgibbon from Microsoft Research Cambridge has predicted that we might have to wait another 10 years.
Stability under noise is still a problem
The current flagship method in speech and image recognition is Deep Learning. Briefly, an algorithm is trained (often for weeks) to predict correct outputs from large labelled datasets. For instance, the ImageNet database has more than 14M images split into over 21K categories such as cars, frogs, etc. These images were manually labelled by humans, which required about 25K Amazon Mechanical Turk workers.
During training, the algorithm finds features that best split all labelled images into the required categories. During validation, the algorithm chooses the category whose features are closest to those of a new given image. The overall error rate (the fraction of misclassified images) is about 6.7%, see page 20 in ImageNet Large Scale Visual Recognition Challenge (arXiv/1409.0575, 8M). However, the exercises below show how this approach fails in the presence of a little noise.
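The validation step described above can be illustrated with a toy sketch: choose the category whose stored feature vector is nearest to the features of a new image. The categories, feature vectors, and distances here are invented for the illustration and are not the actual deep network.

```python
# Toy nearest-feature classification (not the actual deep network).
# The per-category "centroids" below stand in for features found
# during training; the values are made up for this sketch.
import math

centroids = {
    "car":  [0.9, 0.1, 0.2],
    "frog": [0.1, 0.8, 0.3],
}

def classify(features):
    """Return the category whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(centroids[c], features))

print(classify([0.85, 0.15, 0.25]))  # nearest to the "car" centroid -> car
```

Real networks learn far richer features, but the final decision is still a comparison of a new image's features against what was learned, which is exactly where carefully placed noise can do damage.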
First key results of Topological Computer Vision
Topological Computer Vision was introduced as a new research area within Topological Data Analysis (TDA) in the invited talk at the scoping workshop of the Alan Turing Institute at Oxford on 10th September 2015.
The first key concept is a Homologically Persistent Skeleton (HoPeS) depending only on a point cloud C without extra input parameters. HoPeS(C) is the first structure that provides a closed geometric approximation to an unknown graph given only by a noisy sample C. Details are in
- the slides Topological Computer Vision (pdf, 11M) and in the paper
- A one-dimensional Homologically Persistent Skeleton of an unstructured point cloud in any metric space (extended pdf, 14 pages, 3.2M), Computer Graphics Forum, v. 34, no. 5 (2015), p. 253-262.
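As a rough intuition (a simplification, not the full HoPeS construction from the paper above), a skeleton of a point cloud starts from a tree connecting the points; a minimum spanning tree is the natural parameterless backbone, which a persistence-based skeleton then enriches with extra edges capturing cycles. A minimal sketch of that backbone via Prim's algorithm:

```python
# Illustrative only: the Euclidean minimum spanning tree of a small
# 2D point cloud, built with Prim's algorithm. A persistence-based
# skeleton such as HoPeS goes further than this tree backbone.
import math

def euclidean_mst(points):
    """Return MST edges as (i, j) index pairs via Prim's algorithm."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None  # (distance, tree vertex, outside vertex)
        for i in in_tree:
            for j in range(n):
                if j not in in_tree:
                    d = math.dist(points[i], points[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges

cloud = [(0, 0), (1, 0), (2, 0.1), (0.1, 1)]
print(euclidean_mst(cloud))
```

Crucially, like HoPeS, this backbone needs no extra input parameters beyond the point cloud itself.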
The big aim is to combine the stable-under-noise persistence from TDA with the state-of-the-art tools of Deep Learning that currently suffer from noise.
Exercises on analyzing noisy images by your (!) deep learning
- Q1. The middle image below is the difference (multiplied by 10) between 2 dog images. One image is correctly recognized by the state-of-the-art Deep Learning net as a dog. However, the other image, with a little added noise, is misclassified as an … ostrich. Where is the noisy image: on the left or on the right?
- Q2. This is an example from CVPR 2015 (the top conference in Computer Vision and Pattern Recognition). Can you guess how the image below is misclassified?
(Hint for possible answers: kid’s drawing, pedestrian, school bus, trademark).
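The effect behind Q1 can be illustrated with a toy model (this is not the actual CVPR example): a tiny, carefully chosen perturbation flips the decision of a simple linear classifier even though the input barely changes. The weights and features below are invented for the sketch.

```python
# Toy adversarial-noise demonstration on a linear classifier.
# Weights and features are hypothetical; the point is that a
# perturbation much smaller than the features flips the label.
def predict(w, x, b=0.0):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "dog" if score > 0 else "ostrich"

w = [0.5, -0.5]   # hypothetical learned weights
x = [1.0, 0.9]    # original features: score = 0.05, so "dog"
eps = 0.06        # perturbation far smaller than the feature values

# Nudge each feature slightly against the sign of its weight.
x_noisy = [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

print(predict(w, x))        # "dog"
print(predict(w, x_noisy))  # flipped, despite the tiny change
```

Deep networks are nonlinear, but the same mechanism, a perturbation aligned with the decision boundary, underlies the dog/ostrich example.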
You are welcome to write a brief answer or feedback in a comment: reply.