This post answers the following questions about topological data analysis:
- What does the word data mean?
- What are the practical aims?
- What is a typical problem?
Input data in topological data analysis
The usual input in topological data analysis is a noisy sample of points in a Euclidean space or, more generally, in a metric space. For example, a black-and-white image can be given as a finite sample of black points in the plane.
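To make the image example concrete, here is a minimal sketch of turning a black-and-white image into a point cloud. The function name `image_to_point_cloud`, the `threshold` parameter, and the convention that small values mean black are assumptions made for illustration only.

```python
import numpy as np

def image_to_point_cloud(image, threshold=0.5):
    """Return an (n, 2) array of (x, y) coordinates of the black pixels.

    Assumption: `image` is a 2D array with values in [0, 1], and a pixel
    counts as black when its value is below `threshold`.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = np.nonzero(image < threshold)
    # Use the column index as x and the row index as y.
    return np.column_stack([cols, rows]).astype(float)

# Tiny 3x4 example image: 0 = black, 1 = white.
image = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
])
points = image_to_point_cloud(image)
print(points)
```

The result is an unordered finite set of points in the plane, which is exactly the kind of input described above.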
More generally, data can be sampled from any topological shape (that is, a topological space). Examples of such shapes are a graph, a figure-eight shape in the plane, and a 2-dimensional torus.
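As an illustration, the sketch below draws noisy samples from two of these shapes. The particular parametrizations (a lemniscate of Gerono for the figure-eight, a standard torus of revolution in 3-dimensional space), the function names, and the Gaussian noise model are all assumptions chosen for this example.

```python
import numpy as np

def sample_figure_eight(n, noise=0.05, seed=0):
    """Sample n noisy points from a figure-eight curve in the plane."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # Lemniscate of Gerono: x = cos t, y = sin t * cos t.
    points = np.column_stack([np.cos(t), np.sin(t) * np.cos(t)])
    return points + rng.normal(scale=noise, size=points.shape)

def sample_torus(n, big_radius=2.0, small_radius=1.0, noise=0.05, seed=0):
    """Sample n noisy points from a torus of revolution in 3D.

    Note: the two angles are drawn uniformly, which is simple but not
    uniform with respect to surface area.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 2.0 * np.pi, size=n)
    v = rng.uniform(0.0, 2.0 * np.pi, size=n)
    ring = big_radius + small_radius * np.cos(v)
    points = np.column_stack([ring * np.cos(u), ring * np.sin(u),
                              small_radius * np.sin(v)])
    return points + rng.normal(scale=noise, size=points.shape)

eight = sample_figure_eight(200)
torus = sample_torus(500)
print(eight.shape, torus.shape)
```

Setting `noise=0.0` gives points lying exactly on the shape, which is handy for checking a method before adding noise.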
Practical aims of topological data analysis
The ultimate goal is to understand the meaning of data. The practical aims are the following:
- Represent data in a form that is easy to process further. Why?
We need to find and compactly encode extra structure, so that later steps work
with structured mathematical objects rather than with raw data.
- Quantify or measure the given data by topological invariants. Why?
The extra structures allow us to compute topological invariants that
depend only on the shape of the data, not on the structures chosen.
- Make robust statistical predictions about the topology of data. Why?
If the data are huge, any algorithm inevitably works on subsamples, so
we should be able to combine the topological outputs, for example by averaging them.
The following links give more details on each of the three practical aims above:
- our post on representing data by simplicial complexes
- the wonderful post Measuring shape by Prof Gunnar Carlsson
- the paper Stochastic Convergence of Persistence Landscapes and Silhouettes by F. Chazal, B. Fasy, F. Lecci, A. Rinaldo, L. Wasserman.
Easy example of a typical hard problem
Our trained human eye can recognize a familiar heart shape in the cloud of red points below. This shape is simple enough that we could connect each point to its two nearest neighbors and get a reasonable contour.
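The naive approach just described can be sketched in a few lines. The function name `two_nearest_neighbor_edges` and the evenly spaced circle used as test data are assumptions for illustration; the brute-force distance matrix is only suitable for small clouds.

```python
import numpy as np

def two_nearest_neighbor_edges(points):
    """Connect every point to its two nearest neighbors.

    Returns a sorted list of undirected edges (i, j) with i < j.
    Requires at least three points.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :2]
    edges = set()
    for i, neighbors in enumerate(nearest):
        for j in neighbors:
            j = int(j)
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Clean test data: 12 evenly spaced points on a circle.
angles = 2.0 * np.pi * np.arange(12) / 12
circle = np.column_stack([np.cos(angles), np.sin(angles)])
edges = two_nearest_neighbor_edges(circle)
print(edges)
```

On clean, evenly spaced data the edges form a single closed contour. With noise or uneven sampling, the same rule can produce branches, gaps, or several disconnected pieces.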
However, robust contour detection in noisy images is still an active problem in computer vision. Topological data analysis looks for methods beyond the simplest nearest-neighbor search. The key idea is not to fix a scale parameter when searching for neighbors, but to analyze a summary of the data over all scales, chosen so that this summary is stable under noise.
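One of the simplest summaries over all scales tracks how connected components merge as the scale grows. The sketch below is only an illustration of the idea, not necessarily the summary used in practice: it assumes that two points are linked at scale `s` when their distance is at most `s`, and the names `merge_scales` and `count_components` are hypothetical. It processes pairs in order of increasing distance with a union-find structure, as in single-linkage clustering.

```python
import numpy as np

def merge_scales(points):
    """Scales at which two connected components merge, in increasing order.

    Assumption: two points are linked at scale s when their Euclidean
    distance is at most s.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i_idx, j_idx = np.triu_indices(n, k=1)
    order = np.argsort(dist[i_idx, j_idx])
    parent = list(range(n))

    def find(a):
        # Find the component representative, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    scales = []
    for k in order:
        i, j = int(i_idx[k]), int(j_idx[k])
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this pair joins two components
            parent[root_i] = root_j
            scales.append(float(dist[i, j]))
    return scales

def count_components(scales, n_points, s):
    """Number of connected components at scale s."""
    return n_points - sum(1 for d in scales if d <= s)

# Two well-separated groups of points.
cloud = np.array([[0, 0], [1, 0], [0, 1], [10, 0], [11, 0]])
scales = merge_scales(cloud)
print(scales)
print([count_components(scales, len(cloud), s) for s in (0.5, 2.0, 9.0)])
```

The long interval of scales over which exactly two components survive suggests two clusters without committing to any single scale in advance.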
In conclusion, we highlight answers to the questions posed at the beginning of this post:
- the input data are finite clouds of points without much structure
- the practical aims are to represent and measure data in easy ways
- a typical problem is to reconstruct contours from a noisy sample.
Exercises on the introduction to topological data analysis
- Q1. What topological shape can one reconstruct from the cloud of points given
at the beginning of the post? Hint. You have seen a partial yellow shape above.
- Q2. What can you see in the black-and-white image below? Hint. Find an animal.
You can post a brief solution or feedback as a comment.