- What is an Atmospheric River?
- Why are Atmospheric Rivers important?
- How are Atmospheric Rivers detected?

An *Atmospheric River* (AR) is a narrow filament of concentrated water vapour in the atmosphere, usually up to several thousand kilometers long and a few hundred kilometers wide.

These filaments were called Atmospheric Rivers in the paper “Atmospheric rivers and bombs” (pdf) in 1994, because a single filament can carry more water than the Amazon River. Hence an Atmospheric River can be informally considered as a “river” flowing in the atmosphere.

The picture above shows the integrated water vapour (IWV) measured in grams per square millimetre, formally the mass of water in the vertical column over a 1×1 mm square. Higher values of IWV are shown in red, lower values in blue.

The picture below zooms into the red box from the picture above and shows how an Atmospheric River hits the California coast in the US.

At any given moment there are 3-5 Atmospheric Rivers on the planet, and together they account for over 90% of the global north-south water vapour transport. When an Atmospheric River hits a coast, this “river” comes down over the land as heavy rain, which can cause severe floods.

These extreme weather events regularly happen along the West Coast of North America, Western Europe and the west coast of North Africa, e.g. read “Rivers in the Sky Are Flooding The World With Tropical Waters” (pdf).

The paper “Winter floods in Britain are connected to atmospheric rivers” (pdf) concludes that all winter floods in the UK in 2000-2010 were caused by Atmospheric Rivers, including the severe flood of 19th November 2009 on the River Eden in Cumbria (UK).

The input for detection is a scalar field of the Integrated Water Vapour (IWV) over a regular grid whose lines usually follow parallels and meridians (lines of latitude and longitude). The input can be visualised as a matrix of IWV values that are obtained from weather observations or computer simulations. So every node in the regular grid has an associated value of the Integrated Water Vapour and is connected to its four neighbours (north, west, south, east) in the grid.

High moisture regions that bring water vapour from mid-latitudes in the ocean up to the land in the north are called *Atmospheric Rivers* to distinguish them from other high moisture regions that don’t cause floods. A detection algorithm should identify only Atmospheric Rivers.

The picture above shows a yellow-red region with a big hole, so it doesn’t form an elongated filament. The picture below contains a yellow high moisture region without holes, but this filament doesn’t reach the coast. Hence there is no Atmospheric River in either case.

The traditional approach to detect an Atmospheric River is to fix a threshold of the Integrated Water Vapour, say 20 g/mm^{2}, and consider all nodes with values above this threshold. If these nodes form a connected component in the regular grid that has expected geometric parameters (length and width) and also joins the mid-latitude region (the bottom line of the chosen box) with the California coast, the latest detection algorithm in the TECA software (Toolkit for Extreme Climate Analysis, pdf) says that an Atmospheric River is detected.
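The threshold-and-connectivity idea described above can be sketched in a few lines. This is a minimal illustration, not the TECA implementation: the function name `detect_river`, the use of a single grid column as the coast, and the row span as a crude length proxy are all assumptions made for the example, and the width check is omitted.

```python
from collections import deque

def detect_river(iwv, threshold, coast_col, min_length):
    """Return True if an above-threshold component joins the bottom row to the coast column.

    Assumptions for this sketch: `iwv` is a list of equal-length rows, the bottom
    row stands in for the mid-latitude boundary, and the coast is one grid column.
    """
    rows, cols = len(iwv), len(iwv[0])
    seen = [[False] * cols for _ in range(rows)]
    bottom = rows - 1
    for start_col in range(cols):
        if iwv[bottom][start_col] <= threshold or seen[bottom][start_col]:
            continue
        # Breadth-first search over the 4-neighbour grid graph.
        queue = deque([(bottom, start_col)])
        seen[bottom][start_col] = True
        component = []
        while queue:
            r, c = queue.popleft()
            component.append((r, c))
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and iwv[nr][nc] > threshold):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        reaches_coast = any(c == coast_col for _, c in component)
        # Crude length proxy: the number of grid rows spanned by the component.
        row_span = max(r for r, _ in component) - min(r for r, _ in component) + 1
        if reaches_coast and row_span >= min_length:
            return True
    return False

# Hypothetical 4x4 fields; values are in the same units as the threshold.
reaching = [
    [5, 5, 5, 25],
    [5, 5, 25, 25],
    [5, 25, 25, 5],
    [25, 25, 5, 5],
]
stopping_short = [
    [5, 5, 5, 5],
    [5, 5, 5, 5],
    [5, 25, 5, 5],
    [25, 25, 5, 5],
]
found_reaching = detect_river(reaching, threshold=20, coast_col=3, min_length=3)
found_short = detect_river(stopping_short, threshold=20, coast_col=3, min_length=3)
```

The answer depends on `threshold` and `min_length`, so even this toy version has to be tuned by hand.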

The state-of-the-art algorithms work only for carefully chosen parameter values, and different climate scientists propose different values. That is why we are now working on a parameterless approach combining ideas of Topological Data Analysis with Machine Learning.

Which of the pictures below show Atmospheric Rivers in your opinion and why?

You could write a brief answer or feedback in your comment: reply.

This post motivates the new research area of *Topological Computer Vision*.

- Why don’t we have self-driving cars yet?
- Stability under noise is still a big problem
- First key results of *Topological Computer Vision*

Here is the response by Dr. Andreas Wendel (Google) from his invited talk *Self-Driving Cars* at the CVPR 2015 workshop Computer Vision in Vehicle Technology: “We can’t predict all possible road accidents. In the weirdest case our car stopped and waited for an old lady in a wheelchair chasing a duck with a broomstick … in the middle of a road!”

A Google car makes about 200 decisions per second. If any of these 200 decisions is wrong, there could be a fatal accident. Self-driving cars will appear on the market when the error rate is less than 0.01%. My collaborator Andrew Fitzgibbon from Microsoft Research Cambridge has predicted that we might wait for another 10 years.

The current flagship method in speech and image recognition is *Deep Learning*. Briefly, an algorithm is trained (often for weeks) to predict correct outputs from big labelled data. For instance, the ImageNet database has more than 14M images split into over 21K categories like cars, frogs etc. These images were manually labelled by humans, which required about 25K Amazon Mechanical Turk workers.

During the training, the algorithm finds features that best split all labelled images into required categories. During validation, the algorithm chooses the category whose features are closest to those of a new given image. The overall error rate, when the algorithm mis-classifies images, is about 6.7%, see page 20 in ImageNet Large Scale Visual Recognition Challenge (arXiv/1409.0575, 8M). However, the exercises below show how this approach fails in the presence of a little noise.
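The “closest features” rule can be illustrated with a toy example. This is not a deep network: the two-dimensional feature vectors, the category centres and the function name `nearest_category` are all hypothetical and only show how a small perturbation near a decision boundary can flip the predicted category.

```python
import math

def nearest_category(features, centroids):
    """Return the category whose (hypothetical) feature centre is closest."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))

# Hypothetical feature centres for two categories.
centroids = {"dog": (1.0, 0.0), "ostrich": (0.0, 1.0)}

clean = (0.52, 0.48)   # slightly closer to the "dog" centre
noisy = (0.48, 0.52)   # the same vector after a perturbation of 0.04 per coordinate

label_clean = nearest_category(clean, centroids)
label_noisy = nearest_category(noisy, centroids)
```

Here `label_clean` is `"dog"` and `label_noisy` is `"ostrich"`, although the two inputs are almost identical.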

*Topological Computer Vision* was introduced as a *new research area* within Topological Data Analysis (TDA) in the invited talk at the scoping workshop of the Alan Turing Institute at Oxford on 10th September 2015.

The first key concept is a *Homologically Persistent Skeleton* (HoPeS) depending only on a point cloud C without extra input parameters. HoPeS(C) is the first structure that provides a closed geometric approximation to an unknown graph given only by a noisy sample C. Details are in

- the slides Topological Computer Vision (pdf, 11M) and in the paper
- A one-dimensional Homologically Persistent Skeleton of an unstructured point cloud in any metric space (extended pdf, 14 pages, 3.2M), Computer Graphics Forum, v. 34, no. 5 (2015), p. 253-262.

The big aim is to combine the stable-under-noise persistence from TDA with the state-of-the-art tools of Deep Learning that currently suffer from noise.

**Q1**. The middle image below is the difference (multiplied by 10) between 2 dog images. One image is correctly recognized by the state-of-the-art Deep Learning Net as a dog. However, the other image with a little added noise is misclassified as an … ostrich. Where is the noisy image: on the left or on the right?

**Q2**. This is an example from CVPR 2015 (the top conference in Computer Vision and Pattern Recognition). Can you guess how the image below is mis-classified?

(Hint for possible answers: kid’s drawing, pedestrian, school bus, trademark).

You could write a brief answer or feedback in your comment: reply.

- What does the word *data* mean?
- What are the practical aims?
- What is a typical problem?

The usual data input in topological data analysis is a noisy sample of points in a Euclidean or in a more general metric space. For example, a black-and-white image can be given as a finite sample of black points in the plane.
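As a small illustration of the example above, a black-and-white image stored as a matrix can be turned into a finite point cloud. The encoding (1 for black, 0 for white) and the function name `black_points` are assumptions made for this sketch.

```python
def black_points(image):
    """Return the (x, y) coordinates of black pixels; 1 marks black, 0 marks white."""
    return [(x, y)
            for y, row in enumerate(image)
            for x, value in enumerate(row)
            if value == 1]

# Hypothetical 3x3 image containing a small diamond of black pixels.
image = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
cloud = black_points(image)
```

The resulting `cloud` is a plain list of points in the plane, with no extra structure attached.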

More generally, data can be sampled from any topological shape (or a space). Examples of shapes below are a graph, a figure-eight shape in the plane, a 2-dimensional torus.

The ultimate goal is to understand the meaning of data. The practical aims are the following:

- Represent data in an easy way for further processing. Why? We need to find and easily encode extra structures to work later with structured mathematical objects rather than with raw data.
- Quantify or measure given data by topological invariants. Why? Extra structures allow us to find topological invariants that depend only on the shape of data, not on extra structures.
- Make robust statistical predictions about topology of data. Why? If data are huge, any algorithm inevitably works on subsamples and we should be able to take (say) the average of all topological outputs.

We give links to more details about each of the 3 practical aims above:

- our post on representing data by simplicial complexes
- the wonderful post Measuring shape by Prof Gunnar Carlsson
- the paper Stochastic Convergence of Persistence Landscapes and Silhouettes by F.Chazal, B.Fasy, F.Lecci, A.Rinaldo, L.Wasserman.

Our trained human eye can recognize a familiar heart shape in the cloud of red points below. The red heart shape is easy enough and we could connect each point with its two nearest neighbors to get a reasonable contour.
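The naive rule of connecting each point with its two nearest neighbours can be sketched as follows. The function name and the test sample (8 evenly spaced points on a circle instead of the heart shape) are assumptions made for illustration.

```python
import math

def two_nearest_neighbour_edges(points):
    """Connect every point to its two nearest neighbours.

    Returns undirected edges as pairs of point indices (smaller index first).
    """
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:2]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical clean sample: 8 points evenly spaced on the unit circle.
n = 8
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]
edges = two_nearest_neighbour_edges(circle)
```

For this clean sample the edges form a single closed contour through all 8 points. On a noisy or unevenly sampled cloud the same rule can leave gaps or add spurious edges.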

However, robust contour detection in noisy images is still a hot problem in computer vision. Topological data analysis looks for methods beyond the simplest nearest neighbor search. The key idea is not to fix a scale parameter when searching for neighbors, but to analyze a summary of data over all scales so that this summary is stable under noise.

**In conclusion**, we highlight answers to the questions posed at the beginning of this post:

- the input data are finite clouds of points without much structure
- the practical aims are to represent and measure data in easy ways
- a typical problem is to reconstruct contours from a noisy sample.

**Q1**. What topological shape can one reconstruct from the cloud of points given at the beginning of the post? **Hint**. You have seen a partial yellow shape above.

**Q2**. What can you see in the black-and-white image below? **Hint**. Find an animal.

You could write a brief solution or feedback in your comment: reply.

- **what** are simplicial complexes and what are they not?
- **how** can a shape be represented by a simplicial complex?
- **why** are simplicial complexes easy for representing shapes?

A *simplicial complex* is a high-dimensional generalization of a graph. That is why simplicial complexes are sometimes called *hypergraphs*. The building blocks of a simplicial complex are vertices, edges, triangles and higher-dimensional *simplices* like tetrahedra in \(\mathbb{R}^3\).

The standard \(k\)-dimensional simplex is the following subset of points in the \((k+1)\)-dimensional space:

\(\Delta^k=\{(x_0,\dots,x_k)\in\mathbb{R}^{k+1} \; :\; 0\leq x_i\leq 1,\; \sum\limits_{i=0}^k x_i=1\}\).

We may say that the simplex \(\Delta^k\) is *spanned by* its \(k+1\) vertices marked by (say) \(0,1,\dots,k\). Any subset of \(l+1\) vertices spans an \(l\)-dimensional *face* (or a subsimplex) of \(\Delta^k\).
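Since an \(l\)-dimensional face is spanned by \(l+1\) of the \(k+1\) vertices, the faces of a simplex can be listed as vertex subsets. The sketch below uses a hypothetical helper `faces` to enumerate them.

```python
from itertools import combinations

def faces(k, l):
    """List the l-dimensional faces of the k-simplex on vertices 0..k as vertex tuples."""
    return list(combinations(range(k + 1), l + 1))

triangle_edges = faces(2, 1)         # the 3 edges of the triangle
tetrahedron_triangles = faces(3, 2)  # the 4 triangular faces of the tetrahedron
```

The number of \(l\)-dimensional faces of \(\Delta^k\) is therefore the binomial coefficient \(\binom{k+1}{l+1}\).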

Many geometrically different shapes are homeomorphic (topologically equivalent) to the same simplex \(\Delta^k\). For instance, the 2-dimensional disk \(\{(x,y)\in\mathbb{R}^2\; :\; x^2+y^2\leq 1\}\) is homeomorphic to the standard triangle \(\Delta^2\). Then the combinatorial structure of \(\Delta^2\) induces a similar structure on the disk, namely 3 vertices, 3 edges and one 2-dimensional face.

A *triangulation* of (or the structure of a *simplicial complex* on) a shape is

- a splitting into finitely many pieces homeomorphic to simplices in such a way that
- the intersection of any two simplices in the triangulation is their common face.

A shape is called *triangulable* if it has a triangulation satisfying the above conditions. A simplicial complex may contain simplices of different dimensions as shown in the sun glasses below. The dimension of a simplicial complex is the maximum dimension of its simplices.

Representing a shape as a simplicial complex allows us later to introduce topological invariants of shapes. For instance, the Euler characteristic of a shape is expressed via the number of \(l\)-dimensional simplices in a triangulation for different values of \(l\).
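The Euler characteristic is the alternating sum \(n_0-n_1+n_2-\dots\), where \(n_l\) is the number of \(l\)-dimensional simplices. A minimal sketch of this count follows. It assumes that a complex is given by a list of simplices written as tuples of vertex labels, and the helper names are hypothetical.

```python
from itertools import combinations

def all_simplices(given_simplices):
    """Return every simplex (as a frozenset of vertices) contained in the given ones."""
    simplices = set()
    for simplex in given_simplices:
        for size in range(1, len(simplex) + 1):
            for face in combinations(simplex, size):
                simplices.add(frozenset(face))
    return simplices

def euler_characteristic(given_simplices):
    """Alternating sum of the numbers of l-dimensional simplices."""
    return sum((-1) ** (len(s) - 1) for s in all_simplices(given_simplices))

filled_triangle = [(0, 1, 2)]                 # 3 vertices, 3 edges, 1 triangle
hollow_triangle = [(0, 1), (1, 2), (0, 2)]    # 3 vertices, 3 edges, no triangle

chi_filled = euler_characteristic(filled_triangle)
chi_hollow = euler_characteristic(hollow_triangle)
```

The filled triangle gives 3 - 3 + 1 = 1 and the hollow triangle gives 3 - 3 = 0, so the invariant distinguishes a disk from a circle.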

The condition on intersection of simplices in dimension 1 means that any two edges can meet only at a single common vertex. Hence a 1-dimensional simplicial complex can not have loops or double edges with the same endpoints. This condition seems too restrictive for graphs, but is essential for higher dimensions as explained below.

Any simplicial complex with vertices marked by (say) \(0,1,\dots,n\) can be encoded by a list of maximal simplices, namely those that are not contained in larger simplices. Any maximal simplex spanned by \((k+1)\) vertices \(i_0,\dots,i_k\) is encoded by the unordered \((k+1)\)-tuple \((i_0,\dots,i_k)\). The sun glasses with 8 vertices above are encoded by the list (012),(345),(05),(14),(26),(37).

To encode a shape, first we should triangulate it. Let us split a round ring into 4 triangles with vertices 0,1,2,3 below. The corresponding list of triples is (012),(013),(023),(123). We may notice that the same list of triples encodes the boundary of the standard tetrahedron \(\Delta^3\), which has four 2-dimensional faces. To avoid this confusion, the definition of a simplicial complex requires that any two simplices should meet along their common face. So the splitting into 4 curved triangles above is not a simplicial complex.

If our code contains triples (123) and (013), we should know that the corresponding triangles share the common edge (13) as in the tetrahedron \(\Delta^3\), not only the endpoints 1 and 3 as in the round ring. A minimum triangulation of a round ring is at the beginning of this post.
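The point about shared edges can be checked directly on the list of triples. The sketch below (variable names are hypothetical) counts how many triangles contain each edge. An edge contained in exactly one triangle lies on the boundary, and a genuine triangulation of a ring must have such edges along its two boundary circles.

```python
from collections import Counter
from itertools import combinations

triangles = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# Count how many triangles contain each edge (an unordered pair of vertices).
edge_count = Counter(edge
                     for triangle in triangles
                     for edge in combinations(sorted(triangle), 2))

# Edges that belong to exactly one triangle form the boundary.
boundary_edges = [edge for edge, count in edge_count.items() if count == 1]
```

For this list every one of the 6 edges is shared by exactly two triangles, so `boundary_edges` is empty and the code describes a closed surface rather than a ring.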

**In conclusion**, we highlight answers to the questions posed at the beginning of the post:

- a simplicial complex is glued from standard simplices along their common faces,
- to represent a shape, we triangulate it into pieces homeomorphic to simplices,
- simplicial complexes can be easily encoded in a computer memory, hence they provide a convenient representation of a shape for further analysis.

**Q1**. Encode the figure-eight shape below by a simplicial complex.

**Hint**. Each of the 3 boundary contours should have at least 3 vertices.

**Q2**. Find a minimum triangulation of the 2-dimensional torus below.

**Hint**. A minimum triangulation of the torus has at least 7 vertices.

You could write a brief solution or feedback in your comment: reply.
