Prof Vitaliy Kurlin: mathematics & computer science

Data Science theory and applications. Everything is possible!

E-mail: vitaliy.kurlin(at)gmail.com, University of Liverpool, UK

PhD training in Data Science

Vision of the PhD training in Data Science

Back to Top of this page | Back to Home page

Cohort-based training for PhD students

Geometric Data Science in Spring 2024 (Fridays, 11.00-12.30, in Ashton 2.08)

Fundamentals of Data Science in Autumn 2023

Advanced topics in Data Science in Spring 2023

Fundamentals of Data Science in Autumn 2022

Advanced topics in Data Science in Spring 2022

Introductory topics in Data Science in Autumn 2021

  • 7 December 2021 : introduction to Bayesian networks and Monte-Carlo methods (Olga Anosova)
    Talk Predicting Influenza A Viral Host Using PSSM and Word Embeddings by PhD student Yanhua Xu
  • 30 November 2021 : from frequentist statistics to the Bayesian approach (Olga Anosova)
    Talk Data Science-based approach to solid crystalline materials by PhD student Daniel Widdowson
  • 23 November 2021 : statistical hypotheses (Matt Bright)
  • 16 November 2021 : probability distributions (Matt Bright)
  • 9 November 2021 : probabilistic paradoxes (Olga Anosova)
  • 3 November 2021 : introduction to probability (Olga Anosova)
  • 27 October 2021 : important structures on mathematical objects (Matt Bright)
    Talk Earth Mover's Distance on chemical compositions by PhD student Cameron Hargreaves
  • 20 October 2021 : thinking like a mathematician (Matt Bright)
  • 13 October 2021 : descriptive statistics (Matt Bright)
  • 6 October 2021 : introduction to Data Science (Vitaliy Kurlin)

Advanced topics in Data Science in Spring 2021

  • 20 May 2021 : logistic regression as a link between statistics and machine learning (Olga Anosova)
  • 13 May 2021 : introduction to logistic regression (Olga Anosova)
    Talk Average Minimum Distances of periodic point sets by PhD student Marco Mosca
  • 6 May 2021 : three talks by PhD students at the PGR workshop
  • 29 April 2021 : skeletons of point clouds (Vitaliy Kurlin)
  • 22 April 2021 : single-edge clustering (Vitaliy Kurlin)
  • 15 April 2021 : graph classifications (Vitaliy Kurlin)
  • 18 March 2021 : Talk 12-lead ECG Classification Using Time Series Motifs by PhD student Hanadi Aldosari
  • 11 March 2021 : Principal Component Analysis (Matt Bright)
  • 4 March 2021 : how to change a linear basis (Matt Bright)
    Talk Understanding ethnic inequalities in gastrointestinal infection by PhD student Iram Zahair
  • 26 February 2021 : invariants of linear maps (Matt Bright)
  • 18 February 2021 : matrices of linear maps (Matt Bright)
  • 11 February 2021 : correlation and regression (Matt Bright)

Introductory topics in Data Science in Autumn 2020

  • 17 December 2020 : frequentist vs Bayesian approaches (Olga Anosova)
  • 10 December 2020 : introduction to Bayesian statistics (Olga Anosova)
  • 3 December 2020 : Talk Biomarkers-based detection of liver cancer by PhD student Mohamed Elhalwagy
  • 26 November 2020 : Earth Mover's distance (PhD student Cameron Hargreaves)
    Talk Using k-modes clustering to identify different types of cyclists by PhD student Aidan Watmuff
  • 19 November 2020 : equivalences and metrics (Vitaliy Kurlin)
    Talk Machine learning for mass cytometry data of chronic lymphocytic leukemia by PhD student Muizdeen Raji
  • 12 November 2020 : statistical hypotheses (Olga Anosova)
    Talk Machine learning for influenza A viral host classification by PhD student Yanhua Xu
  • 5 November 2020 : probability distributions (Olga Anosova)
    Talk The Maintenance of Trials Methodology Research Using Machine Learning by PhD student Iqra Muhammad
  • 29 October 2020 : probabilistic paradoxes (Olga Anosova)
    Talk Learning to Prioritise Pathology Data in the Absence of a Ground Truth by PhD student Jing Qi
    Talk Wearable Sensing for Non-invasive Human Pose Estimation during Sleep by PhD student Omar Elnaggar
  • 22 October 2020 : introduction to probability (Vitaliy Kurlin)
  • 15 October 2020 : descriptive statistics (Vitaliy Kurlin)
  • 8 October 2020 : introduction to Data Science (Vitaliy Kurlin)

Advanced topics in Data Science in Spring 2020

  • 14 May 2020 : skeletons of point clouds (Vitaliy Kurlin)
  • 7 May 2020 : Voronoi diagrams of point clouds (Vitaliy Kurlin)
  • 30 April 2020 : single-edge clustering of point clouds (Vitaliy Kurlin)
  • 26 March 2020 : graph visualisations (Vitaliy Kurlin)
  • 19 March 2020 : graph classifications (Vitaliy Kurlin)
  • 12 March 2020 : graph representations (with a tutorial by Olga Anosova)
  • 27 February 2020 : frequentist vs Bayesian (Olga Anosova)
    Student presentations by Jing Qi and Theofilos Triommatis
  • 20 February 2020 : conditional probabilities (Olga Anosova)
  • 13 February 2020 : AI for Health (Frans Coenen)
    Student presentations by Matthew Carter and Vincent Beraud
  • 6 February 2020 : the Bayes theorem with examples (Vitaliy Kurlin)

Introductory topics in Data Science in Autumn 2019

  • 10 December 2019 : Principal Component Analysis (Vitaliy Kurlin)
  • 3 December 2019 : how to change a linear basis (Vitaliy Kurlin)
  • 26 November 2019 : invariants of linear maps (Vitaliy Kurlin)
  • 19 November 2019 : matrices of linear maps (Olga Anosova)
  • 12 November 2019 : equivalence relations and vectors (Vitaliy Kurlin)
  • 5 November 2019 : clustering problems and k-means (Vitaliy Kurlin)
  • 29 October 2019 : correlation and regression (Vitaliy Kurlin)
  • 22 October 2019 : statistical hypotheses (Vitaliy Kurlin)
  • 15 October 2019 : probability theory (Vitaliy Kurlin)
  • 8 October 2019 : descriptive statistics (Vitaliy Kurlin)

Cases of PhD students Daniel Widdowson (2020-2024) and Jonathan Balasingham (2021-2025)

Daniel Widdowson has a BSc in Mathematics (Warwick) and an MSc in Computer Science (Liverpool).

Daniel's MSc thesis supervised by Vitaliy Kurlin in summer 2020 led to the high-profile MATCH paper introducing ultra-fast isometry invariants (Average Minimum Distances) for mapping all periodic crystals, and to the NeurIPS 2022 paper establishing the Crystal Isometry Principle for solid crystalline materials.

Daniel's PhD has been supervised since October 2020 by Vitaliy Kurlin, Andy Cooper, and Jason Cole.

Daniel's research in his own words: Crystal Structure Prediction (CSP) is a set of methods for predicting new crystalline materials given a molecule. The way crystals are stored by a computer is ambiguous, i.e., one crystal can be represented in many ways, so during CSP it is not possible to automatically detect and remove duplicates. Currently this is handled manually in a time-consuming filtering process.

Our work uses mathematical tools called isometry invariants to tackle this problem of ambiguity. Every crystal has an invariant which will not change if the crystal is represented differently, and similar crystals have similar invariants to account for atomic vibrations and measurement errors.
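The idea that an invariant stays the same when a crystal is represented differently can be illustrated with a minimal sketch. It assumes a simplified 2D setting in which a periodic point set is given by a cell basis and a motif in fractional coordinates, and it averages the distances from motif points to their k-th nearest neighbours, in the spirit of Average Minimum Distances. The function names, the fixed search radius, and the example lattices are hypothetical choices made for illustration; this is not the group's implementation.

```python
import itertools
import math

def periodic_points(basis, motif, radius):
    """Cartesian points of the periodic set in cells within `radius` of the origin cell."""
    (ax, ay), (bx, by) = basis
    points = []
    for i, j in itertools.product(range(-radius, radius + 1), repeat=2):
        for u, v in motif:
            s, t = u + i, v + j  # fractional coordinates shifted by the cell (i, j)
            points.append((s * ax + t * bx, s * ay + t * by))
    return points

def average_kth_distances(basis, motif, k_max, radius=6):
    """Average over motif points of the distance to the k-th neighbour, k = 1..k_max.

    Assumption: `radius` cells are enough to contain the k_max nearest neighbours,
    which holds for the small examples below but is not checked in general.
    """
    cloud = periodic_points(basis, motif, radius)
    centre = periodic_points(basis, motif, 0)  # motif points of the origin cell
    rows = []
    for p in centre:
        distances = sorted(math.dist(p, q) for q in cloud)
        rows.append(distances[1:k_max + 1])  # distances[0] is 0, the point itself
    return [sum(row[k] for row in rows) / len(rows) for k in range(k_max)]

# Three representations of the same square lattice, and one different lattice.
square = ((1.0, 0.0), (0.0, 1.0)), [(0.0, 0.0)]
sheared = ((1.0, 0.0), (1.0, 1.0)), [(0.0, 0.0)]
doubled = ((2.0, 0.0), (0.0, 1.0)), [(0.0, 0.0), (0.5, 0.0)]
rectangular = ((1.0, 0.0), (0.0, 2.0)), [(0.0, 0.0)]

for name, (basis, motif) in [("square", square), ("sheared", sheared),
                             ("doubled", doubled), ("rectangular", rectangular)]:
    values = average_kth_distances(basis, motif, k_max=6)
    print(name, [round(x, 4) for x in values])
```

The first three inputs describe the same lattice through different cells and motifs and print the same list, while the rectangular lattice prints a different one.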

As part of the Materials Innovation Factory at the University of Liverpool, co-supervised by Professor Andy Cooper and in collaboration with the Cambridge Crystallographic Data Centre (CCDC), this work has shown impact and promise even outside of applications in crystal structure prediction. The CCDC curates the Cambridge Structural Database (CSD), a collection of over one million crystals collected from research all over the world. Our tools searched the database for duplicates in a process totalling over 200 billion comparisons, leading to 5 pairs of crystals currently being investigated.

These comparisons demonstrated the Crystal Isometry Principle stating that any crystal is determined uniquely by the geometry of its atomic centres. So all crystals live in a common landscape parametrised by invariants, the ‘Crystal Isometry Space’.

The paper in JACS used invariants in a novel way to compare crystals whose molecules were different but superficially alike. The two molecules could form crystals that were similar by eye, but this was difficult to detect automatically. Our tools detected and quantified these similarities, and all given reference crystals had analogues in the other set.

The paper Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives in the top Computer Science venue CVPR extended the (isometry moduli space in the) classification of triangles from school geometry to arbitrary clouds of unlabeled points in any Euclidean space.
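The school-geometry starting point mentioned above can be sketched as follows: by the side-side-side criterion, the sorted side lengths of a triangle do not change under rotations, reflections, translations, or relabelling of vertices, and they determine the triangle up to such motions. The code below only checks the invariance for one example; the helper names are hypothetical, and this is not the invariant constructed in the CVPR paper. For larger clouds, the sorted list of all pairwise distances is still invariant but is known not to be complete in general.

```python
import math
from itertools import combinations

def sorted_side_lengths(points):
    """Sorted pairwise distances; for three points these are the triangle's sides."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

# Rotate, reflect in the x-axis, translate, and finally relabel the vertices.
moved = [(x + 2.0, -y + 5.0) for x, y in rotate(triangle, 0.7)]
moved.reverse()

print(sorted_side_lengths(triangle))
print([round(d, 9) for d in sorted_side_lengths(moved)])
```

Both printed lists agree up to floating-point rounding, even though the coordinates and the vertex order of the two triangles differ.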

Jonathan Balasingham holds three degrees: an MSc in Scientific and Data-Intensive Computing (University College London), an MSc in Industrial Engineering (San Jose State University), and a BSc in Computer Science (University of California, Santa Cruz).

Jonathan's PhD has been supervised since October 2021 by Viktor Zamaraev, Vitaliy Kurlin, and Andy Cooper.

Jonathan's research in his own words: Working with molecular crystals presents inherent difficulty due to their periodic nature. Until recently, there have not been rigorous ways to classify and compare crystal structures. Research from the Data Science Theory and Applications group has provided two means of accomplishing this: Average Minimum Distances and Pointwise Distance Distributions.

These mathematical tools give us the capability to quickly compare large numbers of crystals in a precise way. Because of this, we’ve been able to build tools such as a search engine and visualization software to explore crystal databases such as the Cambridge Structural Database provided by the CCDC.

Work from the research group has also extended from pure mathematics to other domains such as machine learning, where an unambiguous representation of molecular crystals opens doors to new algorithms and allows for improvement upon existing methods. More generally, taking a geometric view of data science applications can help reduce data needs and make for more robust and effective models. The PhD project is successful because it

  • allowed the development of open-source crystal comparison and visualization software that crystallographers and chemists can use to explore crystals and their structures
  • resulted in a new approach to the crystal property prediction problem via a transformer-based model, which can aid in evaluating crystal stability and, ultimately, crystal structure
  • improved the performance of existing stability prediction models by providing a more precise encoding of crystal structure, with the opportunity to extend such models to other aspects of molecular crystals, such as their thermodynamic properties.

History : Doctoral Network in Artificial Intelligence for Future Digital Health was a doctoral training centre funded by the University of Liverpool in 2019-2021 to train the next generation of world-leading experts in Data Science and AI to solve data-intensive problems in healthcare.

Leadership team of the network

PhD projects: first cohort from Autumn 2019 (six students)

PhD projects: second cohort from Autumn 2020 (five students)

PhD projects: third cohort from Autumn 2021 (six students)
