Project Computational Materials Science based on papers in
JACS 2022
DiD 2022
DAMDID 2022
Chemical Science 2021
Chemistry of Materials 2020
- Computational Materials Science in our group applies methods of Periodic Geometry to materials design.
- The subarea called Lattice Geometry studies moduli spaces of simpler periodic lattices in low dimensions 2 and 3.
- The related area of Cloud Isometry Spaces studies geometry of moduli spaces of finite clouds of unlabeled points.
- The wider area of Periodic Geometry studies moduli spaces of general periodic point sets that model all periodic crystals.
- The even wider area of Geometric Data Science studies moduli spaces of any data objects up to practical equivalences.
- The latest developments are discussed in the MIF++ seminar and at the annual conference MACSMIN since 2020.
Invariant-based materials discovery
- DOI : 10.1021/jacs.2c02653
- Abstract. Mesoporous molecular crystals have potential applications in separation and catalysis, but they are rare and hard to design because many weak interactions compete during crystallization, and most molecules have an energetic preference for close packing. Here, we combine crystal structure prediction (CSP) with structural invariants to continuously qualify the similarity between predicted crystal structures for related molecules. This allows isomorphous substitution strategies, which can be unreliable for molecular crystals, to be augmented by a priori prediction, thus leveraging the power of both approaches. We used this combined approach to discover a rare example of a low-density (0.54 g/cm^3) mesoporous hydrogen-bonded framework (HOF), 3D-CageHOF-1. This structure comprises an organic cage (Cage-3-NH2) that was predicted to form kinetically trapped, low-density polymorphs via CSP. Pointwise distance distribution structural invariants revealed five predicted forms of Cage-3-NH2 that are analogous to experimentally realized porous crystals of a chemically different but geometrically similar molecule, T2. More broadly, this approach overcomes the difficulties in comparing predicted molecular crystals with varying lattice parameters, thus allowing for the systematic comparison of energy–structure landscapes for chemically dissimilar molecules.
@article{zhu2022analogy, title={Analogy Powered by Prediction and Structural Invariants: Computationally-Led Discovery of a Mesoporous Hydrogen-Bonded Organic Cage Crystal}, author={Qiang Zhu and Jay Johal and Daniel Widdowson and Zhongfu Pang and Boyu Li and Christopher Kane and Vitaliy Kurlin and Graeme Day and Marc Little and Andrew Cooper}, journal={Journal of the American Chemical Society}, volume={144}, pages={9893--9901}, year={2022} }
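The abstract above relies on Pointwise Distance Distribution (PDD) invariants. As a rough illustration only, the sketch below computes a PDD-style table for a finite cloud of points: each point contributes a row of distances to its k nearest neighbours, identical rows are merged with weights, and rows are ordered lexicographically. This is a simplification under stated assumptions: a real crystal needs neighbours taken across periodic copies of the unit cell, and the function name `pdd`, the `decimals` rounding parameter, and the example points are hypothetical, not the authors' code.

```python
import math
from collections import Counter


def pdd(points, k, decimals=10):
    """PDD-style table of a finite point cloud (illustrative simplification).

    Returns a list of (weight, row) pairs, where each row holds the sorted
    distances from one point to its k nearest neighbours. Identical rows are
    collapsed and their weights added; rows are ordered lexicographically.
    """
    rows = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Rounding lets rows that differ only by floating-point noise collapse.
        rows.append(tuple(round(d, decimals) for d in dists[:k]))
    counts = Counter(rows)
    n = len(points)
    return [(count / n, row) for row, count in sorted(counts.items())]


# Hypothetical example: a unit square, and the same square shifted and relabelled.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(6, 6), (5, 5), (5, 6), (6, 5)]

square_pdd = pdd(square, k=3)
moved_pdd = pdd(moved, k=3)
```

Because the table depends only on inter-point distances and ignores point order, `square_pdd` and `moved_pdd` coincide, which is the kind of invariance that makes such descriptors usable for comparing structures with different coordinates or cell choices.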
Co-crystals in the Cambridge Structural Database
- DOI : 10.1039/D2DD00068G
- Abstract. In this paper we introduce Molecular Set Transformer, a PyTorch-based deep learning architecture designed for solving the molecular pair scoring task whilst tackling the class imbalance problem observed on datasets extracted from databases reporting only successful synthetic attempts. Our models are trained on all the existing molecular pairs that form cocrystals and are deposited in the Cambridge Structural Database (CSD). Given any new molecular combination, the primary objective of the tool is to be able to select the most effective way to represent the pair and then assign a score coupled with an uncertainty estimation. Molecular Set Transformer is an attention-based framework which learns the important interactions in the various molecular combinations by trying to reconstruct its input by minimizing its bidirectional loss. Several methods to represent the input were tested, both fixed and learnt, with the Graph Neural Network (GNN) and the Extended-Connectivity Fingerprints (ECFP4) molecular representations performing best, showing an overall accuracy higher than 75% on previously unseen data. The trustworthiness of the models is enhanced by adding uncertainty estimates, which aim to help chemists prioritize at the early materials design stage both the pairs with high scores and low uncertainty and pairs with low scores and high uncertainty. Our results indicate that the method can achieve comparable or better performance on specific APIs for which the accuracy of other computational chemistry and machine learning tools is reported in the literature. To help visualize and gain further insights into all the co-crystals deposited in CSD, we developed an interactive browser-based explorer. An online Graphical User Interface has also been designed for enabling the wider use of our models for rapid in-silico co-crystal screening, reporting the scores and uncertainty of any user-given molecular pair.
@article{vriza2022molecular, title={Molecular Set Transformer: Attending to the co-crystals in the Cambridge Structural Database}, author={Aikaterini Vriza and Ioana Sovago and Daniel Widdowson and Peter Wood and Vitaliy Kurlin and Matthew Dyer}, journal={Digital Discovery}, volume={1}, pages={834--850}, year={2022} }
Lattice energy predictions by continuous isometry invariants
- DOI : 10.1007/978-3-031-12285-9_11
- Abstract. Crystal Structure Prediction (CSP) aims to discover solid crystalline materials by optimizing periodic arrangements of atoms, ions or molecules. CSP takes weeks of supercomputer time because of slow energy minimizations for millions of simulated crystals. The lattice energy is a key physical property, which determines thermodynamic stability of a crystal but has no simple analytic expression. Past machine learning approaches to predict the lattice energy used slow crystal descriptors depending on manually chosen parameters. The new area of Periodic Geometry offers much faster isometry invariants that are also continuous under perturbations of atoms. Our experiments on simulated crystals confirm that a small distance between the new invariants guarantees a small difference of energies. We compare several kernel methods for invariant-based predictions of energy and achieve the mean absolute error of less than 5kJ/mole or 0.05eV/atom on a dataset of 5679 crystals.
@inproceedings{ropers2022fast, title={Fast predictions of lattice energies by continuous isometry invariants of crystal structures}, author={Ropers, Jakob and Mosca, Marco M and Anosova, Olga D and Kurlin, Vitaliy A and Cooper, Andrew I}, booktitle={International Conference on Data Analytics and Management in Data Intensive Domains}, pages={178--192}, year={2022} }
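The abstract above describes kernel methods that predict lattice energy from invariants, on the premise that nearby invariants imply nearby energies. The sketch below illustrates that idea with one simple kernel technique, a Gaussian-weighted average of training energies (Nadaraya-Watson regression). It is a stand-in chosen for brevity, not necessarily one of the methods compared in the paper; the function names, the `bandwidth` value, and all numbers in the toy data are hypothetical.

```python
import math


def gaussian_kernel(u, v, bandwidth):
    """Similarity of two invariant vectors; close to 1 when they nearly coincide."""
    return math.exp(-math.dist(u, v) ** 2 / (2 * bandwidth ** 2))


def predict_energy(query, invariants, energies, bandwidth=0.2):
    """Kernel-weighted average of known energies (Nadaraya-Watson regression)."""
    weights = [gaussian_kernel(query, x, bandwidth) for x in invariants]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training invariants")
    return sum(w * e for w, e in zip(weights, energies)) / total


# Toy training set: made-up 2D invariant vectors with made-up energies (kJ/mol).
toy_invariants = [(1.0, 2.0), (1.1, 2.1), (3.0, 4.0)]
toy_energies = [-100.0, -102.0, -150.0]

# A query halfway between the first two crystals is dominated by those two.
midpoint_prediction = predict_energy((1.05, 2.05), toy_invariants, toy_energies)

# A query on top of the third crystal essentially returns its energy.
far_prediction = predict_energy((3.0, 4.0), toy_invariants, toy_energies)
```

The bandwidth controls how quickly the influence of a training crystal decays with distance between invariants; in practice it would be chosen by cross-validation rather than fixed by hand.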
One class classification for co-crystal discovery
- Abstract. The implementation of machine learning models has brought major changes in the decision-making process for materials design. One matter of concern for the data-driven approaches is the lack of negative data from unsuccessful synthetic attempts, which might generate inherently imbalanced datasets. We propose the application of the one-class classification methodology as an effective tool for tackling these limitations on the materials design problems. This is a concept of learning based only on a well-defined class without counter examples. An extensive study on the different one-class classification algorithms is performed until the most appropriate workflow is identified for guiding the discovery of emerging materials belonging to a relatively small class, that being the weakly bound polyaromatic hydrocarbon co-crystals. The two-step approach presented in this study first trains the model using all the known molecular combinations that form this class of co-crystals extracted from the Cambridge Structural Database (1722 molecular combinations), followed by scoring possible yet unknown pairs from the ZINC15 database (21736 possible molecular combinations). Focusing on the highest-ranking pairs predicted to have higher probability of forming co-crystals, materials discovery can be accelerated by reducing the vast molecular space and directing the synthetic efforts of chemists. Further on, using interpretability techniques a more detailed understanding of the molecular properties causing co-crystallization is sought after. The applicability of the current methodology is demonstrated with the discovery of two novel co-crystals, namely pyrene-6H-benzo[c]chromen-6-one (1) and pyrene-9,10-dicyanoanthracene (2).
@article{vriza2021one, title={One class classification as a practical approach for accelerating pi-pi co-crystal discovery}, author={Vriza, A and Canaj, A and Vismara, R and Cook, L and Manning, T and Gaultois, M and Wood, P and Kurlin, V and Berry, N and Dyer, M and Rosseinsky, M}, journal={Chemical Science}, volume={12}, number={5}, pages={1702--1719}, year={2021} }
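The two-step workflow in the abstract above (learn from known positive examples only, then rank unknown pairs) can be sketched in a few lines. The example below uses a nearest-neighbour distance score as a deliberately simple one-class scorer; it is not the algorithm selected in the paper, and the function name `one_class_score`, the 2D feature vectors, and the candidate labels are all hypothetical placeholders for real molecular-pair descriptors.

```python
import math


def one_class_score(known_positives, query, k=2):
    """Score a candidate using only positive examples.

    The score is the negated mean distance to the k nearest known positives,
    so a higher score means the candidate looks more like the known class.
    """
    dists = sorted(math.dist(query, x) for x in known_positives)
    return -sum(dists[:k]) / k


# Step 1: "training" data, i.e. descriptors of pairs known to form co-crystals.
known_pairs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]

# Step 2: score unlabelled candidate pairs and rank them, best first.
candidate_pairs = {
    "candidate_near": (0.05, 0.05),
    "candidate_far": (2.0, 2.0),
}
ranking = sorted(
    candidate_pairs,
    key=lambda name: one_class_score(known_pairs, candidate_pairs[name]),
    reverse=True,
)
```

No negative examples are needed at any point, which is the property that makes one-class approaches attractive when databases record only successful syntheses.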
The Earth Mover’s Distance for Inorganic Compositions
- Abstract. It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established, we can use this information to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately, the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the earth mover’s distance (EMD) for inorganic compositions, a well-defined metric which enables the measure of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances have greater alignment with chemical understanding than the Euclidean distance, which is demonstrated on the binary compositions of the inorganic crystal structure database. The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of Machine Learning (ML) techniques. We have found that with no supervision, the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML.
@article{hargreaves2020earth, title={The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions}, author={Hargreaves, Cameron J and Dyer, Matthew S and Gaultois, Michael W and Kurlin, Vitaliy A and Rosseinsky, Matthew J}, journal={Chemistry of Materials}, volume={32}, number={24}, pages={10610--10620}, year={2020} }
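The abstract above computes the EMD from element ratios and the absolute distance between elements on a one-dimensional scale. For distributions on a line, the EMD equals the accumulated absolute difference of the two cumulative distributions, which the sketch below implements. The positions in `TOY_SCALE` are invented placeholders, not the real modified Pettifor numbers, and the function names are hypothetical.

```python
def normalise(composition):
    """Turn element amounts (e.g. {"Na": 1, "Cl": 1}) into ratios summing to 1."""
    total = sum(composition.values())
    return {element: amount / total for element, amount in composition.items()}


def composition_emd(comp_a, comp_b, scale):
    """Earth mover's distance between two compositions on a 1D element scale."""
    a, b = normalise(comp_a), normalise(comp_b)
    elements = sorted(set(a) | set(b), key=lambda element: scale[element])
    emd = 0.0
    carried = 0.0  # surplus mass that must still be moved to the right
    for left, right in zip(elements, elements[1:]):
        carried += a.get(left, 0.0) - b.get(left, 0.0)
        emd += abs(carried) * (scale[right] - scale[left])
    return emd


# Placeholder positions only; the real modified Pettifor scale is not reproduced.
TOY_SCALE = {"Na": 1, "K": 2, "Cl": 10, "Br": 11}

nacl = {"Na": 1, "Cl": 1}
kcl = {"K": 1, "Cl": 1}
kbr = {"K": 1, "Br": 1}

emd_nacl_kcl = composition_emd(nacl, kcl, TOY_SCALE)
emd_nacl_kbr = composition_emd(nacl, kbr, TOY_SCALE)
```

On this toy scale, swapping one element for a neighbour (NaCl to KCl) costs half as much as swapping both (NaCl to KBr), so the distance grows with the number and size of the substitutions along the scale.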