Computer-aided Materials Science project based on papers in
Chemical Science 2021
Chemistry of Materials 2020
One class classification for co-crystal discovery
@article{vriza2021one,
  title={One class classification as a practical approach for accelerating π-π co-crystal discovery},
  author={Vriza, A and Canaj, A and Vismara, R and Cook, L and Manning, T and Gaultois, M and Wood, P and Kurlin, V and Berry, N and Dyer, M and Rosseinsky, M},
  journal={Chemical Science},
  year={2021}
}
- Abstract. The implementation of machine learning models has brought major changes in the decision-making process for materials design. One matter of concern for the data-driven approaches is the lack of negative data from unsuccessful synthetic attempts, which might generate inherently imbalanced datasets. We propose the application of the one-class classification methodology as an effective tool for tackling these limitations on the materials design problems. This is a concept of learning based only on a well-defined class without counter examples. An extensive study on the different one-class classification algorithms is performed until the most appropriate workflow is identified for guiding the discovery of emerging materials belonging to a relatively small class, that being the weakly bound polyaromatic hydrocarbon co-crystals. The two-step approach presented in this study first trains the model using all the known molecular combinations that form this class of co-crystals extracted from the Cambridge Structural Database (1722 molecular combinations), followed by scoring possible yet unknown pairs from the ZINC15 database (21736 possible molecular combinations). Focusing on the highest-ranking pairs predicted to have higher probability of forming co-crystals, materials discovery can be accelerated by reducing the vast molecular space and directing the synthetic efforts of chemists. Further on, using interpretability techniques a more detailed understanding of the molecular properties causing co-crystallization is sought after. The applicability of the current methodology is demonstrated with the discovery of two novel co-crystals, namely pyrene-6H-benzo[c]chromen-6-one (1) and pyrene-9,10-dicyanoanthracene (2).
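The two-step workflow in the abstract (fit on known positives only, then score and rank unlabeled pairs) can be illustrated with a minimal sketch. This is not one of the algorithms evaluated in the paper: a simple standardized distance-to-centroid scorer stands in for a one-class classifier, and the feature vectors, pair names, and function names are all hypothetical.

```python
import math

def fit_centroid_model(positives):
    """Step 1: learn from positive examples only (no negative data needed).

    Returns the per-feature mean and standard deviation of the known class.
    """
    n = len(positives)
    dims = len(positives[0])
    means = [sum(row[j] for row in positives) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((row[j] - means[j]) ** 2 for row in positives) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if a feature has zero spread
    return means, stds

def score(model, x):
    """Higher score = closer to the known class (negative standardized distance)."""
    means, stds = model
    dist = math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds)))
    return -dist

def rank_candidates(model, candidates):
    """Step 2: score unlabeled candidates and sort them, best first."""
    return sorted(candidates, key=lambda name: score(model, candidates[name]), reverse=True)

# Hypothetical toy descriptors for known co-crystal-forming pairs (assumption, not real data).
known_pairs = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
# Hypothetical unlabeled candidate pairs to be ranked.
candidates = {"pair_A": [1.0, 2.0], "pair_B": [3.0, 0.0]}

model = fit_centroid_model(known_pairs)
ranking = rank_candidates(model, candidates)
```

Only the top of such a ranking would be passed on for synthesis. A real application would replace the toy scorer with a proper one-class model and descriptors computed from the molecules.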
The Earth Mover’s Distance for Inorganic Compositions
@article{hargreaves2020earth,
  title={The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions},
  author={Hargreaves, Cameron J and Dyer, Matthew S and Gaultois, Michael W and Kurlin, Vitaliy A and Rosseinsky, Matthew J},
  journal={Chemistry of Materials},
  year={2020}
}
- Abstract. It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established, we can use this information to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately, the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the earth mover’s distance (EMD) for inorganic compositions, a well-defined metric which enables the measure of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances have greater alignment with chemical understanding than the Euclidean distance, which is demonstrated on the binary compositions of the inorganic crystal structure database. The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of Machine Learning (ML) techniques. We have found that with no supervision, the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML.
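The computation described in the abstract (element ratios combined with absolute distances along a one-dimensional element scale) can be sketched as follows. The scale positions below are placeholder values chosen for illustration, not the real modified Pettifor numbers, and the function names are hypothetical. The sketch relies on the standard result that, for points on a line with absolute-difference ground distance, the EMD equals the accumulated absolute difference between the two cumulative distributions.

```python
# Placeholder positions on a 1D element scale (assumption: NOT the modified Pettifor values).
ILLUSTRATIVE_SCALE = {"Na": 1, "K": 2, "Cl": 9, "Br": 10}

def normalise(composition):
    """Convert element amounts (e.g. {"Na": 1, "Cl": 1}) into ratios that sum to 1."""
    total = sum(composition.values())
    return {el: amount / total for el, amount in composition.items()}

def composition_emd(comp_a, comp_b, scale):
    """1D earth mover's distance between two compositions on the given scale."""
    a = normalise(comp_a)
    b = normalise(comp_b)
    # Walk through all elements present in either composition, ordered by scale position.
    elements = sorted(set(a) | set(b), key=lambda el: scale[el])
    distance = 0.0
    carried = 0.0  # surplus "earth" that still has to be moved further along the scale
    for left, right in zip(elements, elements[1:]):
        carried += a.get(left, 0.0) - b.get(left, 0.0)
        distance += abs(carried) * (scale[right] - scale[left])
    return distance

d_nacl_kcl = composition_emd({"Na": 1, "Cl": 1}, {"K": 1, "Cl": 1}, ILLUSTRATIVE_SCALE)
d_nacl_kbr = composition_emd({"Na": 1, "Cl": 1}, {"K": 1, "Br": 1}, ILLUSTRATIVE_SCALE)
```

With these placeholder positions, swapping one neighbouring element gives a smaller distance than swapping both, and compositions with the same ratios (for example NaCl written as Na2Cl2) are at distance zero.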