Deep painting: cheminformatics approaches to connect the biological profiles of compounds with their structural information

Reuse License

Description of rights: CC-BY-SA-4.0
Item type: Item, Dissertation
Access status: Open Access

Abstract

This doctoral thesis comprises several research projects with a common theme. Each connects the biological profiles of compounds, as determined by the morphological profiling Cell Painting Assay (CPA), with their structural information using different cheminformatics methods.

In the first project, Machine Learning (ML) models were developed for Quantitative Structure-Activity Relationship (QSAR) modeling. They predict the Cell Painting (CP) profiles of compounds from different molecular representations. Several combinations of modeling methods and representations were explored; however, none of them had good predictive power. One hypothesis for the models’ low predictive power is that compounds with similar structures do not always show similar bioprofiles.

In the second project, a generative modeling method was developed to generate numerous compounds with similar CP profiles. However, this generative method did not produce compounds with relevant medicinal chemistry structures.

Lysosomotropism is a phenomenon in which small, lipophilic compounds with basic moieties become trapped in lysosomes. The CPA is a reliable surrogate for identifying lysosomotropic compounds because such compounds show a distinct phenotype. In the third project, it was reported that suitable lipophilicity and basicity alone do not qualify a compound as lysosomotropic. To study which factors drive or prevent lysosomotropism, Matched Molecular Pair Analysis (MMPA) and Explainable Machine Learning (XML) were performed on a suitable subset of the internal CPA data.

Modern ML methods such as Neural Networks (NN) and ensembles of decision trees generally have high predictive power. However, they suffer from a “black box” nature because their predictions cannot be readily explained. In the fourth project, an open-source software tool called eXplainable FingerPrints (X-FP) was developed. X-FP applies ML feature attribution methods such as Shapley Additive Explanations (SHAP) to Morgan Fingerprints (MF). It then generates an intuitive report highlighting the important MF bits and the substructures they encode.
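The hypothesis that structurally similar compounds need not share similar CP profiles can be probed directly. The check pairs up compounds whose fingerprints are highly similar and then asks whether their profiles correlate. The following is a minimal sketch, not the analysis performed in the thesis. It assumes that fingerprints are given as sets of "on" bit indices (for example, Morgan bits precomputed with a toolkit such as RDKit) and that CP profiles are plain numeric feature vectors. The 0.7 Tanimoto threshold and the toy data are arbitrary, hypothetical choices.

```python
import math
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two sets of 'on' fingerprint bits."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similar_pairs_agreement(fps, profiles, threshold=0.7):
    """For each compound pair with fingerprint Tanimoto >= threshold (hypothetical cutoff),
    return ((i, j), tanimoto, profile_correlation)."""
    results = []
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        if sim >= threshold:
            results.append(((i, j), sim, pearson(profiles[i], profiles[j])))
    return results


# Hypothetical toy data: compounds 0 and 1 are structurally close, compound 2 is unrelated.
fps = [{1, 5, 9, 12}, {1, 5, 9, 12, 13}, {2, 7, 20}]
profiles = [[0.2, 1.5, -0.3, 0.8], [-0.1, -1.2, 0.9, -0.5], [0.0, 0.1, 0.0, 0.1]]
for pair, sim, corr in similar_pairs_agreement(fps, profiles):
    # The only similar pair, (0, 1), has Tanimoto 0.8 but a negative profile correlation.
    print(pair, round(sim, 2), round(corr, 2))
```

On real data, many high-similarity pairs with low or negative profile correlation would be consistent with the low predictive power reported for the QSAR models.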
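The third project's finding can be framed as a comparison between a physicochemical rule of thumb and assay labels. A commonly cited rule flags compounds as likely lysosomotropic when cLogP > 2 and the most basic pKa lies between 6.5 and 11. The exact thresholds are assumptions here, not values taken from the thesis. The field names and the idea of deriving `observed_lyso` from the CP phenotype are likewise hypothetical. Compounds that pass the rule but are not observed as lysosomotropic are the cases that lipophilicity and basicity alone cannot explain.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    clogp: float         # calculated lipophilicity
    basic_pka: float     # pKa of the most basic center (hypothetical input)
    observed_lyso: bool  # hypothetical label, e.g. derived from the CP phenotype


def passes_rule_of_thumb(c, clogp_min=2.0, pka_min=6.5, pka_max=11.0):
    """Assumed physicochemical rule: lipophilic enough and basic in the assumed pKa window."""
    return c.clogp > clogp_min and pka_min <= c.basic_pka <= pka_max


def rule_exceptions(compounds):
    """Return (false_positives, false_negatives) of the rule against the observed labels."""
    false_pos = [c.name for c in compounds if passes_rule_of_thumb(c) and not c.observed_lyso]
    false_neg = [c.name for c in compounds if not passes_rule_of_thumb(c) and c.observed_lyso]
    return false_pos, false_neg


# Hypothetical compounds: B satisfies the rule but is not lysosomotropic in the assay.
compounds = [
    Compound("A", clogp=3.5, basic_pka=9.0, observed_lyso=True),
    Compound("B", clogp=4.0, basic_pka=8.5, observed_lyso=False),
    Compound("C", clogp=1.0, basic_pka=9.5, observed_lyso=False),
]
print(rule_exceptions(compounds))  # (['B'], [])
```

The false positives are natural starting points for the structural analyses (MMPA, explainable ML) that look beyond the two physicochemical properties.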
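MMPA compares compounds that differ by a single, well-defined structural change. The sketch below assumes the molecules have already been fragmented into a shared core and one variable substituent. Real MMPA tools do this by systematically cutting acyclic single bonds. The record layout, the SMILES-like fragment strings, and the `flip_rate` summary are hypothetical, not the thesis's pipeline. The code groups compounds by core, forms matched pairs, and reports how often each transformation changes a lysosomotropism label.

```python
from collections import defaultdict
from itertools import combinations


def matched_pair_transforms(records):
    """records: iterable of (core, r_group, label).
    Returns {'rA>>rB': [(label_of_rA, label_of_rB), ...]} with rA < rB,
    so both directions of a transformation are pooled under one key."""
    by_core = defaultdict(list)
    for core, r_group, label in records:
        by_core[core].append((r_group, label))

    transforms = defaultdict(list)
    for members in by_core.values():
        for (ra, la), (rb, lb) in combinations(members, 2):
            if ra == rb:
                continue  # identical substituents are not a transformation
            if ra > rb:  # canonical order keeps the key stable
                (ra, la), (rb, lb) = (rb, lb), (ra, la)
            transforms[f"{ra}>>{rb}"].append((la, lb))
    return dict(transforms)


def flip_rate(transforms):
    """Fraction of matched pairs per transformation whose labels differ."""
    return {t: sum(a != b for a, b in pairs) / len(pairs) for t, pairs in transforms.items()}


# Hypothetical pre-fragmented data: swapping a basic amine for an amide on two cores.
records = [
    ("core1", "CN(C)C[*]", True),
    ("core1", "NC(=O)[*]", False),
    ("core2", "NC(=O)[*]", False),
    ("core2", "CN(C)C[*]", True),
]
print(flip_rate(matched_pair_transforms(records)))  # {'CN(C)C[*]>>NC(=O)[*]': 1.0}
```

Keeping the label tuples rather than only the flip rate preserves the direction of each change, for example whether removing the amine gains or loses the phenotype.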
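SHAP attributes a model's prediction to its input features using Shapley values. For a handful of binary fingerprint bits, these values can be computed exactly by enumerating every coalition of features, which is what this brute-force sketch does. It is not X-FP's code or the `shap` package's API. It also assumes a single all-bits-off baseline rather than the background dataset that SHAP explainers usually take. The toy model and bit indices are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).
    Features outside a coalition take their baseline value.
    Cost grows exponentially with len(x): toy sizes only."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[k] if (k in coalition or k == i) else baseline[k] for k in range(n)]
                without_i = [x[k] if k in coalition else baseline[k] for k in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(bits):
    # Hypothetical model: bit 0 adds 0.5, bits 0 and 1 together add 1.0 more, bit 2 adds 0.2.
    return 0.5 * bits[0] + 1.0 * bits[0] * bits[1] + 0.2 * bits[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
ranking = sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)
print([round(v, 3) for v in phi], ranking)  # [1.0, 0.5, 0.2] [0, 1, 2]
```

The interaction term is split equally between bits 0 and 1, and the attributions sum to model(x) - model(baseline). Exact enumeration needs roughly 2^n model evaluations per feature, so practical tools rely on approximations or on model-specific algorithms such as tree SHAP. As the abstract describes it, X-FP goes further than the raw attributions by reporting the substructures that each important bit encodes.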
