Improving small molecules activity modelling capability of cell painting data through data augmentation and effective representation learning

Ha, Son V.

doi:https://doi.org/10.25358/openscience-13516

dc.contributor.advisor	Czodrowski, Paul
dc.contributor.author	Ha, Son V.
dc.date.accessioned	2025-11-05T13:47:04Z
dc.date.issued	2024
dc.description.abstract	This thesis focuses on improving image-based activity modeling for early-stage drug discovery through data augmentation and representation learning of Cell Painting data. First, a significant contribution is the introduction of the FSL-CP dataset, designed to support few-shot learning (FSL) benchmarking of small-molecule bioactivity prediction using cell microscopy images. Using this dataset, we compare several FSL paradigms in a low-data context and study the effectiveness of transfer learning. Additionally, this work proposes an application for underused ‘low concentration images’ in activity modeling. We propose combining well-performing models trained on higher-concentration images with lower-concentration images at inference to identify more potent compounds. We show that this approach improves on the conventional method (directly training a high-potency model) in 65% of assays investigated in terms of AUC-ROC, and 75% of assays in terms of RIPtoP-corrected AUC-PR. The thesis further investigates cross-modality representation learning of Cell Painting (CP) and transcriptomics (TX), which are powerful tools in early drug discovery for understanding the biological effect of compounds on a population of cells post-treatment. In this work, we benchmark two representation learning methods: contrastive learning and a bimodal autoencoder. We use a cross-modality learning setting in which representation learning is performed with two modalities (CP and TX), but only CP is available for generating embeddings of new compounds and for downstream tasks. This is because, for new compounds, we would only have CP data and not TX, due to the high data generation cost of the RNA-Seq screen. We show that the learned representation improves cluster quality for clustering of CP replicates and different modes of action (MoA).	en
dc.identifier.doi	https://doi.org/10.25358/openscience-13516
dc.identifier.uri	https://openscience.ub.uni-mainz.de/handle/20.500.12030/13537
dc.identifier.urn	urn:nbn:de:hebis:77-3733c68f-0879-4d6b-9af1-72566ae6dce49
dc.language.iso	eng
dc.rights	CC-BY-ND-4.0
dc.rights.uri	https://creativecommons.org/licenses/by-nd/4.0/
dc.subject.ddc	500 Naturwissenschaften	de
dc.subject.ddc	500 Natural sciences and mathematics	en
dc.subject.ddc	540 Chemie	de
dc.subject.ddc	540 Chemistry and allied sciences	en
dc.subject.ddc	660 Technische Chemie	de
dc.subject.ddc	660 Chemical engineering	en
dc.subject.ddc	004 Informatik	de
dc.subject.ddc	004 Data processing	en
dc.title	Improving small molecules activity modelling capability of cell painting data through data augmentation and effective representation learning	en
dc.type	Dissertation
jgu.date.accepted	2025-10-02
jgu.description.extent	xx, 113 Seiten ; Illustrationen, Diagramme
jgu.identifier.uuid	3733c68f-0879-4d6b-9af1-72566ae6dce4
jgu.organisation.department	FB 09 Chemie, Pharmazie u. Geowissensch.
jgu.organisation.name	Johannes Gutenberg-Universität Mainz
jgu.organisation.number	7950
jgu.organisation.place	Mainz
jgu.organisation.ror	https://ror.org/023b0x485
jgu.organisation.year	2024
jgu.rights.accessrights	openAccess
jgu.subject.ddccode	500
jgu.subject.ddccode	540
jgu.subject.ddccode	660
jgu.subject.ddccode	004
jgu.type.dinitype	PhDThesis	en_GB
jgu.type.resource	Text
jgu.type.version	Original work

Files

Original bundle

Name:: improving_small_molecules_act-20251105144704952547.pdf
Size:: 7.23 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 5.14 KB
Format:: Item-specific license agreed upon to submission

Collections

JGU-Hochschulschriften