Improving small molecules activity modelling capability of cell painting data through data augmentation and effective representation learning

dc.contributor.advisorCzodrowski, Paul
dc.contributor.authorHa, Son V.
dc.date.accessioned2025-11-05T13:47:04Z
dc.date.issued2024
dc.description.abstractThis thesis focuses on improving image-based activity modeling for early-stage drug discovery through data augmentation and representation learning of Cell Painting data. First, a significant contribution is the introduction of the FSL-CP dataset, designed to support few-shot learning (FSL) benchmarking of small-molecule bioactivity prediction using cell microscopy images. Using this dataset, we compare several FSL paradigms in a low-data context and study the effectiveness of transfer learning. Additionally, this work proposes an application for underused ‘low concentration images’ in activity modeling: we combine well-performing models trained on higher-concentration images with lower-concentration images at inference to identify more potent compounds. We show that this approach improves on the conventional method (directly training a high-potency model) in 65% of the assays investigated in terms of AUC-ROC, and in 75% of assays in terms of RIPtoP-corrected AUC-PR. The thesis further investigates cross-modality representation learning of Cell Painting (CP) and transcriptomics (TX) data, two powerful tools in early drug discovery for understanding the biological effect of compounds on a population of cells after treatment. We benchmark two representation learning methods: contrastive learning and a bimodal autoencoder. We use a cross-modality learning setting in which representation learning is performed with both modalities (CP and TX), but only CP is available for generating embeddings of new compounds and for downstream tasks. This is because, for new compounds, only CP data would be available and not TX, owing to the high data-generation cost of the RNA-Seq screen. We show that the learned representation improves cluster quality when clustering CP replicates and different modes of action (MoA).en
dc.identifier.doihttps://doi.org/10.25358/openscience-13516
dc.identifier.urihttps://openscience.ub.uni-mainz.de/handle/20.500.12030/13537
dc.identifier.urnurn:nbn:de:hebis:77-3733c68f-0879-4d6b-9af1-72566ae6dce49
dc.language.isoeng
dc.rightsCC-BY-ND-4.0
dc.rights.urihttps://creativecommons.org/licenses/by-nd/4.0/
dc.subject.ddc500 Naturwissenschaftende
dc.subject.ddc500 Natural sciences and mathematicsen
dc.subject.ddc540 Chemiede
dc.subject.ddc540 Chemistry and allied sciencesen
dc.subject.ddc660 Technische Chemiede
dc.subject.ddc660 Chemical engineeringen
dc.subject.ddc004 Informatikde
dc.subject.ddc004 Data processingen
dc.titleImproving small molecules activity modelling capability of cell painting data through data augmentation and effective representation learningen
dc.typeDissertation
jgu.date.accepted2025-10-02
jgu.description.extentxx, 113 pages; illustrations, diagrams
jgu.identifier.uuid3733c68f-0879-4d6b-9af1-72566ae6dce4
jgu.organisation.departmentFB 09 Chemie, Pharmazie u. Geowissensch.
jgu.organisation.nameJohannes Gutenberg-Universität Mainz
jgu.organisation.number7950
jgu.organisation.placeMainz
jgu.organisation.rorhttps://ror.org/023b0x485
jgu.organisation.year2024
jgu.rights.accessrightsopenAccess
jgu.subject.ddccode500
jgu.subject.ddccode540
jgu.subject.ddccode660
jgu.subject.ddccode004
jgu.type.dinitypePhDThesisen_GB
jgu.type.resourceText
jgu.type.versionOriginal work

Files

Original bundle

Name:
improving_small_molecules_act-20251105144704952547.pdf
Size:
7.23 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
5.14 KB
Format:
Item-specific license agreed upon to submission