Machine learning-assisted identification of factors affecting variability in multi-omics data

Lipnitskaya, Sofya

doi:https://doi.org/10.25358/openscience-11585

Files

machine_learningassisted_iden-2025031013491227695.pdf (6.01 MB)

Date issued

2024

Authors

Lipnitskaya, Sofya

Reuse License

Description of rights: CC-BY-4.0

Item

Dissertation

Open Access

Abstract

Recent advances in high-throughput technologies, together with computational innovations, have enabled the study of biological systems at multiple levels, giving rise to integrative omics approaches. Multi-omics research refers to efforts that combine multiple omics datasets (covering genes, transcripts, and proteins) obtained from the same samples to improve our understanding of biological processes. Over the past decades, omics technologies have led to new insights into the complex molecular mechanisms underlying abnormal phenotypes and diseases, thus revolutionizing biomedical and biological research. This has resulted in the generation of a large volume of biological data, including data available from open-access sources. Nonetheless, comprehensive analysis of such data is not trivial and is particularly hampered by the high dimensionality and noisy nature of the data, as well as by the lack of standardized data analysis methods and pipelines. It is therefore necessary to focus on the integration of omics data in the context of phenotypes and conditions of interest, which motivated the current research. This thesis investigates factors affecting biological and technical variability in transcriptomics studies by applying Machine Learning (ML) and Integrative Data Analysis (IDA). In particular, the thesis proposes the design and implementation of: (I) a bioinformatics pipeline (FAVSeq) for identifying key effectors of variation in multimodal RNA Sequencing (RNA-Seq) profiles from matched bulk and single-cell experiments and (II) an analysis tool (regulAS) for the ML- and IDA-based study of the alternative splicing regulome, using large-scale RNA-Seq data from cancer patients and healthy individuals drawn from public omics data sources.
The findings and tools presented in this thesis provide a basis for further experimental investigation of the identified factors, as well as for subsequent improvements in RNA-Seq data preparation and downstream analysis that facilitate fundamental research and biomedical applications based on RNA sequencing technologies.

DOI

https://doi.org/10.25358/openscience-11585

URI

https://openscience.ub.uni-mainz.de/handle/20.500.12030/11606

Collections

JGU-Hochschulschriften
