From DNA sequences to cell types by detecting regulatory genomic regions in sequencing data

Andreani, Tommaso

doi:http://doi.org/10.25358/openscience-5712

Files

andreani_tommaso-from_dna_seque-20210323164932284.pdf (3.15 MB)

Date issued

2021

Authors

Andreani, Tommaso

Reuse License

Description of rights: InC-1.0

Item

Dissertation

Open Access

Abstract

A central question in biology today is which genetic and epigenetic factors regulate gene expression, and under which conditions their deregulation contributes to abnormal phenotypes or disease. Innovations in genome sequencing techniques and the corresponding data processing algorithms have enabled unbiased interrogation of the genomic and epigenomic components of transcription at nucleotide resolution. It is therefore now possible to integrate different types of data from both bulk and single-cell samples, and to study the molecular components of gene expression regulation using ad hoc, reproducible computational analyses. As an interdisciplinary field, bioinformatics draws on quantitative disciplines such as statistics and machine learning. This allows detailed analyses that support and elucidate specific fundamental discoveries, and that test unexpected predictions arising from exploratory data analysis. In particular, given the complexity of the data produced, bioinformatics is a necessity in the study of the genomic basis of gene regulation. Thus, applying existing bioinformatics methods and developing novel ones improve the interpretation of new data by integrating several data types from multiple sources.
In this thesis I applied and developed bioinformatics methods to help investigate basic biological questions in the genomic study of epigenetic gene regulation: i) I created a pipeline for whole-genome bisulfite sequencing data analysis to improve the understanding of how genes and DNA sequences are demethylated by GADD45 proteins, and how this might be linked to a key stage of development in mouse embryonic stem cells (mESCs); ii) I developed a metric based on the Gini index to evaluate unsupervised clustering results from several computational methods, which were tested on identifying types of peripheral blood mononuclear cells (PBMCs) in single-cell ATAC-seq samples for which cell labels were not provided; and iii) I developed an algorithm to extract variable regions in ChIP-seq data, which can improve the identification of target-specific binding sites of different proteins in several cell lines of the ENCODE project. Together, these three studies make a significant contribution to improving the interpretation of genomic data for the study of epigenetic gene regulation by bioinformatics.

DOI

http://doi.org/10.25358/openscience-5712

URI

https://openscience.ub.uni-mainz.de/handle/20.500.12030/5721

Collections

JGU-Hochschulschriften
