Authors: Andreani, Tommaso
Title: From DNA sequences to cell types by detecting regulatory genomic regions in sequencing data
Online publication date: 29-Mar-2021
Language: English
Abstract: A central question in biology today is which genetic and epigenetic factors regulate gene expression, and in which cases their deregulation contributes to abnormal phenotypes or disease. Innovations in genome sequencing techniques and the corresponding data-processing algorithms have enabled unbiased interrogation of the genomic and epigenomic components of transcription at nucleotide resolution. It is therefore now possible to integrate different types of data from both bulk and single-cell samples, and to dissect the molecular components of gene-expression regulation with ad hoc, reproducible computational analyses. As an interdisciplinary field, bioinformatics draws on quantitative disciplines such as statistics and machine learning. This makes it possible to carry out detailed analyses that support and elucidate specific fundamental discoveries, and to test unexpected predictions arising from exploratory data analysis. Given the complexity of the data produced, bioinformatics is indispensable for studying the genomic basis of gene regulation. Both applying existing bioinformatics methods and developing new ones therefore improve the interpretation of new data by integrating several data types from multiple sources.
In this thesis I applied and developed bioinformatics methods to investigate basic biological questions in the genomic study of epigenetic gene regulation: i) I created a pipeline for whole-genome bisulfite sequencing (WGBS) data analysis to better understand how genes and DNA sequences are demethylated by GADD45 proteins and how this might be linked to a key stage of development in mouse embryonic stem cells (mESCs); ii) I developed a metric based on the Gini index to evaluate unsupervised clustering results from several computational methods, which were tested for their ability to identify the various types of peripheral blood mononuclear cells (PBMCs) in single-cell ATAC-seq samples whose cell labels were not provided; and iii) I developed an algorithm to extract variable regions from ChIP-seq data, which can improve the identification of target-specific binding sites of different proteins in several cell lines from the ENCODE project. Together, these three studies make a significant contribution to the interpretation of genomic data in the bioinformatic study of epigenetic gene regulation.
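The abstract does not define the Gini-based clustering metric. As a hedged illustration only, the sketch below computes the standard Gini coefficient of a vector of non-negative values, which is 0 when the values are spread perfectly evenly and reaches (n - 1)/n when all of the mass sits on a single entry. How the thesis turns this quantity into a clustering-evaluation score (for example, which per-cell or per-cluster values it is applied to) is not stated here, so the function name `gini_index` and its input are assumptions, not the author's implementation.

```python
from typing import Sequence


def gini_index(values: Sequence[float]) -> float:
    """Standard Gini coefficient of non-negative values.

    Hypothetical helper for illustration; not the metric defined in the thesis.
    Returns 0.0 for a perfectly even distribution and (n - 1) / n when a
    single entry holds all of the mass.
    """
    xs = sorted(float(v) for v in values)
    if any(v < 0 for v in xs):
        raise ValueError("values must be non-negative")
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form of the mean-absolute-difference definition,
    # using 1-based ranks over the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

For instance, `gini_index([10, 10, 10])` returns 0.0, whereas `gini_index([0, 0, 30])` returns about 0.667, the maximum (n - 1)/n for three entries. Sorting first keeps the computation at O(n log n) instead of the O(n^2) pairwise-difference form.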
DDC: 500 Natural sciences and mathematics
570 Life sciences
Institution: Johannes Gutenberg-Universität Mainz
Department: FB 10 Biology
External institutions
Place: Mainz
URN: urn:nbn:de:hebis:77-openscience-b7189d64-17a4-4d1d-9e6a-dc1175630a1e3
Version: Original work
Publication type: Dissertation
License: In Copyright
Extent: 167 pages, illustrations
Appears in collections: JGU-Publikationen

Files in This Item:
File: andreani_tommaso-from_dna_seque-20210323164932284.pdf
Size: 3.22 MB
Format: Adobe PDF