Please use this identifier to cite or link to this resource: http://doi.org/10.25358/openscience-5712
Authors: Andreani, Tommaso
Title: From DNA sequences to cell types by detecting regulatory genomic regions in sequencing data
Online publication date: 29-Mar-2021
Document language: English
Abstract: One of the big questions in biology today is to understand which genetic and epigenetic factors are involved in the regulation of gene expression, and in which cases their deregulation can contribute to the development of abnormal phenotypes or diseases. Innovations in genome sequencing techniques and corresponding data processing algorithms have enabled unbiased interrogation of the different genomic and epigenomic components of transcription at nucleotide resolution. Therefore, it is now possible to use and integrate different types of data for both bulk and single-cell samples, and to understand the molecular components of gene expression regulation using ad-hoc reproducible computational analysis. As an interdisciplinary field, bioinformatics takes advantage of different quantitative disciplines, such as statistics and machine learning. This allows the implementation of detailed analyses to support and elucidate specific fundamental discoveries, and also to test unexpected predictions coming from exploratory data analysis. In particular, the use of bioinformatics is a necessity in the study of the genomic basis of gene regulation given the complexity of the data produced. Thus, the application of existing bioinformatics methods and the development of novel ones improve the interpretation of new data by integrating several data types from multiple sources.
In this thesis I applied and developed bioinformatics methods to help investigate basic biological questions in the genomic study of epigenetic gene regulation: i) I created a pipeline for whole-genome bisulfite sequencing data analysis to improve the understanding of the way genes and DNA sequences are demethylated by GADD45 proteins and how this might be linked to a key stage of development in mouse embryonic stem cells (mESCs), ii) I developed a metric based on the Gini index to evaluate unsupervised clustering results obtained using several computational methods that were tested to identify various types of peripheral blood mononuclear cells (PBMCs) from single-cell ATAC-seq samples in which the labels of the cells were not provided and iii) I developed an algorithm to extract variable regions in ChIP-seq data that can improve the identification of target-specific binding sites of different proteins in several cell lines of the ENCODE project. Together, these three studies are a significant contribution to the improvement of the interpretation of genomic data for the study of epigenetic gene regulation by bioinformatics.
DDC subject group: 500 Natural sciences and mathematics
570 Life sciences
Publishing institution: Johannes Gutenberg-Universität Mainz
Organizational unit: FB 10 Biologie
External institutions
Place of publication: Mainz
DOI: http://doi.org/10.25358/openscience-5712
Version: Original work
Publication type: Dissertation
Usage rights: In Copyright
Information on usage rights: http://rightsstatements.org/vocab/InC/1.0/
Extent: 167 pages, illustrations
Appears in collections: JGU-Publikationen

Files in this resource:
File: andreani_tommaso-from_dna_seque-20210323164932284.pdf
Size: 3.22 MB
Format: Adobe PDF