Please use this identifier to cite or link to this resource: http://doi.org/10.25358/openscience-6291
Authors: Weißbach, Stephan
Sys, Stanislav
Hewel, Charlotte
Todorov, Hristo
Schweiger, Susann
Winter, Jennifer
Pfenninger, Markus
Torkamani, Ali
Evans, Doug
Burger, Joachim
Everschor-Sitte, Karin
May-Simera, Helen Louise
Gerber, Susanne
Title: Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines
Online publication date: 18-Aug-2021
Year of publication: 2021
Document language: English
Abstract: BACKGROUND: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform’s impact. RESULTS: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in indel absolute numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups. CONCLUSION: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.
DDC subject group: 570 Life sciences
610 Medical sciences
620 Engineering and allied operations
Publishing institution: Johannes Gutenberg-Universität Mainz
Organizational unit: FB 04 Medizin
Place of publication: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-6291
Version: Published version
Publication type: Journal article
Usage rights: CC BY
Information on usage rights: https://creativecommons.org/licenses/by/4.0/
Journal: BMC Genomics
Volume: 22
Page count or article number: 62
Publisher: Springer
Publisher location: Heidelberg
Year of publication: 2021
ISSN: 1471-2164
URL of the original publication: https://doi.org/10.1186/s12864-020-07362-8
DOI of the original publication: 10.1186/s12864-020-07362-8
Appears in collections: JGU-Publikationen

Files in this resource:
File: weißbach_stephan-reliability_of-20210816165702329.pdf
Size: 3.12 MB
Format: Adobe PDF