Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

dc.contributor.author: Weißbach, Stephan
dc.contributor.author: Sys, Stanislav
dc.contributor.author: Hewel, Charlotte
dc.contributor.author: Todorov, Hristo
dc.contributor.author: Schweiger, Susann
dc.contributor.author: Winter, Jennifer
dc.contributor.author: Pfenninger, Markus
dc.contributor.author: Torkamani, Ali
dc.contributor.author: Evans, Doug
dc.contributor.author: Burger, Joachim
dc.contributor.author: Everschor-Sitte, Karin
dc.contributor.author: May-Simera, Helen Louise
dc.contributor.author: Gerber, Susanne
dc.date.accessioned: 2021-08-18T09:38:43Z
dc.date.available: 2021-08-18T09:38:43Z
dc.date.issued: 2021
dc.description.abstract: BACKGROUND: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform’s impact. RESULTS: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in indel absolute numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups. CONCLUSION: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies. [en_GB]
dc.identifier.doi: http://doi.org/10.25358/openscience-6291
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/6301
dc.language.iso: eng [de]
dc.rights: CC-BY-4.0 [*]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [*]
dc.subject.ddc: 570 Biowissenschaften [de_DE]
dc.subject.ddc: 570 Life sciences [en_GB]
dc.subject.ddc: 610 Medizin [de_DE]
dc.subject.ddc: 610 Medical sciences [en_GB]
dc.subject.ddc: 620 Ingenieurwissenschaften und Maschinenbau [de_DE]
dc.subject.ddc: 620 Engineering and allied operations [en_GB]
dc.title: Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines [en_GB]
dc.type: Zeitschriftenaufsatz [de]
jgu.journal.title: BMC genomics [de]
jgu.journal.volume: 22 [de]
jgu.organisation.department: FB 04 Medizin [de]
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 2700
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.pages.alternative: 62 [de]
jgu.publisher.doi: 10.1186/s12864-020-07362-8
jgu.publisher.issn: 1471-2164 [de]
jgu.publisher.name: Springer [de]
jgu.publisher.place: Heidelberg [de]
jgu.publisher.uri: https://doi.org/10.1186/s12864-020-07362-8 [de]
jgu.publisher.year: 2021
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 570 [de]
jgu.subject.ddccode: 610 [de]
jgu.subject.ddccode: 620 [de]
jgu.type.dinitype: Article [en_GB]
jgu.type.resource: Text [de]
jgu.type.version: Published version [de]

Files

Original bundle

Name: weißbach_stephan-reliability_of-20210816165702329.pdf
Size: 3.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed upon to submission