Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-6351
Authors: Albrecht, Steffen
Sprang, Maximilian
Andrade, Miguel
Fontaine, Jean-Fred
Title: seqQscorer: automated quality control of next-generation sequencing data using machine learning
Online publication date: 20-Sep-2021
Year of first publication: 2021
Language: English
Abstract: Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer.
DDC: 004 Data processing
570 Life sciences
Institution: Johannes Gutenberg-Universität Mainz
Department: FB 10 Biologie
Place: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-6351
Version: Published version
Publication type: Journal article
License: CC BY
Information on rights of use: https://creativecommons.org/licenses/by/4.0/
Journal: Genome biology
Volume: 22
Pages or article number: 75
Publisher: BioMed Central
Publisher place: London
Issue date: 2021
ISSN: 1474-760X
Publisher URL: https://doi.org/10.1186/s13059-021-02294-2
Publisher DOI: 10.1186/s13059-021-02294-2
Annotation: Andrade, Miguel publishes under: Andrade-Navarro, Miguel A.
Appears in collections: JGU-Publikationen

Files in This Item:
albrecht_steffen-seqqscorer__au-20210917111217183.pdf (2 MB, Adobe PDF)