Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-7352
Authors: Wirawan, Adrianto
Harris, Robert S.
Liu, Yongchao
Schmidt, Bertil
Schröder, Jan
Title: HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data
Online publication date: 11-Jul-2022
Year of first publication: 2014
Language: English
Abstract: BACKGROUND: Current-generation sequencing technologies are able to produce low-cost, high-throughput reads. However, the produced reads are imperfect and may contain various sequencing errors. Although many error correction methods have been developed in recent years, none explicitly targets homopolymer-length errors in 454 sequencing reads. RESULTS: We present HECTOR, a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. In this algorithm, we investigate for the first time a homopolymer spectrum based approach to handling homopolymer insertions and deletions, which are the dominant sequencing errors in 454 pyrosequencing reads. We have evaluated the performance of HECTOR, in terms of correction quality, runtime and parallel scalability, using both simulated and real pyrosequencing datasets. This performance has been further compared to that of Coral, a state-of-the-art error corrector based on multiple sequence alignment, and Acacia, a recently published error corrector for amplicon pyrosequences. Our evaluations reveal that HECTOR demonstrates comparable correction quality to Coral, but runs 3.7x faster on average. In addition, HECTOR performs well even when the coverage of the dataset is low. CONCLUSION: Our homopolymer spectrum based approach is theoretically capable of processing homopolymer-length errors of arbitrary length, with linear time complexity. HECTOR employs a multi-threaded design based on a master-slave computing model. Our experimental results show that HECTOR is a practical 454 pyrosequencing read error corrector which is competitive in terms of both correction quality and speed. The source code and all simulated data are available at: http://hector454.sourceforge.net.
DDC: 004 Computer science
004 Data processing
Institution: Johannes Gutenberg-Universität Mainz
Department: FB 08 Physik, Mathematik u. Informatik
Place: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-7352
Version: Published version
Publication type: Journal article
License: CC BY
Information on rights of use: https://creativecommons.org/licenses/by/2.0/
Journal: BMC Bioinformatics
Volume: 15
Pages or article number: Art. 131
Publisher: BioMed Central
Publisher place: London
Issue date: 2014
ISSN: 1471-2105
Publisher URL: http://dx.doi.org/10.1186/1471-2105-15-131
Publisher DOI: 10.1186/1471-2105-15-131
Appears in collections: DFG-OA-Publizieren (2012 - 2017)

Files in This Item:
File: hector___a_parallel_multistag-20220710211016032.pdf
Size: 1.02 MB
Format: Adobe PDF