HECTOR : a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data

dc.contributor.author: Wirawan, Adrianto
dc.contributor.author: Harris, Robert S.
dc.contributor.author: Liu, Yongchao
dc.contributor.author: Schmidt, Bertil
dc.contributor.author: Schröder, Jan
dc.date.accessioned: 2022-07-11T10:28:05Z
dc.date.available: 2022-07-11T10:28:05Z
dc.date.issued: 2014
dc.description.abstract: BACKGROUND: Current-generation sequencing technologies are able to produce low-cost, high-throughput reads. However, the produced reads are imperfect and may contain various sequencing errors. Although many error correction methods have been developed in recent years, none explicitly targets homopolymer-length errors in the 454 sequencing reads. RESULTS: We present HECTOR, a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. In this algorithm, for the first time we have investigated a novel homopolymer spectrum based approach to handle homopolymer insertions or deletions, which are the dominant sequencing errors in 454 pyrosequencing reads. We have evaluated the performance of HECTOR, in terms of correction quality, runtime and parallel scalability, using both simulated and real pyrosequencing datasets. This performance has been further compared to that of Coral, a state-of-the-art error corrector which is based on multiple sequence alignment and Acacia, a recently published error corrector for amplicon pyrosequences. Our evaluations reveal that HECTOR demonstrates comparable correction quality to Coral, but runs 3.7x faster on average. In addition, HECTOR performs well even when the coverage of the dataset is low. CONCLUSION: Our homopolymer spectrum based approach is theoretically capable of processing arbitrary-length homopolymer-length errors, with a linear time complexity. HECTOR employs a multi-threaded design based on a master-slave computing model. Our experimental results show that HECTOR is a practical 454 pyrosequencing read error corrector which is competitive in terms of both correction quality and speed. The source code and all simulated data are available at: http://hector454.sourceforge.net. [en_GB]
dc.description.sponsorship: DFG, Open Access-Publizieren Universität Mainz / Universitätsmedizin [de]
dc.identifier.doi: http://doi.org/10.25358/openscience-7352
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/7366
dc.language.iso: eng [de]
dc.rights: CC-BY-2.0 [*]
dc.rights.uri: https://creativecommons.org/licenses/by/2.0/ [*]
dc.subject.ddc: 004 Informatik [de_DE]
dc.subject.ddc: 004 Data processing [en_GB]
dc.title: HECTOR : a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data [en_GB]
dc.type: Zeitschriftenaufsatz [de]
jgu.identifier.pmid: 24885381
jgu.journal.title: BMC bioinformatics [de]
jgu.journal.volume: 15 [de]
jgu.organisation.department: FB 08 Physik, Mathematik u. Informatik [de]
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 7940
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.pages.alternative: Art. 131 [de]
jgu.publisher.doi: 10.1186/1471-2105-15-131 [de]
jgu.publisher.issn: 1471-2105 [de]
jgu.publisher.name: BioMed central [de]
jgu.publisher.place: London [de]
jgu.publisher.uri: http://dx.doi.org/10.1186/1471-2105-15-131 [de]
jgu.publisher.year: 2014
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 004 [de]
jgu.type.dinitype: Article [en_GB]
jgu.type.resource: Text [de]
jgu.type.version: Published version [de]
opus.affiliated: Wirawan, Adrianto
opus.affiliated: Liu, Yongchao
opus.affiliated: Schmidt, Bertil
opus.date.modified: 2018-08-08T07:52:17Z
opus.identifier.opusid: 26769
opus.importsource: pubmed
opus.institute.number: 0805
opus.metadataonly: false
opus.organisation.string: FB 08: Physik, Mathematik und Informatik: Institut für Informatik [de_DE]
opus.subject.dfgcode: 00-000
opus.type.contenttype: Keine [de_DE]
opus.type.contenttype: None [en_EN]

Files

Original bundle

Name: hector___a_parallel_multistag-20220710211016032.pdf
Size: 1018.23 KB
Format: Adobe Portable Document Format