CAREx : context-aware read extension of paired-end sequencing data

dc.contributor.author: Kallenborn, Felix
dc.contributor.author: Schmidt, Bertil
dc.date.accessioned: 2024-09-05T14:14:42Z
dc.date.available: 2024-09-05T14:14:42Z
dc.date.issued: 2024
dc.description.abstract: Background: Commonly used next-generation sequencing machines typically produce large amounts of short reads, each a few hundred base pairs in length. However, many downstream applications would benefit from longer reads. Results: We present CAREx, an algorithm for generating pseudo-long reads from paired-end short-read Illumina data. It repeatedly computes multiple sequence alignments (MSAs) to extend a read until its partner is found. Our performance evaluation on both simulated and real data shows that CAREx connects significantly more read pairs (up to 99% for simulated data) and produces more error-free pseudo-long reads than previous approaches. When used prior to assembly, it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. Conclusion: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CAREx. [en_GB]
dc.identifier.doi: http://doi.org/10.25358/openscience-10675
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/10693
dc.language.iso: eng [de]
dc.rights: CC-BY-4.0 [*]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [*]
dc.subject.ddc: 004 Informatik (computer science) [de_DE]
dc.subject.ddc: 004 Data processing [en_GB]
dc.title: CAREx : context-aware read extension of paired-end sequencing data [en_GB]
dc.type: Zeitschriftenaufsatz (journal article) [de]
jgu.journal.title: BMC bioinformatics [de]
jgu.journal.volume: 25 [de]
jgu.organisation.department: FB 08 Physik, Mathematik u. Informatik [de]
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 7940
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.pages.alternative: 186 [de]
jgu.publisher.doi: 10.1186/s12859-024-05802-w [de]
jgu.publisher.issn: 1471-2105 [de]
jgu.publisher.name: BioMed Central [de]
jgu.publisher.place: London [de]
jgu.publisher.year: 2024
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 004 [de]
jgu.subject.dfg: Ingenieurwissenschaften (engineering sciences) [de]
jgu.type.contenttype: Other [de]
jgu.type.dinitype: Article [en_GB]
jgu.type.resource: Text [de]
jgu.type.version: Published version [de]

Files

Original bundle

Name: carex___contextaware_read_ext-20240905152110551.pdf
Size: 1.53 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed upon to submission
