Validation strategies for the detection of m5C by nanopore direct RNA sequencing

Reuse License

Description of rights: CC-BY-4.0
Item type: Item, Dissertation
Access status: Open Access

Abstract

RNA modifications play essential roles in gene expression, yet their reliable detection at single-site resolution remains a significant analytical challenge. Among these modifications, 5-methylcytidine (m5C) is particularly difficult to detect because it perturbs the ionic current only subtly and typically occurs at low stoichiometry. This challenge persists even with nanopore direct RNA sequencing (DRS), currently the leading platform for native RNA analysis. Despite advances in deep learning-based approaches, existing methods produce false-positive rates too high for reliable site-specific interpretation. This thesis describes ModiCal (ModiDeC-based Calibration), a calibration-driven framework that repurposes a neural network classifier into a high-precision m5C validation tool. The work proceeded in two phases. In the first, different splint-mediated ligation strategies for generating site-specifically modified RNAs were explored, including a chemical capping protocol with potential application in modification detection. Short ligation constructs were confirmed to be fully compatible with nanopore DRS. Attempted validation of putative 2′-O-methylation sites on SARS-CoV-2 transcripts by RiboMethSeq revealed systematic false-positive signals that were indistinguishable from genuine modification, exposing a fundamental limitation of cleavage-based detection and motivating the transition to nanopore-based approaches. In the second phase, the ModiCal workflow was established using yeast 25S ribosomal RNA. A three-step calibration strategy was implemented: baseline training on synthetic ground-truth RNA, bulk false-positive suppression using in vitro transcribed reference RNA, and iterative single-site refinement. This approach reframes false-positive predictions as diagnostic readouts of misaligned decision boundaries. The calibrated model achieved background-free, enzyme-dependent detection of both characterized m5C sites on yeast rRNA. Applied to dengue virus genomic RNA, the calibrated model confirmed a low-stoichiometry m5C site with zero false positives across the entire viral genome. Quantitative accuracy was further validated by stoichiometry titration. ModiCal establishes a map-and-validate paradigm in which discovery tools nominate candidates and calibration-based validation provides site-specific confirmation. The principles demonstrated here are modification-agnostic and applicable beyond m5C.
