Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-9719
Authors: Sethi, Riccha
Advisor: Sahin, Ugur
Hankeln, Thomas
Title: Identification of structural variations from whole genome sequencing of cancer patients
Online publication date: 18-Dec-2023
Year of first publication: 2023
Language: English
Abstract: Cancer is largely driven by the accumulation of somatic mutations, which can be subdivided into small mutations (single nucleotide variations (SNVs), small insertions and deletions) and large structural variations (SVs). While SNVs affect a single nucleotide, SVs can affect large stretches of DNA. Reliable identification of all mutations is key to understanding genetic diseases like cancer. SVs can be identified by whole genome sequencing, with conventional Illumina short-read sequencing (cWGS) being the most widely used approach. However, reliable prediction of SVs with short reads (50-150 bp) from fragmented DNA (~0.5 kb) is challenging because reads map ambiguously in repetitive regions and because typically only a few short reads span rearranged SV breakpoints, with limited sequence overlap due to read length. The 10X Genomics linked-read sequencing (10XWGS) technology aims to mitigate these limitations by linking short reads to the original larger fragment of DNA (~10 kb). In this study, we performed an unbiased evaluation of these two technologies with different types and sizes of SVs and compared their performance. The SVs identified by both technologies were highly specific, while the validation rate dropped for SVs identified by only one of them. Despite the technological advantage, a particularly high false discovery rate (FDR) was observed for SVs found only by 10XWGS, without any significant improvement in sensitivity. We proposed a sensitive and specific statistical approach to improve SV predictions from both technologies and characterized SVs from the MCF7 breast cancer cell line and a primary breast tumor with high precision. Due to the limited benefit of 10XWGS for sensitivity, we trained a random forest classifier in FuseSV for accurate predictions from cWGS data alone.
FuseSV integrates SV predictions from multiple bioinformatics tools and mitigates the high FDR of cWGS with a novel set of features derived from the alignment of reads to the reference genome, the biological mechanisms of SVs, and the clustering of SV breakpoints to account for complex genomic rearrangements (CGRs). The performance of the FuseSV classifiers was superior to that of all individual bioinformatics tools as well as to their combined use with 10XWGS. SVs, whether simple or complex, can form chimeric fusion transcripts (CMTs). CMTs can be predicted from RNA-sequencing (RNA-seq) data, but these predictions also include transcripts that occur without an underlying mutation and are also present in healthy tissues. Here we propose a novel pipeline, FUdGE, that predicts three types of CMTs directly from somatic SVs: direct fusion transcripts (classical fusion genes), transcripts with a retained intron (IR), and transcripts with a retained intergenic region (INR). FUdGE allows independent confirmation of expressed CMTs from matched RNA-seq data. We validated the approach in the same MCF7 cell line and primary breast tumor sample and investigated CMTs in a cohort of liposarcoma samples. Here we observed that the majority of confirmed SV-driven CMTs were classical fusion genes, with a much smaller number of IR and INR events. In conclusion, FuseSV enables accurate prediction of somatic SVs in cancer using only cWGS, while FUdGE provides an RNA-seq-independent strategy for direct prediction of CMTs formed by somatic SV events. The respective expressed CMT candidates can be confirmed independently with RNA-seq data. This alternative approach predicts only tumor-specific, somatic SV-driven CMTs, which is advantageous for personalized immunotherapy interventions that consider CMTs as neo-antigen candidates.
DDC: 570 Life sciences
600 Technology (Applied sciences)
Institution: Johannes Gutenberg-Universität Mainz
Department: FB 10 Biologie
Place: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-9719
URN: urn:nbn:de:hebis:77-openscience-bc295932-464c-43bf-ae5a-9532158b1ad47
Version: Original work
Publication type: Dissertation
License: CC BY-ND
Information on rights of use: https://creativecommons.org/licenses/by-nd/4.0/
Extent: XII, 91 pages; illustrations, diagrams
Appears in collections: JGU-Publikationen

Files in This Item:
  File: identification_of_structural_-20231129130638218.pdf
  Description: Complete thesis
  Size: 13.03 MB
  Format: Adobe PDF