RNA-Seq based decomposition of human cell lines and primary tumors for the identification and quantification of viral expression
Abstract
Around twenty percent of all cancer cases are attributed to infectious agents, mostly viruses. However, detecting viruses can be laborious and time-consuming, or can require prior knowledge of the suspected agent. Next-generation transcriptome sequencing (RNA-Seq) of human samples is unbiased in that all mRNAs present in a sample can be sequenced, including foreign mRNAs such as viral transcripts. Analyzing viral expression profiles and their influence on the host is fundamental to understanding virus-associated oncogenesis in humans. The study presented here is a step towards systems virology in cancer immunology.
First, the software tool VirusID was developed for the reliable detection and identification of viruses in mammalian cells. In benchmarks with known viral content, VirusID performed slightly better than other tools. It has since been used at TRON to identify viral content in in-house and external RNA-Seq data, and it contributed to the TRON Cell Line Portal by providing the viruses identified in 1,082 cell lines.
Subsequently, the virus alignment profiles identified in RNA-Seq data motivated the development of VIRGENE, a pipeline to retrieve viral gene expression levels. We applied VIRGENE to 186 cell line samples and confirmed known viruses such as papillomaviruses and herpesviruses. In other cell lines, previously unknown viral content was identified, such as traces of murine retroviruses or bovine polyomaviruses. VIRGENE was then applied to primary tumors and revealed distinct Epstein-Barr virus expression profiles in cell lines versus tumor samples; BamHI-A rightward transcripts (BARTs), however, were consistently expressed. Furthermore, different human papillomavirus expression profiles were identified in cervical cancer and were associated with distinct overall survival of the respective patients. The diagnostic and prognostic potential of these biomarkers remains to be assessed further.
This work shows the potential of screening existing RNA-Seq data for viruses and expressed viral genes. Diverse virus expression profiles have different effects on host gene expression and possibly also on disease outcome. The results produced by VIRGENE facilitate comprehensive studies of the transcriptomic interplay between host and pathogen. Coupling these results with clinical annotation of cancer patients adds prognostic, diagnostic, or therapeutic value.