Please use this identifier to cite or link to this item: https://ahro.austin.org.au/austinjspui/handle/1/30894
Title: Removing unwanted variation from large-scale RNA sequencing data with PRPS.
Austin Authors: Molania, Ramyar;Foroutan, Momeneh;Gagnon-Bartsch, Johann A;Gandolfo, Luke C;Jain, Aryan;Sinha, Abhishek;Olshansky, Gavriel;Dobrovic, Alexander;Papenfuss, Anthony T;Speed, Terence P
Affiliation: Surgery (University of Melbourne)
Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia; Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia (molania.r@wehi.edu.au)
Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria, Australia
Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia
School of Mathematics and Statistics, The University of Melbourne, Melbourne, Victoria, Australia
Department of Economics and Statistics, Monash University, Melbourne, Victoria, Australia
Metabolomics Laboratory, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
Baker Department of Cardiometabolic Health, The University of Melbourne, Melbourne, Victoria, Australia
Department of Statistics, University of Michigan, Ann Arbor, MI, USA
Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia; Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia; Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, Victoria, Australia (papenfuss@wehi.edu.au)
Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia; School of Mathematics and Statistics, The University of Melbourne, Melbourne, Victoria, Australia (terry@wehi.edu.au)
Issue Date: 2023
Date: 2022
Publication information: Nature Biotechnology 2023; 41(1)
Abstract: Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
URI: https://ahro.austin.org.au/austinjspui/handle/1/30894
DOI: 10.1038/s41587-022-01440-w
ORCID: http://orcid.org/0000-0002-3599-2455
http://orcid.org/0000-0003-4928-8060
http://orcid.org/0000-0001-8404-354X
http://orcid.org/0000-0002-1102-8506
http://orcid.org/0000-0002-5403-7998
Journal: Nature Biotechnology
PubMed URL: https://pubmed.ncbi.nlm.nih.gov/36109686
Type: Journal Article
Appears in Collections: Journal articles

Items in AHRO are protected by copyright, with all rights reserved, unless otherwise indicated.