--- title: "curatedTCGAData" date: "`r BiocStyle::doc_date()`" vignette: | %\VignetteIndexEntry{curatedTCGAData} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: toc_float: true Package: curatedTCGAData --- ```{r, echo = FALSE, warning = FALSE} suppressPackageStartupMessages({ library(MultiAssayExperiment) library(curatedTCGAData) library(TCGAutils) }) ``` # Installation ```{r, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("curatedTCGAData") ``` Load packages: ```{r, eval = FALSE} library(curatedTCGAData) library(MultiAssayExperiment) library(TCGAutils) ``` # Downloading datasets Checking available cancer codes and assays in TCGA data: ```{r} curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE) ``` Check potential files to be downloaded: ```{r} curatedTCGAData(diseaseCode = "COAD", assays = "RPPA*", dry.run = TRUE) ``` ## ACC dataset example ```{r, message=FALSE} (accmae <- curatedTCGAData("ACC", c("CN*", "Mutation"), FALSE)) ``` **Note**. For more on how to use a `MultiAssayExperiment` please see the `MultiAssayExperiment` vignette. ### Subtype information Some cancer datasets contain associated subtype information within the clinical datasets provided. This subtype information is included in the metadata of `colData` of the `MultiAssayExperiment` object. To obtain these variable names, use the `getSubtypeMap` function from TCGA utils: ```{r} head(getSubtypeMap(accmae)) ``` ### Typical clinical variables Another helper function provided by TCGAutils allows users to obtain a set of consistent clinical variable names across several cancer types. Use the `getClinicalNames` function to obtain a character vector of common clinical variables such as vital status, years to birth, days to death, etc. ```{r} head(getClinicalNames("ACC")) colData(accmae)[, getClinicalNames("ACC")][1:5, 1:5] ``` ### Samples in Assays The `sampleTables` function gives an overview of sample types / codes present in the data: ```{r} sampleTables(accmae) ``` Often, an analysis is performed comparing two groups of samples to each other. To facilitate the separation of samples, the `splitAssays` TCGAutils function identifies all sample types in the assays and moves each into its own assay. By default, all discoverable sample types are separated into a separate experiment. In this case we requested only solid tumors and blood derived normal samples as seen in the `sampleTypes` reference dataset: ```{r} sampleTypes[sampleTypes[["Code"]] %in% c("01", "10"), ] splitAssays(accmae, c("01", "10")) ```