--- output: BiocStyle::html_document: toc: true toc_depth: 2 package: TENET.ExperimentHub title: "Using the TENET.ExperimentHub datasets" author: Rhie Lab at the University of Southern California date: "`r Sys.Date()`" abstract: > This vignette describes the basic usage of the TENET.ExperimentHub package, which contains datasets for use in the TENET package's vignette and function examples. These include a variety of different objects to illustrate different datasets used in TENET functions. See our GitHub repository () for more information. Where applicable, all datasets are aligned to the hg38 human genome. vignette: > %\VignetteIndexEntry{Using the TENET.ExperimentHub datasets} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} \usepackage[utf8]{inputenc} --- \RaggedRight ```{r echo = FALSE, message = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") options(tibble.print_min = 4, tibble.print_max = 4, max.print = 4) ``` # Introduction The TENET.ExperimentHub package contains 6 datasets for use in the TENET package's examples and vignettes. These datasets include an example MultiAssayExperiment object with matched gene expression and DNA methylation data from a subset of both tumor (case) and adjacent normal (control) samples in The Cancer Genome Atlas (TCGA)'s breast adenocarcinoma (BRCA) cohort with essential information used in all TENET functions, an example GRanges object produced by the TENET `step1MakeExternalDatasets` function, a SummarizedExperiment object with example purity data to pass to the TENET `step2GetDifferentiallyMethylatedSites` function, a data frame with example patient clinical data (matching the data in the example MultiAssayExperiment object), and two additional GRanges objects containing example peak and topologically associating domain (TAD) data, respectively. Where applicable, all datasets are aligned to the hg38 human genome. # Acquiring and installing TENET.ExperimentHub R 4.5 or a newer version is required. On Ubuntu 22.04, successful installation required several additional packages. They can be installed by running the following command in a terminal: `sudo apt-get install r-base-dev libcurl4-openssl-dev libfreetype6-dev libfribidi-dev libfontconfig1-dev libharfbuzz-dev libtiff5-dev libxml2-dev` No dependencies other than R are required on macOS or Windows. Two versions of this package are available. To install the stable version from Bioconductor, start R and run: ```{r eval = FALSE} ## Install BiocManager, which is required to install packages from Bioconductor if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install(version = "devel") BiocManager::install("TENET.ExperimentHub") ``` The development version containing the most recent updates is available from our GitHub repository (). To install the development version from GitHub, start R and run: ```{r eval = FALSE} ## Install prerequisite packages to install the development version from GitHub if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } if (!requireNamespace("remotes", quietly = TRUE)) { install.packages("remotes") } BiocManager::install(version = "devel") BiocManager::install("rhielab/TENET.ExperimentHub") ``` # Loading TENET.ExperimentHub To load the TENET.ExperimentHub package, start R and run: ```{r message = FALSE} library(TENET.ExperimentHub) ``` # Using the included datasets Wrapper functions are provided to allow easy access to all included datasets. Usage of each wrapper function is demonstrated below. # Included datasets ## `exampleTENETMultiAssayExperiment` A MultiAssayExperiment dataset created using a modified version of the `TCGADownloader` function from the TENET package utilizing TCGAbiolinks package functionality. This object contains two SummarizedExperiment objects, `expression` and `methylation`, with expression data for 11,637 genes annotated to the GENCODE v36 dataset, including all 1,637 identified human TF genes, and DNA methylation data for 20,000 probes from the Illumina HM450 methylation array. The data are aligned to the human hg38 genome. Expression and methylation values were matched from 200 tumor and 42 adjacent normal tissue samples subset from the TCGA BRCA dataset. Additionally, results from running the TENET step 1-6 functions on these samples are included in the metadata of this MultiAssayExperiment object. Clinical data for these samples are included in the colData of the MultiAssayExperiment object. (A separate data frame object containing a subset of the clinical data for these samples is available as `exampleTENETClinicalDataFrame`.) This dataset is included to demonstrate TENET functions. Note: Because this dataset is a small subset of the overall BRCA dataset, results generated by TENET from this dataset differ from those presented for the BRCA dataset at large in TENET publications. ```{r} ## Retrieve the ExperimentHub metadata for the object exampleTENETMultiAssayExperiment(metadata = TRUE) ## Retrieve the object itself exampleTENETMultiAssayExperiment() ``` ## `exampleTENETClinicalDataFrame` A data frame containing example and simulated clinical information corresponding to the samples in the `exampleTENETMultiAssayExperiment` object, used to demonstrate how TENET functions can import clinical data from a specified data frame. Clinical data are utilized by the `step2GetDifferentiallyMethylatedSites`, `step7TopGenesSurvival`, and `step7ExpressionVsDNAMethylationScatterplots` functions. The data frame consists of vital status and time variables for use by the `step7TopGenesSurvival` function, simulated purity data for each sample, and simulated copy number variation (CNV) and somatic mutation (SM) data for the top 10 genes by number of linked hypermethylated and hypomethylated probes derived from analyses done using the `exampleTENETMultiAssayExperiment` object. These data are a subset of the clinical data contained in the colData of the `exampleTENETMultiAssayExperiment` object. ```{r} ## Retrieve the ExperimentHub metadata for the object exampleTENETClinicalDataFrame(metadata = TRUE) ## Retrieve the object itself exampleTENETClinicalDataFrame() ``` ## `exampleTENETStep1MakeExternalDatasetsGRanges` A GenomicRanges dataset representing putative enhancer regions relevant to BRCA, created using the `step1MakeExternalDatasets` function in the TENET package with the `consensusEnhancer`, `consensusNDR`, `publicEnhancer`, `publicNDR`, and `ENCODEdELS` arguments all set to TRUE, and the `cancerType` argument set to "BRCA". The data are aligned to the human hg38 genome. This dataset is included to demonstrate TENET's `step2GetDifferentiallyMethylatedSites` function. ```{r} ## Retrieve the ExperimentHub metadata for the object exampleTENETStep1MakeExternalDatasetsGRanges(metadata = TRUE) ## Retrieve the object itself exampleTENETStep1MakeExternalDatasetsGRanges() ``` ## `exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment` SummarizedExperiment object A SummarizedExperiment object with three DNA methylation datasets each composed of 10 adjacent normal colorectal adenocarcinoma (COAD) samples from The Cancer Genome Atlas (TCGA), retrieved using the TCGAbiolinks package. Each dataset has data for 20,000 probes from the Illumina HM450 methylation array, to match the number of probes in the `exampleTENETMultiAssayExperiment` object. The data are aligned to the human hg38 genome. This object is representative of a `purity` dataset, which would contain DNA methylation data from potentially confounding sources, used with TENET's `step2GetDifferentiallyMethylatedSites` function. ```{r} ## Retrieve the ExperimentHub metadata for the object exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment( metadata = TRUE ) ## Retrieve the object itself exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment() ``` ## `exampleTENETPeakRegions` A GenomicRanges dataset with example genomic regions (peaks) of interest, used to demonstrate TENET's `step7TopGenesUserPeakOverlap` function. The peaks are derived from a ChIP-seq experiment on FOXA1 in MCF-7 cells and aligned to the human hg38 genome. They were downloaded from the ENCODE portal (file ENCFF112JVK in experiment ENCSR126YEB). **Citation:** ENCODE Project Consortium; Moore JE, Purcaro MJ, Pratt HE, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020 Jul;583(7818):699-710. doi: 10.1038/s41586-020-2493-4. Epub 2020 Jul 29. Erratum in: Nature. 2022 May;605(7909):E3. PMID: 32728249; PMCID: PMC7410828. ```{r} ## Retrieve the ExperimentHub metadata for the object exampleTENETPeakRegions(metadata = TRUE) ## Retrieve the object itself exampleTENETPeakRegions() ``` ## `exampleTENETTADRegions` A GenomicRanges dataset with example topologically associating domains (TADs), used to demonstrate TENET's `step7TopGenesTADTables` function. The TADs are derived from T47D cells (mistakenly labeled as 'T470'), and aligned to the human hg38 genome. They were downloaded from the 3D Genome Browser at . **Citation:** Wang Y, Song F, Zhang B, et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 2018 Oct 4;19(1):151. doi: 10.1186/s13059-018-1519-9. PMID: 30286773; PMCID: PMC6172833. ```{r} ## Retrieve the ExperimentHub metadata for the object exampleTENETTADRegions(metadata = TRUE) ## Retrieve the object itself exampleTENETTADRegions() ``` # Session info ```{r echo = FALSE, message = FALSE} ## Reset max.print to the default to ensure the full session info is shown options(max.print = 99999) ``` ```{r} sessionInfo() ```