%\VignetteIndexEntry{coMET users guide} %\VignetteDepends{coMET} %\VignetteKeywords{Software, Visualization, genomeAnnotation, GeneRegulation, GenomicVariation, DNAMethylation, SNP, GeneExpression } %\VignettePackage{coMET} %\VignetteEngine{knitr::knitr} \documentclass[11pt]{article} % A bunch of styles and package requirements for the Bioconductor vignette branding <>= #library("BiocStyle") BiocStyle::latex() @ <>= library(knitr) opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold') options(replace.assign=TRUE,width=90) @ \RequirePackage[utf8]{inputenc} % \RequirePackage{hyperref} \RequirePackage{url} \RequirePackage[authoryear,round]{natbib} \bibliographystyle{plainnat} % \RequirePackage[text={7.2in,9in},centering]{geometry} %\setkeys{Gin}{width=0.95\textwidth} \RequirePackage{longtable} \RequirePackage{graphicx} \newcommand{\code}[1]{{\texttt{#1}}} \newcommand{\term}[1]{{\emph{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \newcommand{\Rfunarg}[1]{{\textit{#1}}} \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\mgg}[0]{\Rpackage{coMET} } \newcommand{\Reference}[1]{{\texttt{#1}}} \newcommand{\link}[1]{{#1}} \newcommand{\RR}[0]{{\texttt{R}}} \title{The coMET User Guide} \author{Tiphaine Martin \footnote{tiphaine.martin@kcl.ac.uk}, Idil Yet \footnote{idil.yet@kcl.ac.uk}, Pei-Chien Tsai \footnote{peichien.tsai@kcl.ac.uk}, Jordana T. Bell \footnote{jordana.bell@kcl.ac.uk}} \date{Edited: September 2014; Compiled: \today} \begin{document} \maketitle \section{Citation} <>= citation(package='coMET') @ \clearpage \tableofcontents \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} The CoMET package is a web-based plotting tool and R-based package to visualize omic-WAS results in a genomic region of interest, such as EWAS (epigenome-wide association scan). CoMET provides a plot of the EWAS association signal and visualisation of the methylation correlation between CpG sites (co-methylation). The CoMET package also provides the option to annotate the region using functional genomic information, including both user-defined features and pre-selected features based on the Encode project. The plot can be customized with different parameters, such as plot labels, colours, symbols, heatmap colour scheme, significance thresholds, and including reference CpG sites. Finally, the tool can also be applied to display the correlation patterns of other genomic data in any species, e.g. gene expression array data. coMET generates a multi-panel plot to visualize EWAS results, co-methylation patterns, and annotation tracks in a genomic region of interest. A coMET figure (cf. Fig. 1) includes three components: \begin{enumerate} \item the upper plot shows the strength and extent of EWAS association signal; \item the middle panel provides customized annotation tracks; \item the lower panel shows the correlation between selected CpG sites in the genomic region. \end{enumerate} The structure of the plots builds on snp.plotter (Luna et al., 2007), with extensions to incorporate genomic annotation tracks and customized functions. coMET produces plots in PDF and Encapsulated Postscript (EPS) format. The current version of coMET can visualise EWAS results and annotations from a genomic region up to an entire chromosome in the upper and middle panels of the coMET plot. However, the lower panel (co-methylation) is restricted to visualising a maximum of 120 single-CpG or region-based datapoints. This limitation is due to limitations in the size of a standard A4 plot, and may be updated in the near future. However, the user can use the function comet.list to extracts all significant correlations beyond a given threshold in the dataset from either a genomic region or from an entire chromosome if required. \section{Usage} CoMET requires the installation of R, the statistical computing software, freely available for Linux, Windows, or MacOS. CoMET can be downloaded from bioconductor. Packages can be installed using the install.packages command in R. The coMET R package includes two major functions \textbf{\emph{comet.web}} and \textbf{\emph{comet}} to visualise omci-WAS results. \begin{itemize} \item The function \textbf{\emph{comet.web}} generates output plot with the same settings of genomic annotation tracks as that of the webservice (\url{http://epigen.kcl.ac.uk/comet} or direcly \url{http://comet.epigen.kcl.ac.uk:3838/coMET/}). \item The function \textbf{\emph{comet}} generates output plots with the customized annotation tracks defined by user. \end{itemize} <>= source("http://bioconductor.org/biocLite.R") biocLite("coMET") @ Currently, coMET is under the development version of Bioconductor, go to the section "Install he development version of coMET from Bioconductor". After loading from Bioconductor or gitHUB, CoMET can be loaded into a R session using this command: <>= require("hash") require("grid") require("grDevices") require("biomaRt") require("Gviz") require("ggbio") require("rtracklayer") require("GenomicRanges") require("colortools") require("gridExtra") require("ggplot2") require("trackViewer") require("psych") rdir <- system.file("R", package="coMET",mustWork=TRUE) source(file.path(rdir, "AnalyseFile.R")) source(file.path(rdir, "BiofeatureGraphics.R")) source(file.path(rdir, "comet.R")) source(file.path(rdir, "cometWeb.R")) source(file.path(rdir, "DrawPlot.R")) source(file.path(rdir, "GeneralMethodComet.R")) @ <>= library("coMET") @ The configuration file specifies the options for the coMET plot. Example configuration and input files are also provided on \url{http://epigen.kcl.ac.uk/comet}. Information about the package can viewed from within R using this command: <>= ?comet ?comet.web ?comet.list @ \subsection{Install the development version of coMET from Bioconductor} To install coMET from the development version of Bioconductor, the user must install R-devel from \url{http://www.bioconductor.org/developers/how-to/useDevel/}. Following this installation, use standard Bioconductor command line, e.g. <>= source("http://bioconductor.org/biocLite.R") biocLite("coMET") @ \subsection{Install the version of coMET from gitHub} Another way to install coMET is to download the package from gitHUB \url{https://github.com/TiphaineCMartin/coMET} and use command line: <>= install.packages("YourPath/coMET_YourVersion.tar.gz",repos=NULL,type="source") ##This is an example install.packages("YourPath/coMET_0.99.9.tar.gz",repos=NULL,type="source") @ \section{Functions in coMET} Currently, there are 3 main functions: \begin{enumerate} \item \textbf{\emph{comet.web}} is the pre-customized function that allows us to visualise quickly EWAS (or other omic-WAS) results, annotation tracks, and correlations between features. This version is installed in the Shiny web-service. Currently, it is formated only to visualise human data. \item \textbf{\emph{comet}} is the generic function that allows us to visualise quickly EWAS results, annotation tracks, and correlations between features. Users can visualise more personalised annotation tracks and give multiple extra EWAS/omic-WAS results to plot. \item \textbf{\emph{comet.list}} is an additional function that allows us to extract the values of correlations, the pvalues, and estimates and confidence intervals for all datapoints that surpass a particular threshold. \end{enumerate} The functions can read the data input files, but it is also possible to use data frames within R for all data input except for the configuration file. The latter can be achieved with the two functions \textbf{\emph{comet}} and \textbf{\emph{comet.list}}. The structure of the data frames (number of columns, type, format) follows the same rules as for the data input files (cf. section "File formats"). \section{File formats} There are five types of files that the user can give to produce the plot: \begin{enumerate} \item Info file is defined in the option \textbf{\emph{mydata.file}}. \textcolor{red}{It is mandatory and has to be in tabular format with a header}. \item Correlation file is defined in the option \textbf{\emph{cormatrix.file}}. \textcolor{red}{It is mandatory and has to be in tabular format with a header}. \item Extra info files are defined in the option \textbf{\emph{mydata.file.large}}. \textcolor{red}{It is optional, and if provided has to be in tabular format with a header}. \item Annotation info file is defined in the option \textbf{\emph{biofeat.user.file}}. This option exists only in the function \textbf{\emph{comet.web}} and the user should inform also the format to visualise this data with the options textbf{\emph{biofeat.user.type}} and textbf{\emph{biofeat.user.type.plot}}. \item Configuration file contains the values of these options instead of defining these by command line. \textcolor{red}{Each line in the file is one option. The name of the option is in capital letters and is separated by its value by "="}. If there are multiple values such as for the option \textbf{\emph{list.tracks}} or the options for additional data, you need to separated them by a "comma". \end{enumerate} \subsection{Format of the info file (for option: \textbf{\emph{mydata.file}}, mandatory)} \textcolor{red}{This file is mandatory and has to be in tabular format with an header}. Info file can be a list of CpG sites with/without Beta value (DNA methylation level) or direction sign. If it is a site file then it is mandatory to have the 4 columns as shown below with headers in the same order. Beta can be the 5th column(optional) and it can be either a numeric value (positive or negative values) or only direction sign ("+", "-"). The number of columns and their types are defined but the option \textbf{\emph{mydata.format}}. <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) infofile <- file.path(extdata, "cyp1b1_infofile.txt") data_info <-read.csv(infofile, header = TRUE, sep = "\t", quote = "") head(data_info) @ Alternatively, the info file can be region-based and if so, the region-based info file must have the 5 columns (see below) with headers in this order. The beta or direction can be included in the 6th column (optional). <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) infoexp <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") data_infoexp <-read.csv(infoexp, header = TRUE, sep = "\t", quote = "") head(data_infoexp) @ In summary, there are 4 possible formats for the info file: \begin{enumerate} \item \textbf{\emph{site}}: the file has only 4 columns with a header: \begin{enumerate} \item Name of omic feature \item Name of chromosome \item Position of omic feature \item P-value of omic feature \end{enumerate} \item \textbf{\emph{region}}: the file has only 5 columns with a header: \begin{enumerate} \item Name of omic feature \item Name of chromosome \item Start position of omic feature \item End position of omic feature \item P-value of omic feature \end{enumerate} \item \textbf{\emph{site\_asso}}: the file has only 5 columns with a header: \begin{enumerate} \item Name of omic feature \item Name of chromosome \item Position of omic feature \item P-value of omic feature \item Direction of association related to this omic feature. It can be the sign or actual value of association effect size. \end{enumerate} \item \textbf{\emph{region\_asso}}: the file has only 6 columns with a header: \begin{enumerate} \item Name of omic feature \item Name of chromosome \item Start position of omic feature \item End position of omic feature \item P-value of omic feature \item Direction of association related to this omic feature. It can be the sign or actual value of association effect size. \end{enumerate} \end{enumerate} \subsection{Format of correlation matrix (for option: \textbf{\emph{cormatrix.file}}, mandatory)} \textcolor{red}{This file is mandatory and has to be in tabular format with an header}. The data file used for the correlation matrix is described in the option \textbf{\emph{cormatrix.file}}. This tab-delimited file can take 3 formats described in the option \textbf{\emph{cormatrix.format}}: \begin{enumerate} \item \textbf{\emph{cormatrix}}: pre-computed correlation matrix provided by the user; Dimension of matrix : CpG\_number X CpG\_number. Need to put the CpG sites/regions in the ascending order of positions and to have a header with the name of CpG sites/regions; \item \textbf{\emph{raw}}: Raw data format. Correlations of these can be computed by one of 3 methods Spearman, Pearson, Kendall (option \textbf{\emph{cormatrix.method}}). Dimension of matrix : sample\_size X CpG\_number. Need to have a header with the name of CpG sites/regions ; \item \textbf{\emph{raw\_rev}}: Raw data format. Correlations of these can be computed by one of 3 methods Spearman, Pearson, Kendall (option \textbf{\emph{cormatrix.method}}). Dimension of matrix : CpG\_number X sample\_size. Need to have the row names of CpG sites/regions and a header with the name of samples ; \end{enumerate} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) corfile <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") data_cor <-read.csv(corfile, header = TRUE, sep = "\t", quote = "") data_cor[1:6,1:6] @ \subsection{Format of extra info file (for option: \textbf{\emph{mydata.large.file}})} \textcolor{red}{This file is optional file and if provided has to be in tabular format with an header}. The extra info files can be described in the option \textbf{\emph{mydata.large.file}} and their format in \textbf{\emph{mydata.large.format}}. Different extra info files are separated by a comma. This can be another type of info file (e.g expression or replication data) and should follow the same rules as the standard info file. \subsection{Format of annotation file (for option \textbf{\emph{biofeat.user.file}})} The file is defined in the option \textbf{\emph{biofeat.user.file}} and the format of file is the format accepted by GViz (BED, GTF, and GFF3). \subsection{Option of config.file} \textcolor{red}{Each line in the file is one option. The name of the option is in capital letters and is separated by its value by "=". If there are multiple values such as for the option \textbf{\emph{list.tracks}} or options for additional data, these need to be separated them by a "comma"}. If you would like to make your own changes to the plot you can download the configuration file, make changes to it, and upload it into R as shown in the example below. The important options of a coMET figure include three components: \begin{enumerate} \item The upper plot shows the strength and extent of EWAS association signal. \begin{itemize} \item \textbf{\emph{pval.threshold}}: Significance threshold to be displayed as a red dashed line \item \textbf{\emph{disp.association}}: This logical option works only if \textbf{\emph{mydata.file}} contains the effect direction (\textbf{\emph{mydata.format}}=\textbf{\emph{site\_asso}} or \textbf{\emph{region\_asso}}). The value can be TRUE or FALSE: if FALSE (default), for each point of data in the p-value plot, the color of symbol is the color of co-methylation pattern between the point and the reference site; if TRUE, the effect direction is shown. If the association is positive, the color is the one defined with the option \textbf{\emph{color.list}}. On the other hand, if the association is negative, the color is the opposed color. \item disp.REGION : This logical option works only if \textbf{\emph{mydata.file}} contains regions (\textbf{\emph{mydata.format}}=\textbf{\emph{region}} or \textbf{\emph{region\_asso}}). The value can be TRUE or FALSE (default). If TRUE, the genomic element will be shown by a continuous line with the color of the element, in addition to the symbol at the center of the region. If FALSE, only the symbol is shown. \end{itemize} \item The middle panel provides customized annotation tracks; \begin{itemize} \item \textbf{\emph{list.tracks}} (for \emph{comet.web} function): List of annotation tracks that can be visualised: geneENSEMBL, CGI, ChromHMM, DNAse, RegENSEMBL, SNP, transcriptENSEMBL, SNPstoma, SNPstru, SNPstrustoma, ISCA, COSMIC, GAD, ClinVar, GeneReviews, GWAS, ClinVarCNV, GCcontent, genesUCSC, xenogenesUCSC. The elements are separated by a comma. \item tracks.gviz, tracks.ggbio, tracks.trackviewer (for \emph{comet} function): For each option, it is possible to give a list of annotation tracks that is created by the Gviz, GGBio, and TrackViewer bioconductor packages. \end{itemize} \item The lower panel shows the correlation between selected CpG sites in the genomic region. \begin{itemize} \item \textbf{\emph{cormatrix.format}} : Format of the input fie \textbf{\emph{cormatrix.file}}: either raw data (option RAW if CpG sites are by column and samples by row or option RAW\_REV if CpG site are by row and samples by column) or correlation matrix (option CORMATRIX) \item \textbf{\emph{cormatrix.method}} : If raw data are provided it will be necessary to produce the correlation matrix using one of 3 methods (spearman, pearson and kendall). \item \textbf{\emph{cormatrix.color.scheme}} : There are 5 colors (heat, bluewhitered, cm, topo, gray, bluetored) \end{itemize} \end{enumerate} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4webserver.txt") data_config <-read.csv(configfile, quote = "", sep="\t", header=FALSE) data_config @ \section{Creating a plot like the webservice: comet.web} User can draw coMET via the coMET website (\url{http://epigen.kcl.ac.uk/comet}). It is possible to reproduce the web service plotting defaults by using the function comet.web, for example see Figure \ref{fig:cometweb_simple}. <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") configfile <- file.path(extdata, "config_cyp1b1_zoom_4webserver.txt") comet.web(config.file=configfile, mydata.file=myinfofile, cormatrix.file=mycorrelation ,mydata.large.file=myexpressfile, print.image=FALSE,verbose=FALSE) @ \begin{figure} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") configfile <- file.path(extdata, "config_cyp1b1_zoom_4webserver.txt") comet.web(config.file=configfile, mydata.file=myinfofile, cormatrix.file=mycorrelation, mydata.large.file=myexpressfile, print.image=FALSE,verbose=FALSE) @ \caption{Plot with comet.web function.\label{fig:cometweb_simple}} \end{figure} \subsection{Hidden values of comet.web function} Hidden values of \textbf{\emph{comet.web}} function are shown in the section. If these values do not correspond to what you want to visualise, you need to use the function \textbf{\emph{comet}}, as a more generic option. \begin{longtable}{|c|c|} \hline \multicolumn{1}{|c|}{Option} & \multicolumn{1}{c|}{Value} \\ \hline \endfirsthead \multicolumn{2}{c}% {\tablename\ \thetable\ -- continued from previous page} \\ \hline \multicolumn{1}{|c|}{Option} & \multicolumn{1}{c|}{Value} \\ \hline \endhead \hline \multicolumn{2}{|r|}{{Continued on next page}} \\ \hline \endfoot \hline \hline \endlastfoot mydata.type & FILE \\ mydata.large.type & LISTFILE \\ cormatrix.type & LISTFILE \\ disp.cormatrixmap & TRUE\\ disp.pvalueplot & TRUE\\ disp.mydata.names & TRUE\\ disp.connecting.lines & TRUE\\ disp.mydata & TRUE\\ disp.type & symbol \\ biofeat.user.type.plot & histogram \\ tracks.gviz & NULL\\ tracks.ggbio & NULL\\ tracks.trackviewer & NULL\\ biofeat.user.file & NULL\\ palette.file & NULL\\ disp.color.bar & TRUE\\ disp.phys.dist & TRUE\\ disp.legend & TRUE\\ disp.marker.lines & TRUE\\ disp.mult.lab.X & FALSE\\ connecting.lines.factor & 1.5\\ connecting.lines.adj & 0.01\\ connecting.lines.vert.adj & -1\\ connecting.lines.flex & 0\\ font.factor & NULL\\ color.list & red \\ font.factor & NULL\\ dataset.gene & hsapiens\_gene\_ensembl \\ DATASET.SNP & hsapiens\_snp \\ VERSION.DBSNP & snp142Common \\ DATASET.SNP.STOMA & hsapiens\_snp\_som \\ DATASET.REGULATION & hsapiens\_feature\_set \\ DATASET.STRU & hsapiens\_structvar \\ DATASET.STRU.STOMA & hsapiens\_structvar\_som \\ BROWSER.SESSION & UCSC \\ \end{longtable} \section{Creating a plot with the generic function: comet} It is possible to create the annotation tracks by Gviz, trackviewer or ggbio, for example see Figure \ref{fig:cometPlotfile}. Currently, the Gviz option for annotation tracks, in combination with the heatmap of correlation values between genomic elements, provides the most informative and easy approach to visualize graphics. \subsection{coMET plot: pvalue plot, annotation tracks, and correlation matrix} \subsubsection{Input from data files} In this figure \ref{fig:cometPlotfile}, we create the different tracks outside to coMET with Gviz. The list of annotation tracks and different files are given to the function coMET. <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4comet.txt") myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") chrom <- "chr2" start <- 38290160 end <- 38303219 gen <- "hg19" strand <- "*" BROWSER.SESSION="UCSC" mySession <- browserSession(BROWSER.SESSION) genome(mySession) <- gen genetrack <-genesENSEMBL(gen,chrom,start,end,showId=TRUE) snptrack <- snpBiomart(chrom, start, end, dataset="hsapiens_snp_som",showId=FALSE) iscatrack <-ISCATrack(gen,chrom,start,end,mySession, table="iscaPathogenic") listgviz <- list(genetrack,snptrack,iscatrack) comet(config.file=configfile, mydata.file=myinfofile, mydata.type="file", cormatrix.file=mycorrelation, cormatrix.type="listfile", mydata.large.file=myexpressfile, mydata.large.type="listfile", tracks.gviz=listgviz, verbose=FALSE, print.image=FALSE) @ \begin{figure} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4comet.txt") myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") #configfile <- "../inst/extdata/config_cyp1b1_zoom_4comet.txt" chrom <- "chr2" start <- 38290160 end <- 38303219 gen <- "hg19" strand <- "*" data(geneENSEMBLtrack) data(snpBiomarttrack) data(ISCAtrack) listgviz <- list(genetrack,snptrack,iscatrack) comet(config.file=configfile, mydata.file=myinfofile, mydata.type="file", cormatrix.file=mycorrelation, cormatrix.type="listfile", mydata.large.file=myexpressfile, mydata.large.type="listfile", tracks.gviz=listgviz, verbose=FALSE, print.image=FALSE) @ \caption{Plot with comet function from files.\label{fig:cometPlotfile}} \end{figure} \subsubsection{Input from a data frame} In this figure \ref{fig:cometPlotMatrix}, we visualize the same data as in figure \ref{fig:cometPlotfile}, but they are in data frame format and not read in from an input file. In addition, if the user would like to visualise only the correlations between CpG sites with P-value less than or equal to 0.05 in the upper plot, this option can be included. The correlations with a P-value greater than 0.05 can have a color "goshwhite" whereas the other correlations will be displayed using a color related to the correlation level. Conversely, in the P-value plot (upper plot), the points of each omic feature have their colors related to their correlations with the reference omic feature without taking into account the P-value associated with the correlation matrix. <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4comet.txt") myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") chrom <- "chr2" start <- 38290160 end <- 38303219 gen <- "hg19" strand <- "*" BROWSER.SESSION="UCSC" mySession <- browserSession(BROWSER.SESSION) genome(mySession) <- gen genetrack <-genesENSEMBL(gen,chrom,start,end,showId=TRUE) snptrack <- snpBiomart(chrom, start, end, dataset="hsapiens_snp_som",showId=FALSE) iscatrack <-ISCATrack(gen,chrom,start,end,mySession, table="iscaPathogenic") listgviz <- list(genetrack,snptrack,iscatrack) matrix.dnamethylation <- read.delim(myinfofile, header=TRUE, sep="\t", as.is=TRUE, blank.lines.skip = TRUE, fill=TRUE) matrix.expression <- read.delim(myexpressfile, header=TRUE, sep="\t", as.is=TRUE, blank.lines.skip = TRUE, fill=TRUE) cormatrix.data.raw <- read.delim(mycorrelation, sep="\t", header=TRUE, as.is=TRUE, blank.lines.skip = TRUE, fill=TRUE) listmatrix.expression <- list(matrix.expression) listcormatrix.data.raw <- list(cormatrix.data.raw) comet(config.file=configfile, mydata.file=matrix.dnamethylation, mydata.type="dataframe",cormatrix.file=listcormatrix.data.raw, cormatrix.type="listdataframe",cormatrix.sig.level=0.05, cormatrix.conf.level=0.05, cormatrix.adjust="BH", mydata.large.file=listmatrix.expression, mydata.large.type="listdataframe", tracks.gviz=listgviz,verbose=FALSE, print.image=FALSE) @ \begin{figure} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4comet.txt") myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") #configfile <- "../inst/extdata/config_cyp1b1_zoom_4comet.txt" chrom <- "chr2" start <- 38290160 end <- 38303219 gen <- "hg19" strand <- "*" data(geneENSEMBLtrack) data(snpBiomarttrack) data(ISCAtrack) listgviz <- list(genetrack,snptrack,iscatrack) matrix.dnamethylation <- read.delim(myinfofile, header=TRUE, sep="\t", as.is=TRUE, blank.lines.skip = TRUE, fill=TRUE) matrix.expression <- read.delim(myexpressfile, header=TRUE, sep="\t", as.is=TRUE, blank.lines.skip = TRUE, fill=TRUE) cormatrix.data.raw <- read.delim(mycorrelation, sep="\t", header=TRUE, as.is=TRUE, blank.lines.skip = TRUE, fill=TRUE) listmatrix.expression <- list(matrix.expression) listcormatrix.data.raw <- list(cormatrix.data.raw) comet(config.file=configfile, mydata.file=matrix.dnamethylation, mydata.type="dataframe",cormatrix.file=listcormatrix.data.raw, cormatrix.type="listdataframe",cormatrix.sig.level=0.05, cormatrix.conf.level=0.05, cormatrix.adjust="BH", mydata.large.file=listmatrix.expression, mydata.large.type="listdataframe", tracks.gviz=listgviz,verbose=FALSE, print.image=FALSE) @ \caption{Plot with comet function from matrix data and with a pvalue threshold for the correlation between omics features (here CpG sites).\label{fig:cometPlotMatrix}} \end{figure} \subsection{coMET plot: annotation tracks and correlation matrix} It is possible to visualise only annotation tracks and the correlation between genetic elements. In this case, we need to use the option \texttt{disp.pvalueplot=FALSE}, for example see Figure \ref{fig:cometPlotNopval}. <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4cometnopval.txt") myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") chrom <- "chr2" start <- 38290160 end <- 38303219 gen <- "hg19" strand <- "*" genetrack <-genesENSEMBL(gen,chrom,start,end,showId=FALSE) snptrack <- snpBiomart(chrom, start, end, dataset="hsapiens_snp_som",showId=FALSE) strutrack <- structureBiomart(chrom, start, end, strand, dataset="hsapiens_structvar_som") clinVariant<-ClinVarMainTrack(gen,chrom,start,end) clinCNV<-ClinVarCnvTrack(gen,chrom,start,end) gwastrack <-GWASTrack(gen,chrom,start,end) geneRtrack <-GeneReviewsTrack(gen,chrom,start,end) listgviz <- list(genetrack,snptrack,strutrack,clinVariant, clinCNV,gwastrack,geneRtrack) comet(config.file=configfile, mydata.file=myinfofile, mydata.type="file", cormatrix.file=mycorrelation, cormatrix.type="listfile", tracks.gviz=listgviz, verbose=FALSE, print.image=FALSE,disp.pvalueplot=FALSE) @ \begin{figure} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) configfile <- file.path(extdata, "config_cyp1b1_zoom_4cometnopval.txt") #configfile <- "../inst/extdata/config_cyp1b1_zoom_4comet.txt" myinfofile <- file.path(extdata, "cyp1b1_infofile.txt") mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") chrom <- "chr2" start <- 38290160 end <- 38303219 gen <- "hg19" strand <- "*" data(geneENSEMBLtrack) data(snpBiomarttrack) data(strucBiomarttrack) data(ClinVarCnvTrack) data(clinVarMaintrack) data(GWASTrack) data(GeneReviewTrack) listgviz <- list(genetrack,snptrack,strutrack,clinVariant, clinCNV,gwastrack,geneRtrack) comet(config.file=configfile, mydata.file=myinfofile, mydata.type="file", cormatrix.file=mycorrelation, cormatrix.type="listfile", tracks.gviz=listgviz, verbose=FALSE, print.image=FALSE,disp.pvalueplot=FALSE) @ \caption{Plot with comet function without pvalue plot.\label{fig:cometPlotNopval}} \end{figure} \section{Extract the significant correlations between omic features} CoMET can help to visualise the correlations between omic features with EWAS results and other omic data. In addition, a function \textbf{\emph{comet.list}} can extract the significant correlations according the method (\textbf{\emph{cormatrix.method}}) and significance level (\textbf{\emph{cormatrix.sig.level}}). The output file has 7 columns: \begin{enumerate} \item the name of the first omic feature \item the name of the second omic feature \item the correlation between the omic features \item the alpha/2 lower value (e.g. 0.05 (\textbf{\emph{cormatrix.conf.level}})) \item the alpha/2 upper value (e.g. 0.05 (\textbf{\emph{cormatrix.conf.level}})) \item the pvalue \item the pvalue adjusted with the method selected (e.g. Benjamin and Hochberg) (\textbf{\emph{cormatrix.adjust}}) \end{enumerate} <>= extdata <- system.file("extdata", package="coMET",mustWork=TRUE) mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt") myoutput <- file.path(extdata, "cyp1b1_res37_cormatrix_list_BH05.txt") comet.list(cormatrix.file=mycorrelation,cormatrix.method = "spearman", cormatrix.format= "raw", cormatrix.conf.level=0.05, cormatrix.sig.level= 0.05, cormatrix.adjust="BH", cormatrix.type = "listfile", cormatrix.output=myoutput, verbose=FALSE) listcorr <- read.csv(myoutput, header = TRUE, sep = "\t", quote = "") dim(listcorr) head(listcorr) @ \section{Extra information about annotation tracks} Annotation tracks can be created with Gviz using four different functions: \begin{enumerate} \item UcscTrack. Different UCSC tracks can be selected for visualisation from the table Browser of UCSC \url{http://genome-euro.ucsc.edu/cgi-bin/hgTables?hgsid=202842745_Dlvit14QO0G6ZPpLoEVABG8aqfrm&clade=mammal&org=Human&db=hg19&hgta_group=varRep&hgta_track=cpgIslandExt&hgta_table=0&hgta_regionType=genome&position=chr6%3A32726553-32727053&hgta_outputType=primaryTable&hgta_outFileName=} \item BiomartGeneRegionTrack. A connection should be established to the Biomart database to visualise the Genes data. \item DataTrack. This should be numerical format data \item AnnotationTrack. This allows visualisation of any annotation data. \end{enumerate} To have more information, it is better to read the Gviz's vignette. \subsection{Genes and transcripts from ENSEMBL and UCSC} The color of the genetic elements is defined by the R package Gviz. \subsection{Regulatory elements from ENSEMBL} The color of regulatory elements from ENSEMBL is defined from the same criteria as ENSEMBL (in 2014). The colors and the list of features can be updated in ENSEMBL and not yet in coMET. Please to contact us if you see a difference. Currently the colors are : \begin{longtable}{|c|c|} \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endfirsthead \multicolumn{2}{c}% {\tablename\ \thetable\ -- continued from previous page} \\ \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endhead \hline \multicolumn{2}{|r|}{{Continued on next page}} \\ \hline \endfoot \hline \hline \endlastfoot Promoter Associated & darkolivegreen\\ CTCF Binding Site & cadetblue1\\ Gene Associated & coral\\ Non-Gene Associated & darkgoldenrod1\\ Predicted Transcribed Region & greenyellow\\ PolIII Transcription Associated & purple\\ Enhancer & gold\\ Transcription Factor Binding Site & darkorchid1\\ Predicted Weak enhancer/Cis-reg element & yellow\\ Heterochromatin & wheat4\\ Open Chromatin & snow3\\ Promoter Flank & tomato\\ Repressed/Low Activity & snow4\\ Unclassified & aquamarine\\ \end{longtable} \subsection{ChromHMM from UCSC} The color of regulatory regions from UCSC are defined using the same criteria of UCSC in 2014. The colors and the list of features can be updated in UCSC and not yet in coMET. Please to contact us if you see a difference. Currently the colors are : \begin{longtable}{|c|c|} \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endfirsthead \multicolumn{2}{c}% {\tablename\ \thetable\ -- continued from previous page} \\ \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endhead \hline \multicolumn{2}{|r|}{{Continued on next page}} \\ \hline \endfoot \hline \hline \endlastfoot 1\_Active\_Promoter & firebrick1 \\ 2\_Weak\_Promoter & darksalmon \\ 3\_Poised\_Promoter & blueviolet \\ 4\_Strong\_Enhancer & Orange \\ 5\_Strong\_Enhancer & coral \\ 6\_Weak\_Enhancer & yellow \\ 7\_Weak\_Enhancer & gold \\ 8\_Insulator & cornflowerblue \\ 9\_Txn\_Transition & darkolivegreen \\ 10\_Txn\_Elongation & forestgreen \\ 11\_Weak\_Txn & darkseagreen1 \\ 12\_Repressed & gainsboro \\ 13\_Heterochrom/lo & gray74 \\ 14\_Repetitive/CNV & gray77 \\ 15\_Repetitive/CNV & gray86 \\ \end{longtable} \subsection{structureBiomart from Ensembl} These colors are for somatic structural variation and structural variation for any species. \begin{longtable}{|c|c|} \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endfirsthead \multicolumn{2}{c}% {\tablename\ \thetable\ -- continued from previous page} \\ \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endhead \hline \multicolumn{2}{|r|}{{Continued on next page}} \\ \hline \endfoot \hline \hline \endlastfoot copy\_number\_variation & cornsilk \\ inversion & darkolivegreen \\ translocation & cyan \\ sequence\_alteration & coral \\ snp & red \\ insertion & blueviolet \\ deletion & orange \\ indel & darkgoldenrod1 \\ substitution & dodgerblue2 \\ \end{longtable} \subsection{ISCA track} International Standards of Cytogenomic Arrays Consortium defined a set of phenotypes for CNVs. Different colors are defined to represent them. \begin{longtable}{|c|c|} \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endfirsthead \multicolumn{2}{c}% {\tablename\ \thetable\ -- continued from previous page} \\ \hline \multicolumn{1}{|c|}{Omic feature} & \multicolumn{1}{c|}{Color} \\ \hline \endhead \hline \multicolumn{2}{|r|}{{Continued on next page}} \\ \hline \endfoot \hline \hline \endlastfoot iscaPathogeni & purple\\ iscaPathGainCum & red\\ iscaPathLossCum & blue\\ iscaCuratedPathogeni & purple\\ iscaLikelyPathogeni & lightpurple\\ iscaUncertain & lightgrey\\ iscaBenign & black\\ iscaCuratedBenign & black\\ iscaLikelyBenign & black\\ \end{longtable} \section{coMET: Shiny web-service} \subsection{How to use the coMET web-service} If you want to use coMET via its webservice, please go to \url{http://epigen.kcl.ac.uk/comet} and select one of different instances or direcly access one of the instances, for example \url{http://comet.epigen.kcl.ac.uk:3838/coMET/}. We have created different instances of coMET because we did not have access to the pro version of Shiny. All instances use the same version of coMET. If you use coMET from a Shiny webservice, you do not need to install the coMET package on your computer. The web service is user friendly and requires input files and configuration of the plot. The creation of the coMET plot can take some time because it makes a live connection to UCSC or/and ENSEMBL for the annotation tracks. First, the plot is created on the webpage, and then it can be saved as an output file. For better quality plots please use the download option and the plot will be recreated in a file in pdf or eps format. \subsection{How to install the coMET web-service} These are different steps to install coMET on your Shiny web-service and you need to be root to install it. \begin{enumerate} \item You can install an instance of Shiny \url{http://shiny.rstudio.com/}. \item You need to also install R, Bioconductor and the coMET package. \item In Shiny's folder (e.g. /var/shiny-server/www), you can create a folder called "COMET". \item Following this, you can install the two coMET scripts in www of the coMET package, within this new folder. \item You need to change owner and permissions to access this folder. Only the user called Shiny can access it. \begin{verbatim} chmod 755 /var/shiny-server/www/COMET chown shiny:shiny /var/shiny-server/www/COMET \end{verbatim} \item You need now to update the configuration file of Shiny (e.g. /etc/shiny-server/shiny-server.conf). \item You need to change owner and the permission to access this file \begin{verbatim} chmod 744 /etc/shiny-server/shiny-server.conf chown shiny:shiny /etc/shiny-server/shiny-server.conf \end{verbatim} \item At the end, you should restart the service Shiny via the command line: sudo restart shiny-server \end{enumerate} Your Shiny's configuration file: \begin{verbatim} run_as shiny; # Define a top-level server which will listen on a port server { # Instruct this server to listen on port 3838 listen 3838; # Define the location available at the base URL location / { # Run this location in 'site_dir' mode, which hosts the entire directory # tree at '/srv/shiny-server' site_dir /var/shiny-server/www; # Define where we should put the log files for this location log_dir /var/shiny-server/log; # Should we list the contents of a (non-Shiny-App) directory when the user # visits the corresponding URL? directory_index off; # app_init_timeout 3600; # app_idle_timeout 3600; } } \end{verbatim} \clearpage \section{SessionInfo} The following is the session info that generated this vignette: <>= toLatex(sessionInfo()) @ \end{document}