\documentclass[a4paper, 9pt]{article} <>= BiocStyle::latex() @ % \VignetteIndexEntry{An R Package for TRanslational ONCOlogy} \usepackage{hyperref} \usepackage{amsmath, amsthm, amssymb} \usepackage{xfrac} \usepackage{fullpage} \usepackage{marginnote} \usepackage{graphicx} \usepackage[table]{xcolor} %http://ctan.org/pkg/xcolor \usepackage[numbers]{natbib} \usepackage{algorithmic} \usepackage{algorithm} \usepackage{url} \usepackage{placeins} \usepackage{xspace} \newcommand{\CAPRESE}{\textsc{CAPRESE}} \newcommand{\TRONCO}{\textsc{TRONCO}} \usepackage{fullpage} \begin{document} \title{Using the \TRONCO{} package} \author{ Marco Antoniotti\footnote{Dipartimento di Informatica Sistemistica e Comunicazione, Universit\'a degli Studi Milano Bicocca Milano, Italy.} \and Giulio Caravagna$^\ast$ \and Luca De Sano$^\ast$ \and Alex Graudenzi$^\ast$ \and Ilya Korsunsky\footnote{Courant Institute of Mathematical Sciences, New York University, New York, USA.} \and Mattia Longoni$^\ast$ \and Loes Olde Loohuis\footnote{Center for Neurobehavioral Genetics, University of California, Los Angeles, USA.} \and Giancarlo Mauri$^\ast$ \and Bud Mishra$^\dagger$ \and Daniele Ramazzotti$^\ast$ } \date{\today} \maketitle \begin{center} \begin{minipage}[h]{0.75\textwidth} \textbf{Abstract.} Genotype-level {\em cancer progression models} describe the temporal ordering in which genomic alterations such as somatic mutations and copy number alterations tend to fixate and accumulate during cancer formation and progression. These graphical models can describe trends of \textit{natural selection} across a population of independent tumour samples (cross-sectional data), or reconstruct the clonal evolution in a single patient's tumour (multi-region or single-cell data). In terms of application, such models can be used to better elucidate genotype-phenotype relation, predict cancer hallmarks and outcome of personalised treatment as well as suggest novel targets for therapy design. \\ \TRONCO{} ({\sc TR}{\em anslational} {\sc ONCO}{\em logy}) is an \textsc{R} package which collects algorithms to infer progression models from Bernoulli 0/1 profiles of genomic alterations across a tumor sample. Such profiles are usually visualized as a binary input matrix where each row represents a patient's sample (e.g., the result of a sequenced tumor biopsy), and each column an event relevant to the progression (a certain type of somatic mutation, a focal or higher-level chromosomal copy number alteration, etc.); a 0/1 value models the absence/presence of that alteration in the sample. TRONCO provides utilities to readily import these data and in particular the import from boolean matrices and MAF or GISTIC files are supported. The package provides various functions to editing, visualize and subset such data, as well as functions to query the cBioPortal for cancer genomics. \\ In the current version, TRONCO provides parallel implementations of CAPRESE [PLoS ONE 9(12): e115570] and CAPRI [Bioinformatics, doi:10.1093/bioinformatics/btv296] algorithms to infer progression models arranged as trees or general direct acyclic graphs. Bootstrap procedures to assess the non-prametric and statistical confidence of the inferred models are also provided. The package comes with example data available, which include the dataset of Atypical Chronic Myeloid Leukemia samples by Piazza et al., Nat. Genet., 45 (2013). \end{minipage} \end{center} \vspace{1.0cm} \SweaveOpts{concordance=TRUE} \paragraph{\large Requirements: } You need to have installed the R package \texttt{rgraphviz}, see \texttt{Bioconductor.org}. In this vigntte, we will present a case study for the usage of the TRONCO package based on the work presented in the main \textit{CAPRI} paper. \paragraph{\large Events selection}{\ }\\ We will start by loading the TRONCO package in \textsc{R} along with an example \textit{"dataset"} that comes within the package. <<>>= library(TRONCO) data(aCML) hide.progress.bar <<- TRUE @ We then use the function \texttt{show} to get a short summary of the aCML dataset that has just been loaded. <<>>= show(aCML) @ Using the function \texttt{as.events}, we can have a look at the events in the dataset. <<>>= as.events(aCML) @ These events account for alterations in the following genes. <<>>= as.genes(aCML) @ Now we take a look at the alterations of only the gene \texttt{SETBP1} across the samples. <<>>= as.gene(aCML, genes='SETBP1') @ We consider a subset of all the genes in the dataset to be involved in patters based on the support we found in the literature. See the main \textit{CAPRI} paper as a reference. <<>>= gene.hypotheses = c('KRAS', 'NRAS', 'IDH1', 'IDH2', 'TET2', 'SF3B1', 'ASXL1') @ Regardless from which types of mutations we include, we select only the genes which appear alterated at least in the $5\%$ of the patients. Thus, we first transform the dataset into \textit{"Alteration"} (i.e., by collapsing all the event types for the same gene), and then we consider only the these events from the original dataset. <<>>= alterations = events.selection(as.alterations(aCML), filter.freq = .05) @ We now show a plot of the selected genes. Note that this plot has no title as by default the function \texttt{events.selection} does not add any. <<>>= dummy = oncoprint(alterations,font.row=12,cellheight=20,cellwidth=4) @ <>= capture.output(oncoprint(alterations,font.row=12,cellheight=20,cellwidth=4, file='onco-1.pdf'), file='NUL') @ \incfig[ht]{onco-1}{0.9\textwidth}{Oncoprint output}{} \FloatBarrier \paragraph{\large Adding Hypotheses}{\ }\\ We now create the \texttt{dataset} to be used for the inference of the progression model. We consider the original dataset and from it we select all the genes whose mutations are occurring at least the $5\%$ of the times together with any gene involved in an hypothesis. To do so, we use the parameter \texttt{filter.in.names} as shown below. <<>>= hypo = events.selection(aCML, filter.in.names=c(as.genes(alterations), gene.hypotheses)) hypo = annotate.description(hypo, 'CAPRI - Bionformatics aCML data (selected events)') @ We show a new oncoprint of this latest dataset where we annotate the genes in \texttt{gene.hypotheses} in order to identify them. The sample names are also shown. <<>>= dummy = oncoprint(hypo, gene.annot = list(priors= gene.hypotheses), sample.id = T, font.row=12, font.column=5, cellheight=20, cellwidth=4) @ <>= capture.output(oncoprint(hypo, gene.annot = list(priors= gene.hypotheses), sample.id = T, font.row=12, font.column=5, cellheight=20, cellwidth=4, file='onco-2.pdf'), file='NUL') @ \incfig[ht]{onco-2}{0.9\textwidth}{Oncoprint output}{} \FloatBarrier We now also add the hypotheses that are described in CAPRI's manuscript. Hypothesis of hard exclusivity (XOR) for NRAS/KRAS events (Mutation). This hypothesis is tested against all the events in the dataset. <<>>= hypo = hypothesis.add(hypo, 'NRAS xor KRAS', XOR('NRAS', 'KRAS')) @ We then try to include also a soft exclusivity (OR) pattern but, since its \textit{"signature"} is the same of the hard one just included, it will not be included. The code below is expected to rise an error. <>= hypo = hypothesis.add(hypo, 'NRAS or KRAS', OR('NRAS', 'KRAS')) @ For the sake of better highlighting the perfect (hard) exclusivity among NRAS/KRAS mutations, one can have a further look at their alterations. <<>>= dummy = oncoprint(events.selection(hypo, filter.in.names = c('KRAS', 'NRAS')), font.row=12,cellheight=20, cellwidth=4) @ <>= capture.output(oncoprint(events.selection(hypo, filter.in.names = c('KRAS', 'NRAS')),font.row=12,cellheight=20, cellwidth=4, file='onco-3.pdf'), file='NUL') @ \incfig[ht]{onco-3}{0.9\textwidth}{Oncoprint output}{} \FloatBarrier We repeated the same analysis as before for other hypotheses and for the same reasons, we will include only the hard exclusivity pattern. <<>>= hypo = hypothesis.add(hypo, 'SF3B1 xor ASXL1', XOR('SF3B1', OR('ASXL1')), '*') @ <>= hypo = hypothesis.add(hypo, 'SF3B1 or ASXL1', OR('SF3B1', OR('ASXL1')), '*') @ Finally, we now do the same for genes TET2 and IDH2. In this case $3$ events for the gene TET2 are present, that is \textit{"Ins/Del"}, \textit{"Missense point"} and \textit{"Nonsense point"}. For this reason, since we are not specifying any subset of such events to be considered, all TET2 alterations are used. Since the events present a perfect hard exclusivity, their patters will be included as a $XOR$. <<>>= as.events(hypo, genes = 'TET2') hypo = hypothesis.add(hypo, 'TET2 xor IDH2', XOR('TET2', 'IDH2'), '*') hypo = hypothesis.add(hypo, 'TET2 or IDH2', OR('TET2', 'IDH2'), '*') @ <<>>= dummy = oncoprint(events.selection(hypo, filter.in.names = c('TET2', 'IDH2')), font.row=12, cellheight=20,cellwidth=4) @ <>= capture.output(oncoprint(events.selection(hypo, filter.in.names = c('TET2', 'IDH2')),font.row=12, cellheight=20,cellwidth=4, file='onco-4.pdf'), file='NUL') @ \incfig[ht]{onco-4}{0.9\textwidth}{Oncoprint output}{} \FloatBarrier We now finally add any possible group of homologous events. For any gene having more than one event associated we also add a soft exclusivity pattern among them. <<>>= hypo = hypothesis.add.homologous(hypo) @ The final dataset that will be given as input to CAPRI is now finally shown. <<>>= dummy = oncoprint(hypo, gene.annot = list(priors= gene.hypotheses), sample.id = T, font.row=10, font.column=5, cellheight=15, cellwidth=4) @ <>= capture.output(oncoprint(hypo, gene.annot = list(priors= gene.hypotheses), sample.id = T, font.row=10, font.column=5, cellheight=15, cellwidth=4, file='onco-5.pdf'), file='NUL') @ \incfig[ht]{onco-5}{0.9\textwidth}{Oncoprint output}{} \FloatBarrier \paragraph{\large Model reconstruction}{\ }\\ We run the inference of the model by CAPRI algorithm with its default parameter: we use both AIC and BIC as regularizators, Hill-climbing as heuristic search of the solutions and exhaustive bootstrap (nboot replicates or more for Wilcoxon testing, i.e., more iterations can be performed if samples are rejected), p-value set at 0.05. We set the seed for the sake of reproducibility. <<>>= model = tronco.capri(hypo, boot.seed = 12345, nboot=10) @ We then plot the model inferred by CAPRI with BIC as a regolarizator and we set some parameters to get a good plot; the confidence of each edge is shown both in terms of temporal priority and probability raising (selective advantage scores) and hypergeometric testing (statistical relevance of the dataset of input). <>= tronco.plot(model, fontsize = 13, scale.nodes = .6, regularization="bic", confidence = c('tp', 'pr', 'hg'), height.logic = 0.25, legend.cex = .5, pathways = list(priors= gene.hypotheses), label.edge.size=5) @ \incfig[ht]{vignette-figplot}{0.9\textwidth}{aCML Reconstructed model} {Pre bootstrap.} \FloatBarrier \paragraph{\large Bootstrapping data}{\ }\\ Finally, we perform non-parametric bootstrap as a further estimation of the confidence in the inferred results. <<>>= model.boot = tronco.bootstrap(model, nboot=10) @ <>= tronco.plot(model.boot, fontsize = 13, scale.nodes = .6, regularization="bic", confidence=c('npb'), height.logic = 0.25, legend.cex = .5, pathways = list(priors= gene.hypotheses), label.edge.size=10) @ \incfig[ht]{vignette-figplotboot}{0.9\textwidth}{aCML Reconstructed model} {After bootstrap.} \end{document}