%\VignetteIndexEntry{OncoSimulR Overview} %\VignetteDepends{OncoSimulR} %\VignetteKeywords{OncoSimulR simulation cancer oncogenetic trees} %\VignettePackage{OncoSimulR} %\VignetteEngine{knitr::knitr} \documentclass[a4paper,11pt]{article} <>= require(knitr, quietly = TRUE) opts_knit$set(concordance = TRUE) ##opts_knit$set(stop_on_error = 2L) @ \usepackage{amsmath} %% \usepackage[authoryear,round,sort]{natbib} \usepackage{threeparttable} \usepackage{array} %%\usepackage{hyperref} %% not if using BiocStyle %%ditto %\usepackage{geometry} %\geometry{verbose,a4paper,tmargin=23mm,bmargin=26mm,lmargin=28mm,rmargin=28mm} \usepackage{url} \usepackage{xcolor} %\definecolor{light-gray}{gray}{0.72} \newcommand{\cyan}[1]{{\textcolor {cyan} {#1}}} \newcommand{\blu}[1]{{\textcolor {blue} {#1}}} \newcommand{\Burl}[1]{\blu{\url{#1}}} %%\SweaveOpts{echo=TRUE} %\usepackage{tikz} %\usetikzlibrary{arrows,shapes,positioning} \usepackage[latin1]{inputenc} \usepackage{gitinfo} %Uncomment for BioC %\usepackage{datetime} %\newdateformat{mydate}{\THEDAY-\monthname[\THEMONTH]-\THEYEAR} <>= BiocStyle::latex() @ %%\title{Using OncoSimulR: a package for simulating cancer progression data, %%including drivers and passengers, and allowing for order restrictions.} %%\author{Ramon Diaz-Uriarte\\ %%Dept. Biochemistry, Universidad Aut\'onoma de Madrid \\ %%Instituto de Investigaciones Biom\'edicas ``Alberto Sols'' (UAM-CSIC)\\ %%Madrid, Spain\\ %%{\small \texttt{ramon.diaz@iib.uam.es}} \\ %%{\small \texttt{rdiaz02@gmail.com}} \\ %%{\small \Burl{http://ligarto.org/rdiaz}} \\ %%} \bioctitle{Using OncoSimulR: a package for simulating cancer progression data, including drivers and passengers, and allowing for order restrictions.} \author{Ramon Diaz-Uriarte\\ Dept. Biochemistry, Universidad Aut\'onoma de Madrid \\ Instituto de Investigaciones Biom\'edicas ``Alberto Sols'' (UAM-CSIC)\\ Madrid, Spain{\footnote{ramon.diaz@iib.uam.es, rdiaz02@gmail.com}} \\ %% {\footnote{rdiaz02@gmail.com}} \\ {\small \Burl{http://ligarto.org/rdiaz}} \\ } %% \date{\the\year-\the\month-\the\day} %% \date{\mydate\today} \date{\gitAuthorDate\ {\footnotesize (Rev: \gitAbbrevHash)}} \begin{document} \maketitle %% Remember to add BiocStyle to Suggests %% %% I get lots of problems, so will try later. %% <>= %% BiocStyle::latex() %% @ \tableofcontents \section{Introduction} This vignette presents the OncoSimulR package. OncoSimulR allows you to simulate tumor progression using several models of tumor progression. In these simulations you can restrict the order in which mutations can accumulate. For instance, you can restrict the allowed order as specified, for instance, in Oncogenetic Tree (OT; \cite{Desper1999JCB, Szabo2008}) or Conjunctive Bayesian Network (CBN; \cite{Beerenwinkel2007, Gerstung2009, Gerstung2011}) models. Moreover, you can add passenger mutations to the simulations. The models so far implemented are all continuous time models, which are simulated using the BNB algorithm of Mather et al.\ \cite{Mather2012}. This is a summary of some of the key features: \begin{itemize} \item You can pass arbitrary restrictions as specified by OTs or CBNs. \item You can add passenger mutations. \item You can allow for deviations from the OT and CBN models, specifying a penalty for such deviations (the $s_h$ parameter). \item Right now, three different models are available, two that lead to exponential growth, one of them loosely based on Bozic et al.\ \cite{Bozic2010}, and another that leads to logistic-like growth, based on McFarland et al.\ \cite{McFarland2013}. \item Simulations are generally very fast as I use the BNB algorithm implemented in C++. \end{itemize} Further details about the motivation for wanting to simulate data this way can be found in \cite{ot-biorxiv}, where additional comments about model parameters and caveats are discussed. The Java program by \cite{Reiter2013a} offers somewhat similar functionality, but they are restricted to at most four drivers, you cannot use arbitrary CBNs or OTs to specify order restrictions, there is no allowance for passengers, and a single type of model (a discrete time Galton-Watson process) is implemented. Using this package will often involve the following steps: \begin{enumerate} \item Specify the restrictions in the order of mutations: section \ref{poset}. \item Simulate cancer progression: section \ref{simul}. You can simulate for a single subject or for a set of subjects. You will need to \begin{itemize} \item Decide on a model (e.g., Bozic or McFarland). \item Specify the parameters of the model. \end{itemize} Of course, at least for initial playing around, you can use the defaults. \item Sample from the simulated data: section \ref{sample}, and do something with those simulated data (e.g., fit an OT model to them). What you do with the data, however, is outside the scope of this package. \end{enumerate} Before anything else, let us load the package. We also explicitly load \Biocpkg{graph} for the vignette to work (you do not need that for your usual interactive work). <<>>= library(OncoSimulR) library(graph) @ \section{Specifying restrictions: posets}\label{poset} How to specify the restrictions is shown in the help for \Rfunction{poset}. It is often useful, to make sure you did not make any mistakes, to plot the poset. This is from the examples: <>= ## Node 2 and 3 depend on 1, and 4 depends on no one p1 <- cbind(c(1, 1, 0), c(2, 3, 4)) plotPoset(p1, addroot = TRUE) @ <>= ## A simple way to create a poset where no gene (in a set of 15) depends ## on any other. p4 <- cbind(0, 15) plotPoset(p4, addroot = TRUE) @ Specifying posets is actually straightforward. For instance, we can specify the pancreatic cancer poset in Gerstung et al.\ \cite{Gerstung2011} (their figure 2B, left). We specify the poset using numbers, but for nicer plotting we will use names (KRAS is 1, SMAD4 is 2, etc). This example is also in the help for \Rfunction{poset}: <>= pancreaticCancerPoset <- cbind(c(1, 1, 1, 1, 2, 3, 4, 4, 5), c(2, 3, 4, 5, 6, 6, 6, 7, 7)) plotPoset(pancreaticCancerPoset, names = c("KRAS", "SMAD4", "CDNK2A", "TP53", "MLL3","PXDN", "TGFBR2")) @ \section{Simulating cancer progression}\label{simul} We can simulate the progression in a single subject. Using an example very similar to the one in the help: <>= options(width=60) @ <<>>= ## use poset p1101 data(examplePosets) p1101 <- examplePosets[["p1101"]] ## Bozic Model b1 <- oncoSimulIndiv(p1101, keepEvery = 15) summary(b1) @ The first thing we do is make it simpler (for future examples) to use a set of restrictions. In this case, those encoded in poset p1101. Then, we run the simulations and look at a simple summary and a plot. %% We explicitly %% set \texttt{silent = TRUE} to prevent the vignette from filling up with %% intermediate output. If you want to plot the trajectories, it is better to keep more frequent samples, so you can see when clones appear: <>= b2 <- oncoSimulIndiv(p1101, keepEvery = 1) summary(b2) plot(b2) @ The following is an example where we do not care about passengers, but we want to use a different graph, and we want a few more drivers before considering cancer has been reached. And we allow it to run for longer. Note that in the McF model \texttt{detectionSize} really plays no role. Note also how we pass the poset: it is the same as before, but now we directly access the poset in the list of posets. <<>>= m2 <- oncoSimulIndiv(examplePosets[["p1101"]], model = "McFL", numPassengers = 0, detectionDrivers = 10, mu = 5e-7, initSize = 4000, sampleEvery = 0.025, finalTime = 25000, keepEvery = 5, detectionSize = 1e6) plot(m2, addtot = TRUE, log = "") @ The default is to simulate progression until a simulation reaches cancer (i.e., only simulations that satisfy the detectionDrivers or the detectionSize will be returned). If you use the McF model with large enough \texttt{initSize} this will often be the case but not if you use very small \texttt{initSize}. Likewise, most of the Bozic runs do not reach cancer. Lets try a few: <<>>= b3 <- oncoSimulIndiv(p1101, onlyCancer = FALSE) summary(b3) b4 <- oncoSimulIndiv(p1101, onlyCancer = FALSE) summary(b4) @ Plot those runs: <>= par(mfrow = c(1, 2)) par(cex = 0.8) ## smaller font plot(b3) plot(b4) @ \subsection{Simulating progression in several subjects} To simulate the progression in a bunch of subjects (we will use only four, so as not to fill the vignette with plots) we can do, with the same settings as above: <<>>= p1 <- oncoSimulPop(4, p1101) par(mfrow = c(2, 2)) plot(p1) @ \section{Sampling from a set of simulated subjects}\label{sample} \label{sec:sampling-from-set} You will often want to do something with the simulated data. For instance, sample the simulated data. Here we will obtain the trajectories for 100 subjects in a scenario without passengers. Then we will sample with the default options and store that as a vector of genotypes (or a matrix of subjects by genes): <<>>= m1 <- oncoSimulPop(100, examplePosets[["p1101"]], numPassengers = 0) @ The function \Rfunction{samplePop} samples that object, and also gives you some information about the output: <<>>= genotypes <- samplePop(m1) @ What can you do with it? That is up to you. As an example, let us try to infer an oncogenetic tree (and plot it) using the \CRANpkg{Oncotree} package \cite{Oncotree} after getting a quick look at the marginal frequencies of events: <>= colSums(genotypes)/nrow(genotypes) require(Oncotree) ot1 <- oncotree.fit(genotypes) plot(ot1) @ Your run will likely differ from mine, but with the defaults (detection size of $10^8$) it is likely that events down the tree will never appear. You can set \texttt{detectionSize = 1e9} and you will see that events down the tree are now found in the cross-sectional sample. Alternatively, you can use single cell sampling and that, sometimes, recovers one or a couple more events. <>= genotypesSC <- samplePop(m1, typeSample = "single") colSums(genotypesSC)/nrow(genotypesSC) ot2 <- oncotree.fit(genotypesSC) plot(ot2) @ You can of course rename the columns of the output matrix to something else if you want so the names of the nodes will reflect those potentially more meaningful names. \section{Session info and packages used} This is the information about the version of R and packages used: <<>>= sessionInfo() @ %\newpage %%\bibliographystyle{apalike} %% or agsm or natbib? or apalike; maybe agsm %% does the URL without turning into note? %\bibliographystyle{apalike} %% or agsm or natbib? or apalike; maybe agsm \bibliography{OncoSimulR} \end{document} %% remember to use bibexport to keep just the minimal bib needed %% bibexport -o extracted.bib OncoSimulR.aux %% rm OncoSimulR.bib %% mv extracted.bib OncoSimulR.bib %% and then turn URL of packages into notes %%% Local Variables: %%% TeX-master: t %%% ispell-local-dictionary: "en_US" %%% coding: iso-8859-15 %%% End: