---
title: "Shearwater ML"
author: "Inigo Martincorena and Moritz Gerstung"
date: "6 March 2015"
output: html_document
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Shearwater ML}
%\VignetteDepends{deepSNV}
-->

# Shearwater ML
#### Inigo Martincorena and Moritz Gerstung
6 March 2015

ShearwaterML is a maximum-likelihood adaptation of the original Shearwater algorithm. Unlike the original algorithm, ShearwaterML does not use prior information 
and yields p-values, instead of Bayes factors, using a Likelihood-Ratio Test. This allows using standard multiple testing correction 
methods to obtain a list of significant variants with a controlled false discovery rate.

For a detailed description of the algorithm see:

Martincorena I, Roshan A, Gerstung M, et al. (2015). High burden and pervasive positive selection of somatic mutations in normal human skin. _Science_ (2015).

Load data from `deepSNV` example
```{r}
library(deepSNV)
regions <- GRanges("B.FR.83.HXB2_LAI_IIIB_BRU_K034", IRanges(start = 3120, end=3140))
files <- c(system.file("extdata", "test.bam", package="deepSNV"), system.file("extdata", "control.bam", package="deepSNV"))
counts <- loadAllData(files, regions, q=30)
```

ShearwaterML: "betabinLRT" calculates p-values for each possible mutation
```{r}
pvals <- betabinLRT(counts, rho=1e-4, maxtruncate = 1)$pvals
qvals <- p.adjust(pvals, method="BH")
dim(qvals) = dim(pvals)
vcfML = qvals2Vcf(qvals, counts, regions, samples = files, mvcf = TRUE)
```

Original Shearwater: "bbb" computes the original Bayes factors

```{r}
bf <- bbb(counts, model = "OR", rho=1e-4)
vcfBF <- bf2Vcf(bf, counts, regions, samples = files, prior = 0.5, mvcf = TRUE)

plot(pvals[1,,], bf[1,,]/(1+bf[1,,]), log="xy")
```