---
title: "derfinder advanced details and usage"
author: "L Collado-Torres"
date: "`r doc_date()`"
package: "`r pkg_ver('derfinder')`"
output: 
  BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{derfinder advanced details and usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

`r Biocpkg('derfinder')` advanced details and usage
======================================

```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library('knitcitations')

## Load knitcitations with a clean bibliography
cleanbib()
cite_options(hyperlink = 'to.doc', citation_format = 'text', style = 'html')
# Note links won't show for now due to the following issue
# https://github.com/cboettig/knitcitations/issues/63

## Write bibliography information
bibs <- c(knitcitations = citation('knitcitations'),
    derfinder = citation('derfinder')[1], 
    BiocStyle = citation('BiocStyle'), 
    knitr = citation('knitr')[3],
    rmarkdown = citation('rmarkdown'),
    brainspan = RefManageR::BibEntry(bibtype = 'Unpublished', key = 'brainspan', title = 'Atlas of the Developing Human Brain [Internet]. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01.', author = 'BrainSpan', year = 2011, url = 'http://developinghumanbrain.org'),
    originalder = citation('derfinder')[2],
    R = citation(),
    IRanges = citation('IRanges'),
    devtools = citation('devtools'),
    testthat = citation('testthat'),
    GenomeInfoDb = citation('GenomeInfoDb'),
    GenomicRanges = citation('GenomicRanges'),
    ggplot2 = citation('ggplot2'),
    biovizBase = citation('biovizBase'),
    bumphunter = citation('bumphunter')[1],
    TxDb.Hsapiens.UCSC.hg19.knownGene = citation('TxDb.Hsapiens.UCSC.hg19.knownGene'),
    AnnotationDbi = citation('AnnotationDbi'),
    BiocParallel = citation('BiocParallel'),
    derfinderHelper = citation('derfinderHelper')[1],
    GenomicAlignments = citation('GenomicAlignments'),
    GenomicFeatures = citation('GenomicFeatures'),
    GenomicFiles = citation('GenomicFiles'),
    Hmisc = citation('Hmisc'),
    qvalue = citation('qvalue'),
    Rsamtools = citation('Rsamtools'),
    rtracklayer = citation('rtracklayer'),
    S4Vectors = citation('S4Vectors'),
    bumphunterPaper = RefManageR::BibEntry(bibtype = 'article', key = 'bumphunterPaper', title = 'Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies', author = 'Jaffe, Andrew E and Murakami, Peter and Lee, Hwajin and Leek, Jeffrey T and Fallin, M Daniele and Feinberg, Andrew P and Irizarry, Rafael A', year = 2012, journal = 'International Journal of Epidemiology')
)

write.bibtex(bibs,
    file = 'derfinderAdvRef.bib')
bib <- read.bibtex('derfinderAdvRef.bib')

## Assign short names
names(bib) <- names(bibs)
```


# Overview

This vignette explains in more detail the relationship between the different functions in `r Biocpkg('derfinder')` `r citep(bib[['derfinder']])` as well as add-on packages `r Biocpkg('derfinderPlot')` ([vignette](http://lcolladotor.github.io/derfinderPlot/)), `r Biocpkg('derfinderHelper')` ([vignette](http://lcolladotor.github.io/derfinderHelper/)) and `r Biocpkg('regionReport')` ([vignette](http://lcolladotor.github.io/regionReport/)). The vignette also includes some bash scripts for running `r Biocpkg('derfinder')` in a cluster, although there are probably other ways to do so using only R scripts. For example, by using `BatchJobsParam()` from `r Biocpkg('BiocParallel')`.

This vignette assumes that you have read through the introductory vignette.

Lets start by loading the `r Biocpkg('derfinder')` package.

```{r 'start', message=FALSE}
## Load libraries
library('derfinder')
```

If you explore the package, you will notice that all exported functions use camel case name and have aliases with underscore names. For example, `analyzeChr()` and `analyze_chr()`. You should use whichever naming style you prefer. Regrettably, implementing the same flexibility at the argument naming level is not straightforward.

# Advanced arguments

Just like it is shown in the introductory vignette, you can find more information about the advanced arguments using `advancedArg()`.

```{r 'advancedArg'}
## URLs to advanced arguemtns
sapply(c('analyzeChr', 'loadCoverage'), advancedArg, browse = FALSE)
## Set browse = TRUE if you want to open them in your browser
```

These arguments are options we expect not all users will want to change and which would otherwise make the help pages confusing to them. However, all arguments are documented in [roxygen2](http://cran.r-project.org/web/packages/roxygen2/index.html) style in the source code. 

Furthermore, note that using the `...` argument allows you to specify some of the documented arguments. For example, you might want to control the `maxClusterGap` from `findRegions()` in the `analyzeChr()` call.

# Non-human data

If you are working with data from an organism that is not _Homo sapiens_, then set the global options defining the `species` and the `chrsStyle` used. For example, if you are working with _Arabidopsis Thaliana_ and the _NCBI_ naming style, then set the options using the following code:

```{r 'exampleNameStyle', eval = FALSE}
## Set global species and chrsStyle options
options(species = 'arabidopsis_thaliana')
options(chrsStyle = 'NCBI')

## Then proceed to load and analyze the data
```

Internally `r Biocpkg('derfinder')` uses `extendedMapSeqlevels()` to use the appropriate chromosome naming style given a species in all functions involving chromosome names.

Further note that the argument `subject` from `analyzeChr()` is passed to `bumphunter::annotateNearest(subject)`. So if you are using a genome different from __hg19__ remember to provide the appropriate annotation data or simply use `analyzeChr(runAnnotation = FALSE)`. You might find the discussion [Using bumphunter with non-human genomes](https://support.bioconductor.org/p/62781/) useful.

# Advanced loading data


## Controlling loading from BAM files

If you are loading data from BAM files, you might want to specify some criteria to decide which reads to include or not. For example, your data might have been generated by a strand-specific protocol. You can do so by specifying the arguments of `scanBamFlag()` from `r Biocpkg('Rsamtools')`.

You can also control whether to include or exclude bases with `CIGAR` string `D` (deletion from the reference) by setting the advanced argument `drop.D = TRUE` in your `fullCoverage()` or `loadCoverage()` call.

## Unfiltered base-level coverage

Note that in most scenarios, the `fullCov` object illustrated in the introductory vignette can be large in memory. When making plots or calculating the region level coverage, we don't need the full information. In such situations, it might pay off to create a smaller version by loading only the required data. This can be achieved using the advanced argument `which` to `fullCoverage()` or `loadCoverage()`. 

However, it is important to consider that when reading the data from BAM files, a read might align partially inside the region of interest. By default such a read would be discarded and thus the base-level coverage would be lower than what it is in reality. The advanced argument `protectWhich` extends regions by 30 kbp (15 kbp each side) to help mitigate this issue. 

We can illustrate this issue with the example data from `r Biocpkg('derfinder')`. First, we load in the data and generate some regions of interest.

```{r 'runExample', bootstrap.show.output=FALSE}
## Find some regions to work with
example('loadCoverage', 'derfinder')
example('getRegionCoverage', 'derfinder')
```

Next, we load the coverage again using `which` but without any padding. We can see how the coverage is not the same by looking at the maximum coverage for each sample.

```{r 'loadWhich'}
## Illustrate reading data from a set of regions
test <- loadCoverage(files = files, chr = '21', cutoff = NULL, which = regions, protectWhich = 0, fileStyle = 'NCBI')

## Some reads were ignored and thus the coverage is lower as can be seen below:
sapply(test$coverage, max) - sapply(genomeDataRaw$coverage, max)
```

When we re-load the data using some padding to the regions, we find that the coverage matches at all the bases.

```{r 'loadWhich2'}
## Illustrate reading data from a set of regions

test2 <- loadCoverage(files = files, chr = '21', cutoff = NULL, which = 
regions, protectWhich = 3e4, fileStyle = 'NCBI')

## Adding some padding to the regions helps get the same coverage
identical(sapply(test2$coverage, max), sapply(genomeDataRaw$coverage, max))

## A more detailed test reveals that the coverage matches at every base
all(mapply(function(x, y) { identical(x, y) }, test2$coverage, genomeDataRaw$coverage))
```

How much padding you need to use will depend on your specific data set, and you might be comfortable getting approximately the same coverage values for the sake of greatly reducing the memory resources needed.

## Input files in a different naming style

If you are under the case where you like to use a specific chromosome naming style but the raw data files use another one, you might need to use the `fileStyle` argument. 

For example, you could be working with _Homo sapiens_ data and your preferred naming style is _UCSC_ (chr1, chr2, ..., chrX, chrY) but the raw data uses _NCBI_ style names (1, 2, ..., X, Y). In that case, use `fullCoverage(fileStyle = 'NCBI')` or `loadCoverage(fileStyle = 'NCBI')` depending if you are loading one chromosome or multiple at a time.

## Loading data in chunks

If you prefer to do so, `fullCoverage()` and `loadCoverage()` can load the data of a chromosome in chunks using [GenomicFiles](http://www.bioconductor.org/packages/release/bioc/html/GenomicFiles.html). This is controlled by whether you specify the `tilewidth` argument.

Notice that you might run into slight coverage errors near the borders of the tiles for the same reason that was illustrated previously when loading specific regions.

This approach is not necessarily more efficient and can be significantly time consuming if you use a small `tilewidth`.

# Flow charts

## DER analysis flow chart

The following figure illustrates how most of `r Biocpkg('derfinder')`'s functions interact when performing a base-level differential expression analysis by calculating base-level F-statistics.

<center><img src="http://lcolladotor.github.io/derfinder/fig/DERpathway.png" alt="DER flow" style="width: 500px;"/></center>

Flow chart of the different processing steps (black boxes) that can be carried out using `r Biocpkg('derfinder')` and the functions that perform these actions (in red). Input and output is shown in green boxes. Functions in blue are those applied to the results from multiple chromosomes (`mergeResults()` and `derfinderReport`). `r Biocpkg('regionReport')` functions are shown in orange while `r Biocpkg('derfinderPlot')` functions are shown in dark purple. Purple dotted arrow marks functions that require unfiltered base-level coverage.


## `analyzeChr()` flow chart


<center><img src="http://lcolladotor.github.io/derfinder/fig/analyzeChr.png" alt="analyzeChr flow" style="width: 500px;"/></center>

This figure shows in more detail the processing flow in `analyzeChr()`, which is the main function for identifying candidate differentially expressed regions (DERs) from the base-level F-statistics.

Many fine-tunning arguments can be passed to `analyzeChr()` to feed into the other functions. For example, you might want to use a smaller `chunksize` when pre-processing the coverage data (the default is 5 million): specially if you have hundreds of samples. 

Another useful argument is `scalefac` (by default it's 32) which controls the scaling factor to use before the log2 transformation. 

Furthermore, you might want to specify `maxClusterGap` to control the maximum gap between two regions before they are considered to be part of the same cluster.


## `regionMatrix()` flow chart

<center><img src="http://lcolladotor.github.io/derfinder/fig/regionMatrix.png" alt="regionMatrix flow" style="width: 500px;"/></center>

The above figure shows the functions used internally by `regionMatrix()` and processing steps. Overall, it is much simpler than `analyzeChr()`.


# Functions that use multiple cores

Currently, the following functions can use multiple cores, several of which are called inside `analyzeChr()`.

* `calculatePvalues()`: 1 core per chunk of data to process.
* `calculateStats()`: 1 core per chunk of data to process.
* `coerceGR()`: 1 core per chromosome. This function is used by `createBw()`.
* `coverageToExon()`: 1 core per strand, then 1 core per chromosome.
* `loadCoverage()`: up to 1 core per tile when loading the data with [GenomicFiles](http://www.bioconductor.org/packages/release/bioc/html/GenomicFiles.html). Otherwise, no parallelization is used.
* `fullCoverage()`: 1 core per chromosome. In general, try to avoid using more than 10 cores as you might reach your maximum network speed and/or hard disk input/output seed. For the case described in `loadCoverage()`, you can specify how many cores to use per chromosome for the tiles using the `mc.cores.load` argument effectively resulting in `mc.cores` times `mc.cores.load` used (otherwise it's `mc.cores` squared).
* `getRegionCoverage()`: 1 core per chromosome.
* `regionMatrix()`: 1 core per chromosome.

All parallel operations use `SnowParam()` from `r Biocpkg('BiocParallel')` when more than 1 core is being used. Otherwise, `SerialParam()` is used. Note that if you prefer to specify other types of parallelization you can do so by specifying the `BPPARAM.custom` advanced argument.

Because `SnowParam()` requires `R` to load the necessary packages on each worker, the key function `fstats.apply()` was isolated in the `r Biocpkg('derfinderHelper')` package. This package has much faster loading speeds than `r Biocpkg('derfinder')` which greatly impacts performance on cases where the actual step of calculating the F-statistics is fast.

You may prefer to use `MulticoreParam()` described in the `r Biocpkg('BiocParallel')`  vignette. In that case, when using these functions use `BPPARAM.custom = MulticoreParam(workers = x)` where `x` is the number of cores you want to use. Note that in some systems, as is the case of the cluster used by `r Biocpkg('derfinder')`'s developers, the system tools for assessing memory usage can be misleading, thus resulting in much higher memory loads when using `MulticoreParam()` instead of the default `SnowParam()`.


# Project organization

For each project, we recommend the following organization.

```
ProjectDir
|-derCoverageInfo
|-derAnalysis
|---analysis01
|---analysis02
```

That is, a main project directory with two initial directories. One for storing the coverage data, and one for storing each analysis: you might explore different models, cutoffs or other parameters.


You can then use `fullCoverage()`, save the result and also save the filtered coverage information for each chromosome separately. Doing so will result in the following structure.

```
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---analysis01
|---analysis02
```

Next, you can use `analyzeChr()` for each of the chromosomes of a specific analysis (say _analysis01_). Doing so will create several Rdata files per chromosome as shown below. `bash` scripts can be useful if you wish to submit one cluster job per chromosome. In general, you will use the same model and group information for each chromosome, so saving the information can be useful.

```
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---analysis01
|-----models.Rdata
|-----groupInfo.Rdata
|-----chr1/
|-------chunksDir/
|-------logs/
|-------annotation.Rdata
|-------coveragePrep.Rdata
|-------fstats.Rdata
|-------optionsStats.Rdata
|-------regions.Rdata
|-------timeinfo.Rdata
|-----chr2/
...
|-----chrY/
|---analysis02
```

Then use `mergeResults()` to pool together the results from all the chromosomes for a given analysis (here _analysis01_). 

```
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---analysis01
|-----logs/
|-----fullAnnotatedRegions.Rdata
|-----fullFstats.Rdata
|-----fullNullSummary.Rdata
|-----fullRegions.Rdata
|-----fullTime.Rdata
|-----optionsMerge.Rdata
|-----chr1/
|-------chunksDir/
|-------logs/
|-------annotation.Rdata
|-------coveragePrep.Rdata
|-------fstats.Rdata
|-------optionsStats.Rdata
|-------regions.Rdata
|-------timeinfo.Rdata
|-----chr2/
...
|-----chrY/
|---analysis02
```

Finally, you might want to use `derfinderReport()` from `r Biocpkg('regionReport')` to create a HTML report of the results.

# bash scripts

For interacting between `bash` and `R` we have found quite useful the [getopt](http://cran.r-project.org/web/packages/getopt/index.html) package. Here we include an example `R` script that is controlled by a `bash` script which submits a job for each chromosome to analyze for a given analysis.


The two files, _derfinderAnalysis.R_ and _derAnalysis.sh_ should live under the _derAnalysis_ directory.

```
ProjectDir
|-derCoverageInfo
|---Chr1Cov.Rdata
|---Chr2Cov.Rdata
...
|---ChrYCov.Rdata
|---fullCov.Rdata
|-derAnalysis
|---derfinder-Analysis.R
|---derAnalysis.sh
|---analysis01
|---analysis02
```

Then, you can simply use:

```bash
cd /ProjectDir/derAnalysis
sh derAnalysis.sh analysis01
```

to run `analyzeChr()` on all your chromosomes.

## derfinder-analysis.R

```{r 'derfinder-analysis', eval = FALSE}
## Run derfinder's analysis steps with timing info

## Load libraries
library("getopt")

## Available at https://github.com/lcolladotor/derfinder
library("derfinder")

## Specify parameters
spec <- matrix(c(
	'DFfile', 'd', 1, "character", "path to the .Rdata file with the results from loadCoverage()",
	'chr', 'c', 1, "character", "Chromosome under analysis. Use X instead of chrX.",
	'mcores', 'm', 1, "integer", "Number of cores",
	'verbose' , 'v', 2, "logical", "Print status updates",
	'help' , 'h', 0, "logical", "Display help"
), byrow=TRUE, ncol=5)
opt <- getopt(spec)

## Testing the script
test <- FALSE
if(test) {
	## Speficy it using an interactive R session and testing
	test <- TRUE
}

## Test values
if(test){
	opt <- NULL
	opt$DFfile <- "/ProjectDir/derCoverageInfo/chr21Cov.Rdata"
	opt$chr <- "21"
	opt$mcores <- 1
	opt$verbose <- NULL
}

## if help was asked for print a friendly message
## and exit with a non-zero error code
if (!is.null(opt$help)) {
	cat(getopt(spec, usage=TRUE))
	q(status=1)
}

## Default value for verbose = TRUE
if (is.null(opt$verbose)) opt$verbose <- TRUE

if(opt$verbose) message("Loading Rdata file with the output from loadCoverage()")
load(opt$DFfile)

## Make it easy to use the name later. Here I'm assuming the names were generated using output='auto' in loadCoverage()
eval(parse(text=paste0("data <- ", "chr", opt$chr, "CovInfo")))
eval(parse(text=paste0("rm(chr", opt$chr, "CovInfo)")))

## Just for testing purposes
if(test) {
	tmp <- data
	tmp$coverage <- tmp$coverage[1:1e6, ]
	library("IRanges")
	tmp$position[which(tmp$pos)[1e6 + 1]:length(tmp$pos)] <- FALSE
	data <- tmp
}

## Load the models
load("models.Rdata")

## Load group information
load("groupInfo.Rdata")


## Run the analysis with lowMemDir
analyzeChr(chr=opt$chr, coverageInfo=data, models=models, cutoffFstat=1e-06, cutoffType="theoretical", nPermute=1000, seeds=seq_len(1000), maxClusterGap=3000, groupInfo=groupInfo, subject="hg19", mc.cores=opt$mcores, lowMemDir=file.path(tempdir(), paste0("chr", opt$chr) , "chunksDir")), verbose=opt$verbose, chunksize=1e5)

## Done
if(opt$verbose) {
	print(proc.time())
	print(sessionInfo(), locale=FALSE)
}

```

Remember to modify the the script to fit your project.

## derAnalysis.sh

```bash
#!/bin/sh

## Usage
# sh derAnalysis.sh analysis01

# Directories
MAINDIR=/ProjectDir
WDIR=${MAINDIR}/derAnalysis
DATADIR=${MAINDIR}/derCoverageInfo

# Define variables
SHORT='derA-01'
PREFIX=$1

# Construct shell files
for chrnum in 22 21 Y 20 19 18 17 16 15 14 13 12 11 10 9 8 X 7 6 5 4 3 2 1
do
	echo "Creating script for chromosome ${chrnum}"
    
    if [[ ${chrnum} == "Y" ]]
    then
    	CORES=2
    else
    	CORES=6
    fi
    
	chr="chr${chrnum}"
	outdir="${PREFIX}/${chr}"
	sname="${SHORT}.${PREFIX}.${chr}"
	cat > ${WDIR}/.${sname}.sh <<EOF
#!/bin/bash
#$ -cwd
#$ -m e
#$ -l mem_free=120G,h_vmem=60G,h_fsize=10G
#$ -N ${sname}
#$ -pe local ${CORES}

echo "**** Job starts ****"
date

# Create output directory 
mkdir -p ${WDIR}/${outdir}
# Make logs directory
mkdir -p ${WDIR}/${outdir}/logs

# run derfinder-analysis.R
cd ${WDIR}/${PREFIX}/

# specific to our cluster
# see http://www.jhpce.jhu.edu/knowledge-base/environment-modules/
module load R/3.1.x
Rscript ${WDIR}/derfinder2-analysis.R -d "${DATADIR}/${chr}Cov.Rdata" -c "${chrnum}" -m ${CORES} -v TRUE

# Move log files into the logs directory
mv ${WDIR}/${sname}.* ${WDIR}/${outdir}/logs/

echo "**** Job ends ****"
date
EOF
	call="qsub .${sname}.sh"
	$call
done

```


Your cluster might specify memory requirements differently and you might need to use fewer or more cores depending on your data set.


# Summary

This vignette covered the most commonly used advanced arguments, details on how to load data, flow charts explaining the relationships between the functions, the recommended output organization, and example shell scripts for running the analysis.


# Origins

This implementation of `r Biocpkg('derfinder')` has its origins in [Alyssa C. Frazee's derfinder](https://github.com/alyssafrazee/derfinder) `r citep(bib[['originalder']])`. The statistical methods and implementation by now are very different.


# Citing `r Biocpkg('derfinder')`

Please use:

```{r 'citation'}
## Citation info
citation('derfinder')
```


# Reproducibility

This package was made possible thanks to:

* R `r citep(bib[['R']])`
* `r Biocpkg('AnnotationDbi')` `r citep(bib[['AnnotationDbi']])`
* `r Biocpkg('BiocParallel')` `r citep(bib[['BiocParallel']])`
* `r Biocpkg('bumphunter')` `r citep(bib[['bumphunter']])` and `r citep(bib[['bumphunterPaper']])`
* `r Biocpkg('derfinderHelper')` `r citep(bib[['derfinderHelper']])`
* `r Biocpkg('GenomeInfoDb')` `r citep(bib[['GenomeInfoDb']])`
* `r Biocpkg('GenomicAlignments')` `r citep(bib[['GenomicAlignments']])`
* `r Biocpkg('GenomicFeatures')` `r citep(bib[['GenomicFeatures']])`
* `r Biocpkg('GenomicFiles')` `r citep(bib[['GenomicFiles']])`
* `r Biocpkg('GenomicRanges')` `r citep(bib[['GenomicRanges']])`
* `r CRANpkg('Hmisc')` `r citep(bib[['Hmisc']])`
* `r Biocpkg('IRanges')` `r citep(bib[['IRanges']])`
* `r Biocpkg('qvalue')` `r citep(bib[['qvalue']])`
* `r Biocpkg('Rsamtools')` `r citep(bib[['Rsamtools']])`
* `r Biocpkg('rtracklayer')` `r citep(bib[['rtracklayer']])`
* `r Biocpkg('S4Vectors')` `r citep(bib[['S4Vectors']])`
* `r Biocpkg('biovizBase')` `r citep(bib[['biovizBase']])`
* `r Biocexptpkg('derfinderData')` `r citep(bib[['derfinderData']])`
* `r CRANpkg('devtools')` `r citep(bib[['devtools']])`
* `r CRANpkg('ggplot2')` `r citep(bib[['ggplot2']])`
* `r CRANpkg('knitcitations')` `r citep(bib[['knitcitations']])`
* `r CRANpkg('knitr')` `r citep(bib[['knitr']])`
* `r Biocpkg('BiocStyle')` `r citep(bib[['BiocStyle']])`
* `r CRANpkg('rmarkdown')` `r citep(bib[['rmarkdown']])`
* `r CRANpkg('testthat')` `r citep(bib[['testthat']])`
* `r Biocannopkg('TxDb.Hsapiens.UCSC.hg19.knownGene')` `r citep(bib[['TxDb.Hsapiens.UCSC.hg19.knownGene']])`

Code for creating the vignette

```{r createVignette, eval=FALSE}
## Create the vignette
library('rmarkdown')
system.time(render('derfinderAdvanced.Rmd', 'BiocStyle::html_document'))

## Extract the R code
library('knitr')
knit('derfinderAdvanced.Rmd', tangle = TRUE)
```

```{r createVignette2}
## Clean up
file.remove('derfinderAdvRef.bib')
```

Date the vignette was generated.

```{r reproducibility1, echo=FALSE}
## Date the vignette was generated
Sys.time()
```

Wallclock time spent generating the vignette.

```{r reproducibility2, echo=FALSE}
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits=3)
```

`R` session information.

```{r reproducibility3, echo=FALSE}
## Session info
library('devtools')
options(width = 120)
session_info()
```

# Bibliography

This vignette was generated using `r Biocpkg('BiocStyle')` `r citep(bib[['BiocStyle']])`
with `r CRANpkg('knitr')` `r citep(bib[['knitr']])` and `r CRANpkg('rmarkdown')` `r citep(bib[['rmarkdown']])` running behind the scenes.

Citations made with `r CRANpkg('knitcitations')` `r citep(bib[['knitcitations']])`.

```{r vignetteBiblio, results = 'asis', echo = FALSE, warning = FALSE}
## Print bibliography
bibliography()
```