%\VignetteIndexEntry{ensemblVEP}
%\VignetteDepends{GenomicRanges, VariantAnnotation, Biostrings}
%\VignetteKeywords{annotation, variants}
%\VignettePackage{ensemblVEP}
\documentclass[10pt]{article}

\usepackage{times}
\usepackage{hyperref}

\usepackage[margin=0.65in]{geometry}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioconductor}{\software{Bioconductor}}

\SweaveOpts{keep.source=TRUE}

\title{Overview of \Rpackage{ensemblVEP}}
\author{Lori Shepherd and Valerie Obenchain}

\begin{document}

\maketitle
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Ensembl provides the facility to predict functional consequences 
of known and unknown variants using the Variant Effect Predictor 
(VEP). The \Rpackage{ensemblVEP} package wraps Ensembl VEP and 
returns the results as \R objects or a file on disk. To use this 
package the Ensembl VEP perl script must be installed in your path.
See the package README for details.

NOTE: As of Ensembl version 88 the VEP script has been renamed from
variant\_effect\_predictor.pl to vep. The ensemblVEP package code
and documentation have been updated to reflect this change.

\noindent Downloads:
\url{http://uswest.ensembl.org/info/docs/tools/vep/index.html}
\\
\noindent Complete documentation for runtime options:
\url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}
\\
\noindent To test that Ensembl VEP is properly installed, enter the
name of the script from the command line:

{\it vep}
\\


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results as \R{} objects}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

<<setup>>=
library(ensemblVEP)
@

The \Rfunction{ensemblVEP} function can return variant consequences from
Ensembl VEP as \R objects (\Robject{GRanges} or \Robject{VCF}) or write
them to a file. The default behavior returns a \Robject{GRanges}. Runtime 
options are stored in a \Robject{VEPFlags} object and allow a great deal 
of control over the content and format of the results. See the man pages 
for more details.
<<man_page, eval=FALSE>>=
?ensemblVEP
?VEPFlags
@

The default runtime options can be inspected by creating a 
\Robject{VEPFlags}.
<<default_VEPFlags>>=
param <- VEPFlags()
param
flags(param)
@

Using a vcf file from \Rpackage{VariantAnnotation} as input, we query 
Ensembl VEP with the default runtime parameters. Consequence data are 
parsed into the metadata columns of the \Robject{GRanges}. To control 
the type and amount of data returned see the output options at 
\url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}.

<<rtn_GRanges, results=verbatim>>=
fl <- system.file("extdata", "gl_chr1.vcf", package="VariantAnnotation")
gr <- ensemblVEP(fl)
head(gr, 3)
@

Next we request that a \Robject{VCF} object be returned by setting
the {\it vcf} option in the {\it flags} slot to TRUE. 
<<set_vcf>>=
param <- VEPFlags(flags=list(vcf=TRUE))
vep <- ensemblVEP(fl, param)
@

Success! When a \Robject{VCF} is returned, consequence data are 
included as an unparsed INFO column labeled {\it CSQ}.
<<rtn_VCF>>=
info(vep)$CSQ
@

The \Rfunction{parseCSQToGRanges} function parses these data
into a \Robject{GRanges}. When the rownames of the original
VCF are provided as \Rcode{VCFRowID} a metadata column of the
same name is included in the output.
<<parseCSQToGRanges>>=
vcf <- readVcf(fl, "hg19")
csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf))
head(csq, 3)
@ 

The \Rcode{VCFRowID} columns maps the expanded {\it CSQ} data back
to the rows in the \Rclass{VCF} object. This index can be used to
subset the original VCF.
<<map_rownames>>=
vcf[csq$"VCFRowID"]
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Write results to a file}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In the previous section we saw Ensembl VEP results returned as 
\R{} objects in the workspace. Alternatively, these results can
be written directly to a file. The flag that controls how the
data are returned is the {\it output\_file} flag.

When {\it output\_file} is NULL (default), the results
are returned as either a \Rclass{GRanges} or \Rclass{VCF} object.
<<output_file_default>>=
flags(param)$output_file
@

To write results directly to a file, specify a file name for the 
{\it output\_file} flag.
<<output_file_filename>>=
flags(param)$output_file <- "/mypath/myfile"
@

The file can be written as a {\it vcf} or {\it gvf} by setting the
options of the slot to TRUE. If neither of {\it vcf}
or {\it gvf} are TRUE the file is written out as tab delimited.
<<ouput_slot>>=
## Write a vcf file to myfile.vcf:
myparam <- VEPFlags(flags=list(vcf=TRUE,
                        output_file="/path/myfile.vcf"))
## Write a gvf file to myfile.gvf:
myparam <- VEPFlags(flags=list(gvf=TRUE,
                        output_file="/path/myfile.gvf"))
## Write a tab delimited file to myfile.txt:
myparam <- VEPFlags(flags=list(output_file="/path/myfile.txt"))
@


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Configuring runtime options}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The Ensembl VEP web page has complete descriptions of all runtime
options.
\url{http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html}
Below are examples of how to configure the runtime options
in the \Rclass{VEPFlags} for specific situations. Investigate the 
differences in results using a sample file from \Rpackage{VariantAnnotation}.
<<samplefile>>=
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
@

\begin{itemize}
  \item Add regulatory region consequences:
<<runtime1, eval=FALSE>>=
param <- VEPFlags(flags=list(regulatory=TRUE))
gr <- ensemblVEP(fl, param) 
@

  \item Specify input file format as VCF, add HGNC gene 
        identifiers, output SO consequence terms:
<<runtime2, eval=FALSE>>=
param <- VEPFlags(flag=list(format="vcf",
                      terms="SO",
                      symbol=TRUE))
gr <- ensemblVEP(fl, param) 
@

  \item Check for co-located variants, output only coding 
        sequence consequences, output HGVS names:
<<runtime3, eval=FALSE>>=
param <- VEPFlags(flags=list(coding_only=TRUE,
                      check_existing=TRUE,
                      symbol=TRUE))
gr <- ensemblVEP(fl, param) 
@

  \item Add SIFT score and prediction, PolyPhen prediction only, 
        output results as \Rcode{VCF}:
\begin{verbatim}
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- VEPFlags(flags=list(sift="b", polyphen="p", 
                  vcf=TRUE))
vcf <- ensemblVEP(fl, param)
csq <- parseCSQToGRanges(vcf)

> head(levels(mcols(csq)$SIFT))
[1] "deleterious(0.01)" "deleterious(0.02)" "deleterious(0.03)"
[4] "deleterious(0.04)" "deleterious(0.05)" "deleterious(0)" 

> levels(mcols(csq)$PolyPhen)
[1] "benign"            "possibly_damaging" "probably_damaging"
[4] "unknown" 
\end{verbatim}
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\Rcode{sessionInfo()}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

<<sessionInfo>>=
sessionInfo()
@

\end{document}