%\VignetteIndexEntry{Introduction to VariantAnnotation}
%\VignetteKeywords{variants, sequence, sequencing, alignments}
%\VignettePackage{VariantAnnotation}
\documentclass[10pt]{article}

\usepackage{times}
\usepackage{hyperref}

\textwidth=6.5in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioconductor}{\software{Bioconductor}}
\newcommand{\VariantAnnotation}{\Rpackage{VariantAnnotation}}

\title{Introduction to \VariantAnnotation}
\author{Valerie Obenchain}
\date{\today}

\begin{document} 

\maketitle
\tableofcontents

<<options, echo=FALSE>>=
options(width=72)
@

\section{Introduction}
This vignette outlines a general workflow for annotating and filtering 
genetic variants using the \VariantAnnotation  package. Sample data are 
in VariantCall Format (VCF) and are a subset of chromosome 22 from 1000 Genomes,
\url{ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/}.
VCF is a text file format that contains meta-information lines, a header line 
with column names, data lines with information about a position in the genome, 
and optional genotype information on samples for each position. A full 
description of the VCF format can be found on the 1000 Genomes page,
\url{http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41}

Sample data are read in from a VCF file and variants are identified according 
to region such as \Rcode{coding}, \Rcode{intron}, \Rcode{intergenic}, 
\Rcode{spliceSite} etc. Amino acid coding changes are computed for the 
non-synonymous variants and SIFT and PolyPhen databases provide predictions 
of how severly the coding changes affect protein function. The end of the 
vignette covers other transformations of VCF data such as the creation 
of a \Robject{SnpMatrix} or a `long form' \Robject{GRanges}. 

\section{Variant Call Format (VCF) files}
\subsection{Import complete files}
Data are parsed into a \Robject{VCF} object with \Rfunction{readVcf}. 

<<readVcf>>=
library(VariantAnnotation)
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
vcf
@

Extract the header information stored in the \Rcode{exptData} slot
<<readVcf_showheader>>=
hdr <- exptData(vcf)[["header"]]
hdr
@

and explore it with the \Rcode{fixed}, \Rcode{info} and \Rcode{geno}
accessors. More information on this object can be found at
?\Rcode{VCFHeader}.
<<headeraccessors>>=
fixed(hdr)
head(info(hdr), 3)
@

The \Robject{GRanges} in the \Rcode{rowData} slot is created from
information in the the CHROM, POS, and ID fields of the VCF file.
Values in the \Rcode{paramRangeID} column are meaningful when ranges 
have been specified in the the \Rcode{param} argument to \Rfunction{readVcf}.
This is discussed further in the \Rcode{Data Subsets} section.

<<readVcf_rowData>>=
head(rowData(vcf))
@

The REF, ALT, QUAL and FILTER fields can be accessed together with 
\Rcode{fixed} accessor or individually with \Rcode{ref}, \Rcode{alt}, 
\Rcode{qual} and \Rcode{filt} accessors. 

<<readVcf_fixed>>=
head(fixed(vcf), 3)
@

The \Robject{ALT} column is stored as a \Robject{DNAStringSetList} unless
the file is a structural VCF, in which case it is stored as a 
\Robject{CharacterList}. Extract \Robject{ALT} from the \Robject{GRanges} and
determine the number of elements in the list. 
<<readVcf_ALT>>=
alternate <- alt(vcf)
alternate

## number of ALT values per variant
unique(elementLengths(alternate))
head(unlist(alternate))
@

Genotype data described in the \Rcode{FORMAT} field are parsed into 
matrices or arrays and can be accessed with the \Rcode{geno} accessor. 
These data are not returned with the \Robject{GRanges} from \Rcode{rowData}
because they are unique for each sample and the data structures can be 
multidimensional. This is in contrast to the \Rcode{fixed} and \Rcode{info} 
data which are the same for a each variant across all samples.

Extract the header information for the genotypes. 
<<>>=
geno(hdr)
@

Elements of the genotype list can be accessed in the usual way.
<<readVCF_geno>>=
geno(vcf)
geno(vcf)$GT[1:3,1:5]
geno(vcf)$DS[1:3,1:5]
@

\subsection{Import data subsets}
When working with large VCF files it may be more efficient to read in subsets 
of the data. Data can be subset by selecting genomic coordinates (ranges) or by
selecting fields from the VCF file. 

\subsubsection{Genomic coordinates}
Subset by genomic coordinates by creating a \Robject{GRanges}, 
\Robject{RangedData} or \Robject{RangesList}. To read in a portion of
chromosome 22, we create a \Robject{GRanges} with the regions of interest.

<<subset_ranges>>=
rng <- GRanges(seqnames="22", 
               ranges=IRanges(c(50301422, 50989541), c(50312106, 51001328)))
names(rng) <- c("gene_79087", "gene_644186")
@

When ranges are specified, the VCF file must have an accompanying Tabix index 
file; if one does not exist it must be created. See ?\Rcode{indexTabix} for 
help creating an index. 

Once the index exists a \Robject{TabixFile} instance can be created, see 
?\Rcode{TabixFile}. This object creates a reference to the VCF and its index. 
Once opened, the reference remains open across calls to methods, avoiding costly 
index re-loading. An index file for our sample data is included in the package
so the \Rcode{TabixFile} can be created with,

<<subset_TabixFile>>=
tab <- TabixFile(fl)
tab
@

Call \Rfunction{readVcf} with \Robject{TabixFile} and the ranges as the 
\Robject{param}. The dimension of the resulting \Robject{VCF} object shows 
397 records overlaped with the specified ranges. 
<<subset_call>>= 
vcf_rng <- readVcf(tab, "hg19", rng)
vcf_rng
@

The \Rcode{paramRangesID} column now has meaning as it distinguishes which
variant records came from which param range.
<<>>=
head(rowData(vcf_rng), 3)
@

\subsubsection{VCF fields}
In addition to specifying ranges, data can be subset on specific fields 
in the VCF file. Fields available for import are described in the 
header information. To view the header before reading in the data in use 
\Rfunction{ScanVcfHeader}. 

<<subset_scanVcfHeader>>=
hdr <- scanVcfHeader(fl)
hdr
@ 

The \Rcode{info} and \Rcode{geno} accessors return \Robject{DataFrames}
containing descriptions of the fields, data type and number of values. 
A listing of all possible \Rcode{info} or \Rcode{geno} values is 
constructed by selecting the rownames of the \Rcode{DataFrames}.

<<subset_infoformat>>=
## INFO fields
info_DF <- info(hdr) 
rownames(info_DF) 
## FORMAT fields
geno_DF <- geno(hdr) 
rownames(geno_DF) 
@

We are interested in "LDAF" in INFO which is 'allele frequency accounting 
for linkage disequlibrium', and "GT" in FORMAT which is 'genotype'. Full 
descriptions of the elements can be seen in the header INFO and FORMAT 
\Robject{DataFrames}.
<<subset_elements>>=
info_DF[rownames(info_DF) == "LDAF", ]
geno_DF[rownames(geno_DF) == "GT", ]
@

To subset on "LDAF" and "GT" we specify them as \Rcode{character} vectors 
in the \Rcode{info} and \Rcode{geno} arguments to \Rcode{ScanVcfParam}. This
creates a \Robject{ScanVcfParam} object which is used as the \Robject{param}
argument to \Rfunction{readVcf}. 

<<subset_ScanVcfParam>>=
## Return "ALT" from 'fixed', "LAF" from 'info' and "GT" from 'geno'
svp <- ScanVcfParam(fixed="ALT", info="LDAF", geno="GT")

## Return all 'fixed' fields, "LAF" from 'info' and "GT" from 'geno'
svp <- ScanVcfParam(info="LDAF", geno="GT")
svp
@

Note that subsetting by the VCF fields does not affect the number of ranges read 
in. Instead the results of the filtering are reflected in the names of the elements 
returned from the \Rcode{info} and \Rcode{geno} accessors. 

<<subset_ScanVcfParam2>>=
vcf_flds <- readVcf(fl, "hg19", svp)
geno(vcf_flds)
head(info(vcf_flds), 3)
@

In the previous section we saw that a Tabix index file must exist when data are
subset by genomic coordinates (i.e., ranges). This is not the case when 
subsetting on INFO and FORMAT elements. An index file is only needed when 
subsetting by ranges.

\subsubsection{Subset on both genomic coordinates and VCF fields}
To subset on both genomic coordinates and INFO and FORMAT fields the
\Robject{ScanVcfParam} object must contain both. Our previous 
\Robject{ScanVcfParam} did not have ranges associated with it so we create a 
new instance with the ranges and INFO and FORMAT fields. 

<<subset_ScanVcfParam_new>>=
svp_all <- ScanVcfParam(info="LDAF", geno="GT", which=rng) 
svp_all
@

The subsetting here involves genomic coordinates so we need to use the 
Tabix index file we created.

<<subset_both>>=
readVcf(tab, "hg19", svp_all) 
@

\subsection{Adjusting chromosome names}
When functions involve the comparision of ranges by overlaps. For overlap 
methods to work properly the chromosome names (seqlevels) must be
compatible. 

The VCF data chromosome names are represented by number, i.e. '22',
<<seqlevels_rd>>=
rowdat <- rowData(vcf)
seqlevels(rowdat)
@

but the TxDb chromosome names are preceded with 'chr'.
<<seqlevels_TxDb>>=
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene 
head(seqlevels(txdb))
@

Chromosome names can be modified with the \Rfunction{renameSeqlevels}
function. Seqlevels are modified at the \Robject{GRanges} level in the
\Robject{rowData} slot of the \Robject{VCF} which means all future data 
extractions from this \Robject{VCF} will have the new seqlevels. 
If the data are read in from the file again, however, the seqlevels will 
need to be adjusted again. See ?\Rfunction{VCF} and ?\Rfunction{renameSeqlevels} 
for examples with \Robject{VCF} and \Robject{GRanges} objects.

<<seqlevels_rename>>=
## rename variant seqlevels in the VCF object
vcf <- renameSeqlevels(vcf, c("22"="chr22"))

## extract the rowData with modified seqlevels
rd <- rowData(vcf)
 
## confirm seqlevels are the same
intersect(seqlevels(rd), seqlevels(txdb))
@

To subset a \Robject{VCF} or \Robject{GRanges} by chromosome use
\Rfunction{keepSeqlevels}. As an example we extract transcripts for all 
chromosomes in TxDb.Hsapiens.UCSC.hg19.knownGene then keep only 'chr21' and 
'chr22'. See ?\Robject{VCF} and ?\Rfunction{keepSeqlevels} for details. 
\begin{verbatim}
## initially there are 93 chromosomes
> rngs <- transcripts(txdb)
> length(seqlevels(rngs))
[1] 93
## keep only chr21 and chr22 
> rngs <- keepSeqlevels(rngs, c("chr21", "chr22"))
> seqlevels(rngs)
[1] "chr21" "chr22"
\end{verbatim}

\section{Variant location}
\Rfunction{locateVariants} identifies where the ranges in \Rcode{query} fall 
with respect to the annotation supplied in \Robject{subject}. Regions are 
specified in the \Rcode{region} argument and can be one of the following 
constructors: CodingVariants, IntronVariants, FiveUTRVariants, ThreeUTRVariants, 
IntergenicVariants, or SpliceSiteVariants. Location definitions are shown in 
Table \ref{table:location}. 

\begin{table}[h!]
\begin{center}
\begin{tabular}{l|l}
\hline
Location & Details \\
\hline
coding &  falls \emph{within} a coding region \\
fiveUTR &  falls \emph{within} a 5' untranslated region \\
threeUTR &  falls \emph{within} a 3' untranslated region \\
intron &  falls \emph{within} an intron region \\
intergenic & does not fall \emph{within} a transcript associated with a gene \\
spliceSite & overlaps any portion of the first 2 or last 2 
nucleotides of an intron \\ 
promoter & falls \emph{within} a promoter region of a transcript \\
\hline
\end{tabular}
\end{center}
\caption{Variant locations}
\label{table:location}
\end{table}

When the \Rcode{query} is a \Robject{VCF} the variant ranges are taken from the 
\Rcode{rowData} slot. If \Rcode{query} is a \Robject{GRanges} it can have
additional elementMetadata columns but they are ignored. As an alternative to a
\Rcode{TranscriptDb}, the \Robject{subject} can be a \Robject{GRangesList}
of the appropriate type. \Rcode{CodingVariants} would require coding regions
by transcript, for \Rcode{IntronVariants} introns by transcripts would be
necessary, etc. See ?\Rcode{locateVariants} man page for details.

Identify the coding variants, 
<<CodingVariants>>=
loc <- locateVariants(rd, txdb, CodingVariants())
head(loc, 4)
@ 

\Rcode{SpliceSiteVariants} are those overlapping the first 2 or last 2 nucleotides 
of an intron.
<<SpliceSiteVariants>>=
head(locateVariants(rd, txdb, SpliceSiteVariants()), 4)
@

To locate variants in all regions use the \Rcode{AllVariants()} constructor,
<<AllVariants, eval=FALSE>>=
allvar <- locateVariants(rd, txdb, AllVariants())
@

The \Robject{GRanges} output of \Rfunction{locateVariants} includes only the
ranges that fell in the specified region. Each row is a variant-transcript 
match which may result in multiple rows for each variant. 
\Rcode{elementMetadata} columns returned include \Rcode{LOCATION}, 
\Rcode{QUERYID}, \Rcode{TXID}, \Rcode{CDSID}, and \Rcode{GENEID}. In the case 
of \Robject{IntergenicVariants} columns for \Rcode{PRECEDEID} and 
\Rcode{FOLLOWID} are also included. The \Rcode{QUERYID} column maps back to 
the row number in the original \Robject{query}. 

To answer gene-centric questions data can be summarized by gene reguardless
of transcript. 
<<locatVariants_example>>=
## Did any coding variants match more than one gene?
table(sapply(split(values(loc)[["GENEID"]], values(loc)[["QUERYID"]]), 
    function(x) length(unique(x)) > 1))

## Summarize the number of coding variants by gene ID
idx <- sapply(split(values(loc)[["QUERYID"]], values(loc)[["GENEID"]]), unique)
sapply(idx, length)
@

\section{Amino acid coding changes}

\Rfunction{predictCoding} computes amino acid coding changes for non-synonymous 
variants. Only ranges in \Rcode{query} that overlap with a coding region in 
the \Rcode{subject} are considered. Reference sequences are retrieved from 
either a \Robject{BSgenome} or fasta file specified in \Rcode{seqSource}. 
Variant sequences are constructed by substituting, inserting or deleting values 
in the \Robject{varAllele} column into the reference sequence. Amino acid codes 
are computed for the variant codon sequence when the length is a multiple of 3. 
Examples of coding situations are shown in Table \ref{table:aacoding}. 

The \Robject{query} argument to \Rfunction{predictCoding} can be a 
\Robject{GRanges} or \Robject{VCF}. When a \Robject{GRanges} is supplied 
the \Rcode{varAllele} argument must be specified. In the case of a 
\Robject{VCF}, the alternate alleles are taken from 
\Robject{values(alt(<VCF>))[["ALT"]]} and the \Robject{varAllele} argument is
not specified.

\begin{table}[h]
\begin{center}
\begin{tabular}{l|l|l|l|l|c}
\hline
Type & refAllele & varAllele & refCodon & varCodon & translation possible \\
\hline
substitution & G & T & aag & aaT & yes\\
substitution & G & TG & tga & tTGa & no\\
substitution & G & TGCG & gtc & TGCGtc & yes\\
insertion & `' & G & cgg & Gcgg & no\\
insertion & `' & TTG & gaa & gaTTGa & yes\\
deletion & A & `' & atc & tc & no\\
deletion & GGCCTA & `' & acggcctaa & aca & yes\\
\hline
\end{tabular}
\end{center}
\caption{Amino acid coding}
\label{table:aacoding}
\end{table}

The result is a modified \Rcode{query} containing only variants that fall 
within coding regions. Each row represents a variant-transcript match so more 
than one row per original variant is possible.

<<predictCoding>>=
library(BSgenome.Hsapiens.UCSC.hg19)
coding <- predictCoding(vcf, txdb, seqSource=Hsapiens)
coding[5:9]
@

Using variant rs114264124 as an example, we see varAllele \Rcode{A} has been
substituted into the \Rcode{refCodon} \Rcode{CGG} to produce \Rcode{varCodon}
\Rcode{CAG}. The \Rcode{refCodon} is the sequence of codons necessary to make
the variant allele substitution and therefore often includes more nucleotides
than indicated in the range (i.e. the range is 50302962, 50302962, width of 1).
Notice it is the second position in the \Rcode{refCodon} that has been
substituted. This position in the codon, the position of substitution,
corresponds to genomic position 50302962. This genomic position maps to position
698 in coding region-based coordinates and to triplet 233 in the protein. This
is a non-synonymous coding variant where the amino acid has changed from
\Rcode{R} (Arg) to \Rcode{Q} (Gln).

When the resulting \Rcode{varCodon} is not a multiple of 3 it cannot be translated. The
consequence is considered a \Rcode{frameshift} and \Robject{varAA} will be 
missing. 

<<predictCoding_frameshift>>=
## CONSEQUENCE is 'frameshift' where translation is not possible
coding[values(coding)[["CONSEQUENCE"]] == "frameshift"]
@

\section{SIFT and PolyPhen Databases}

From \Rfunction{predictCoding} we identified the amino acid coding changes for 
the non-synonymous variants. For this subset we can retrieve predictions of how 
damaging these coding changes may be. SIFT (Sorting Intolerant From 
Tolerant) and PolyPhen (Polymorphism Phenotyping) are methods that predict the 
impact of amino acid substitution on a human protein. The SIFT method uses 
sequence homology and the physical properties of amino acids to make predictions 
about protein function. PolyPhen uses sequence-based features and structural 
information characterizing the substitution to make predictions about the 
structure and function of the protein. 

Collated predictions for specific dbSNP builds are available as downloads from 
the SIFT and PolyPhen web sites. These results have been packaged into
\Rpackage{SIFT.Hsapiens.dbSNP132.db} and \Rpackage{PolyPhen.Hapiens.dbSNP131.db} 
and are designed to be searched by rsid. Variants that are in dbSNP can be 
searched with these database packages. When working with novel variants, SIFT
and PolyPhen must be called directly. See references for home pages.

Identify the non-synonymous variants and obtain the rsids.
<<nonsynonymous>>=
nms <- names(coding)
idx <- values(coding)[["CONSEQUENCE"]] == "nonsynonymous"
nonsyn <- coding[idx]
names(nonsyn) <- nms[idx]
rsids <- unique(names(nonsyn)[grep("rs", names(nonsyn), fixed=TRUE)])
@

Detailed descriptions of the database columns can be found with
\Rcode{?SIFTDbColumns} and \Rcode{?PolyPhenDbColumns}. Variants in these
databases often contain more than one row per variant. The variant may have
been reported by multiple sources and therefore the source will differ as well
as some of the other variables. 

It is important to keep in mind the pre-computed predictions in the
SIFT and PolyPhen packages are based on specific gene models. SIFT is
based on Ensembl and PolyPhen on UCSC Known Gene. The \Rcode{TranscriptDb}
we used to identify the coding snps was based on UCSC Known Gene so we
will use PolyPhen for predictions. PolyPhen provides predictions using two 
different training datasets and has considerable information about 3D protein 
structure. See \Rcode{?PolyPhenDbColumns} or the PolyPhen web site listed in 
the references for more details.

Query the PolyPhen database,
<<polyphen>>=
library(PolyPhen.Hsapiens.dbSNP131)

pp <- select(PolyPhen.Hsapiens.dbSNP131, keys=rsids,
             cols=c("TRAININGSET", "PREDICTION", "PPH2PROB"))
head(pp[!is.na(pp$PREDICTION), ])
@


\section{Other operations}
\subsection{Create a SnpMatrix}

The 'GT' element in the \Rcode{FORMAT} field of the VCF represents the 
genotype. These data can be converted into a \Robject{snpMatrix} object 
which can then be used with the functions offered in \Rpackage{snpStats} 
and other packages making use of the \Robject{SnpMatrix} class. 

The \Rfunction{MatrixToSnpMatrix} function converts the genotype calls in
\Rcode{geno} to a \Robject{SnpMatrix}. No dbSNP package is used in this
computation. The return value is a named list where 'genotypes' is a 
\Robject{SnpMatrix} and 'map' is a \Robject{DataFrame} with SNP names and 
alleles at each loci. The \Rcode{ignore} column in 'map' indicates which 
variants were set to NA (missing) because they met one or more of the following 
criteria,
\begin{itemize}
\item{only diploid calls are included; others are set to NA}
\item{only single nucleotide variants are included; others are set to NA}
\item{variants with >1 ALT allele are set to NA}
\end{itemize}

See ?\Rfunction{MatrixToSnpMatrix} for more details.

<<snpMatrix>>=
calls <- geno(vcf)$GT
a0 <- ref(vcf)
a1 <- alt(vcf) 

res <- MatrixToSnpMatrix(calls, a0, a1)
res
@

The \Rcode{ALT} value in the 'map' \Robject{DataFrame} will be a
\Robject{CharacterList} if the VCF was for structural variants or a 
\Robject{DNAStringSetList} otherwise. The column is not clearly visable inside 
the \Robject{DataFrame} but can be extracted and inspected as follows,
<<snpMatrix_ALT>>=
allele2 <- res$map[["allele.2"]]
## number of alternate alleles per variant
unique(elementLengths(allele2))
unlist(allele2)
@

\subsection{Long form GRanges}

The \Rcode{readVcfLongForm} function reads data from a VCF file in the same 
manner as \Rcode{readVcf} but outputs a long form \Rcode{GRanges} instead of 
a \Rclass{VCF} class. This format is driven by the fact that the alternate 
allele (ALT) in the VCF file often has more than one value per record. In the 
long form \Robject{GRanges}, the rows of the \Robject{GRanges} are replicated 
to match the length of the `unlisted' alternate allele. This format provides 
access to each possible REF, ALT pair for each variant.

Input arguments and data subsetting is the same for \Rcode{readVcfLongForm} 
as for \Rcode{readVcf}. The \Rcode{fixed} and \Rcode{info} fields are included 
as elementMetadata columns. Currently no \Rcode{geno} information is included.

\Rcode{info} information was previously collected from the file header. We 
import `HOMSEQ' and `ALT'.
<<longform_header>>=
rownames(info_DF)
@

<<expand_info>>=
param <- ScanVcfParam(fixed="ALT", info="HOMSEQ")
gr <- readVcfLongForm(fl, "hg19", param)
head(gr)
@

\subsection{Write out VCF files}

A VCF file can be written out from data stored in a \Rcode{VCF} class. Methods 
to write out from  more general structures are in progress.

<<writeVcf>>=
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")

out1.vcf <- tempfile()
out2.vcf <- tempfile()
in1 <- readVcf(fl, "hg19")
writeVcf(in1, out1.vcf)
in2 <- readVcf(out1.vcf, "hg19")
writeVcf(in2, out2.vcf)
in3 <- readVcf(out2.vcf, "hg19")

identical(in2, in3)
@

\section{References}
Wang K, Li M, Hakonarson H, (2010), ANNOVAR: functional annotation of genetic 
variants from high-throughput sequencing data. Nucleic Acids Research, Vol 38,
No. 16, e164.\\

\noindent McLaren W, Pritchard B, RiosD, et. al., (2010), Deriving the consequences of
genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics,
Vol. 26, No. 16, 2069-2070.\\

\noindent SIFT home page :
\url{http://sift.bii.a-star.edu.sg/}\\

\noindent PolyPhen home page :
\url{http://genetics.bwh.harvard.edu/pph2/}

\section{Session Information}
<<sessionInfo, echo=FALSE>>=
sessionInfo()
@

\end{document}