%\VignetteIndexEntry{biocViews-HOWTO}
%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[12pt]{article}

\usepackage{amsmath}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}


\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}


\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}

\textwidth=6.2in

\bibliographystyle{plainnat} 
 
\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{HOWTO generate biocViews HTML}
\author{S. Falcon and V.J. Carey}
\maketitle


<<echo=FALSE,results=hide>>=
library("biocViews")
library("Biobase")
@

\section{Overview}

The purpose of \Rpackage{biocViews} is create HTML pages that
categorize packages in a Bioconductor package repository according to
terms, or \textit{views}, in a controlled vocabulary.  The fundamental
resource is the VIEWS file placed at the root of a repository.  This
file contains the complete DESCRIPTION file contents for each package
along with additional meta data describing the location of package
artifacts such as archive files for different platforms and vignettes.

The standard behavior of the view generation program is to query the
repository over the internet.  This package includes a static sample
VIEWS file so that the examples in this document can run without
internet access.


\section{Establishing a vocabulary of terms}

We use \texttt{dot} to describe the vocabulary.  For details on the
\texttt{dot} syntax, see
\url{http://www.graphviz.org/doc/info/lang.html}.

<<VocabDefinition>>=
vocabFile <- system.file("dot/biocViewsVocab.dot", package="biocViews")
cat(readLines(vocabFile)[1:20], sep="\n")
cat("...\n")
@

The dot description is transformed to a GXL document using
\texttt{dot2gxl}, a tool included in the graphviz distribution.  The
GXL is then converted to a \Rclass{graphNEL} instance using
\Rfunction{fromGXL} from the \Rpackage{graph} package.  There is a
helper script in the root of the \Rpackage{biocViews} package called
\texttt{updateVocab.sh} that automates the update process if the
required tools are available. The script will also attempt to dump the
ontology graph into a local SQLite database using tools from
\Rpackage{DBI} and \Rpackage{RSQLite}. The information in this
database can be used to create a dynamic HTML representation of the
graph by means of a PHP script.

The definition of the vocabulary lacks a notion of order.  Since
the purpose of the vocabulary is primarily for display, a valuable
improvement would be to use graph attributes to allow the ordering of
the terms.

Another missing piece is a place to put a text description of each
term.  This could also be achieved using graph attributes.

\subsection{Use Case: adding a term to the vocabulary}

To add a new term to the vocabulary:

\begin{enumerate}

\item edit the \textit{dot} file \texttt{dot/biocViewsVocab.dot} and
  add the desired term.  Note that terms cannot contain spaces and
  that the underscore character, \verb+_+, should be used instead.

\item ensure that R and dot2gxl are on your PATH.

\item cd into the biocViews working copy directory.

\item run the updateVocab.sh script.

\item reinstall the package and test that the new term is part of the
  vocabulary.  In short, you will load the data using
  \texttt{data(biocViewsVocab)} and check that the term is a node of
  the graph instance.

\item commit changes to svn.

\end{enumerate}

\subsection{Use Case: updating BioConductor website}

This is for BioConductor web administrator:

\begin{enumerate}

\item update local copy of biocViews using \texttt{svn update}.

\item find the correct instance R that is used to generate HTML pages on BioConductor website, and install the updated \texttt{biocViews}.

\item re-generate the related HTML packages by using \texttt{/home/biocadmin/bin/prepareRepos-*.sh} and \texttt{/home/biocadmin/bin/pushRepos-*.sh}.

\end{enumerate}

\section{Querying a repository}

To generate a list of \Rclass{BiocViews} objects that can be used to
generate HTML views, you will need the repository URL and a graph
representation of the vocabulary.

There are three main Bioconductor package repositories: a software
repository containing analytic packages, an annotation data
repository, and an experiment data repository.  The vocabulary of
terms has a single top-level node, all other nodes have at least one
parent.  The top-level node, \textit{BiocViews}, has three children
that correspond to the three main Bioconductor repositories:
\textit{Software}, \textit{AnnotationData}, and
\textit{ExperimentData}.  Views for each repository are created
separately using \Rfunction{getBiocSubViews}.  Below, we demonstrate
how to build the \textit{Software} set of views.

<<getViews>>=
data(biocViewsVocab)
reposPath <- system.file("doc", package="biocViews")
reposUrl <- paste("file://", reposPath, sep="")
biocViews <- getBiocSubViews(reposUrl, biocViewsVocab, topTerm="Software")

print(biocViews[1:2])

@ 

To query the currently available vocabulary terms, use function
\Rfunction{getSubTerms} on the \Rclass{graphNEL} object
\Robject{biocViewsVocab}. The second argument of this function takes a
character of the base term for which all subterms should be
returned. For a complete list use \Rfunarg{term="BiocViews"}.

<<listTerms>>=
getSubTerms(biocViewsVocab, term="Technology")
@ 

\section{Generating HTML}

By default, the set of HTML views will link to package description
pages located in the html subdirectory of the remote repository.

<<htmlViewsGen>>=
viewsDir <- file.path(tempdir(), "biocViews")
dir.create(viewsDir)
writeBiocViews(biocViews, dir=viewsDir)

dir(viewsDir)[1:2]
@ 

\end{document}