%\VignetteIndexEntry{Some old notes about the vectorization feature of the DNAString() constructor. Not for the end user.}
%\VignetteKeywords{DNA, RNA, Sequence, Biostrings, Sequence alignment} 
%\VignettePackage{Biostrings}

%
% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
\documentclass[11pt]{article}

%\usepackage[authoryear,round]{natbib}
%\usepackage{hyperref}


\textwidth=6.2in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=.1in
\evensidemargin=.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}


\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}


\bibliographystyle{plainnat} 
 
\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{Vectorizing the \Rfunction{DNAString} function (work in progress)}
\author{Herv\'e Pag\`es}
\maketitle

\tableofcontents

% ---------------------------------------------------------------------------

\section{Introduction}

This is a short tour of the vectorization feature of the
\Rfunction{DNAString} function.

Feel free to add your own comments.


% ---------------------------------------------------------------------------

\section{\Rfunction{DNAString} vs \Rfunction{XStringViews}}

The {\tt Biostrings2Classes} vignette presents a proposal for two new classes
(\Rclass{XString} and \Rclass{XStringViews}) as a replacement for the
\Rclass{BioString} class currently defined in the \Rpackage{Biostrings}~1
(\Rpackage{Biostrings} v~1.4.x) package.

It also shows how to use the \Rfunction{DNAString} function to create
a \Rclass{DNAString} object (a \Rclass{DNAString} object is just a
particular case of an \Rclass{XString} object):
<<a1>>=
d <- DNAString("TTGAAAA-CTC-N")
is(d, "XString")
@

However, this function is NOT vectorized: it always returns a single
\Rclass{DNAString} object, which can only represent a {\it single}
string.

In \Rpackage{Biostrings}~1, by contrast, the \Rfunction{DNAString}
function IS vectorized. Its vectorized form does the following:
(1) it concatenates the elements of its \Robject{src} argument into a
    single big string,
(2) it stores the offsets of all these elements in the \Robject{offsets} slot.

This behaviour is not immediately obvious to the user
until they look at the \Robject{offsets} slot.

In this vectorized form, the function always returns a \Rclass{BioString}
object, which has as many values as there are elements in the
\Robject{src} argument.
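
The following base R chunk is a minimal sketch of that
concatenate-and-record-offsets behaviour. It is not the actual
\Rpackage{Biostrings}~1 code and calls no \Rpackage{Biostrings} function;
the names \Robject{src}, \Robject{big}, \Robject{starts}, \Robject{ends}
and \Robject{offsets} are illustrative only:
<<a2>>=
## Hypothetical illustration in base R: concatenate the input strings
## and keep the start/end offset of each element in the big string.
src <- c("TTGAAAA-C", "TC-N")
big <- paste(src, collapse="")        # one big string
ends <- cumsum(nchar(src))            # end offset of each element
starts <- ends - nchar(src) + 1L      # start offset of each element
offsets <- cbind(start=starts, end=ends)
offsets
## Each original element can be recovered from the big string
substring(big, starts, ends)
@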


% ---------------------------------------------------------------------------

\section{The \Rfunction{XStringViews} generic function}

The feature described in the previous section (provided by the vectorized
form of the \Rfunction{DNAString} function in \Rpackage{Biostrings}~1)
is provided in \Rpackage{Biostrings}~2 via the
\Rfunction{XStringViews} generic function:
<<b1>>=
v <- XStringViews(c("TTGAAAA-C", "TC-N"), "DNAString")
v
@


% ---------------------------------------------------------------------------

\section{Performance}

The following example was provided by Wolfgang:
<<c1,results=hide>>=
library(hgu95av2probe)
@

<<c2>>=
system.time(z <- XStringViews(hgu95av2probe$sequence, "DNAString"))
z
@

With \Rpackage{Biostrings}~1, the call to
\Robject{DNAString(hgu95av2probe\$sequence)} takes about 20 minutes,
because the implementation of the vectorization feature is
quadratic in time (as reported by Wolfgang).
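
These notes do not say where the quadratic cost of \Rpackage{Biostrings}~1
comes from. Purely as an illustration of one common way such behaviour
arises, the base R chunk below builds the same big string and end offsets
in two ways. The first grows the string one element at a time, which copies
the whole accumulated string at every step. The second uses a single
\Rfunction{paste} call plus \Rfunction{cumsum} over the element lengths,
which is linear. The helpers \Rfunction{concatIncremental} and
\Rfunction{concatOnce} are hypothetical and are not \Rpackage{Biostrings}
code:
<<c4>>=
## Hypothetical helpers, not Biostrings code.
## Quadratic: each paste0() call copies everything accumulated so far.
concatIncremental <- function(x) {
    big <- ""
    ends <- integer(length(x))
    for (i in seq_along(x)) {
        big <- paste0(big, x[i])
        ends[i] <- nchar(big)          # end offset of element i
    }
    list(big=big, ends=ends)
}
## Linear: one concatenation, offsets computed from the element lengths.
concatOnce <- function(x) {
    list(big=paste(x, collapse=""), ends=cumsum(nchar(x)))
}
set.seed(1)
x <- replicate(2000, paste(sample(c("A", "C", "G", "T"), 25, replace=TRUE),
                           collapse=""))
identical(concatIncremental(x), concatOnce(x))  # both give the same result
system.time(concatIncremental(x))
system.time(concatOnce(x))
@

Doubling the number of elements should roughly quadruple the first timing
but only double the second. Actual timings depend on the machine.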

%<<c3>>=
%length <- 20000
%src <- sapply(1:length,
%              function(i) {
%                paste(sample(DNA_ALPHABET, 250, replace=TRUE), collapse="")
%              })
%system.time(v2 <- XStringViews(src, "DNAString"))
%v2
%@
%
%With \Rpackage{Biostrings}~1, the call to
%\Robject{DNAString(src)} takes more than a minute...


% ---------------------------------------------------------------------------

\section{Loading a FASTA file into an \Rclass{XStringViews} object}

The \Rfunction{read.XStringViews} function can be used to load a FASTA file
into an \Rclass{XStringViews} object:
<<d1>>=
file <- system.file("extdata", "someORF.fa", package="Biostrings")
orf <- read.XStringViews(file, subjectClass="DNAString")
orf
names(orf)
@
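
For readers curious about what such a loader has to do, here is a minimal
base R sketch. It is not the implementation of
\Rfunction{read.XStringViews}. It reads FASTA records, concatenates their
sequences into one string, and keeps the record names alongside start/end
offsets. It assumes well-formed input in which every record starts with a
\texttt{>} header line followed by at least one sequence line. The function
name \Rfunction{readFastaOffsets} is hypothetical:
<<d2>>=
## Hypothetical helper: base R only, assumes well-formed FASTA input.
readFastaOffsets <- function(lines) {
    lines <- lines[nzchar(lines)]            # drop empty lines
    is_header <- startsWith(lines, ">")
    record <- cumsum(is_header)              # record index of each line
    ## Join the sequence lines of each record
    seqs <- as.character(tapply(lines[!is_header], record[!is_header],
                                paste, collapse=""))
    ends <- cumsum(nchar(seqs))
    list(subject=paste(seqs, collapse=""),  # one big string
         start=ends - nchar(seqs) + 1L,
         end=ends,
         names=sub("^>", "", lines[is_header]))
}
fasta <- c(">seq1 first record", "ACGTAC", "GT",
           ">seq2 second record", "TTGA")
res <- readFastaOffsets(fasta)
res$names
substring(res$subject, res$start, res$end)
@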


% ---------------------------------------------------------------------------

\section{Switching between DNA and RNA views}

The \Rfunction{XStringViews} function can also be used
to switch between ``DNA'' and ``RNA'' views on the same string:
<<e1>>=
orf2 <- XStringViews(orf, "RNAString")
@

These conversions are very fast because no string data needs to be copied:
both objects refer to the same underlying data, as their \Robject{shared}
slots show:
<<e2>>=
subject(orf)@shared
subject(orf2)@shared
@
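
The idea behind this zero-copy switch can be sketched in base R with an
environment, which R passes by reference. Two lightweight view objects point
to the same environment holding the sequence data and differ only in the
alphabet label they carry. The sketch assumes a common storage for both
alphabets, so a \texttt{T} in the stored data is displayed as a \texttt{U}
in the RNA view. This is a hypothetical illustration of the design, not how
\Rpackage{Biostrings} implements its \Robject{shared} slot. The functions
\Rfunction{makeView}, \Rfunction{switchAlphabet} and \Rfunction{showView}
are made-up names:
<<e3>>=
## Hypothetical illustration of data sharing, not Biostrings internals.
makeView <- function(shared, alphabet, start, end)
    list(shared=shared, alphabet=alphabet, start=start, end=end)
switchAlphabet <- function(view, alphabet) {
    view$alphabet <- alphabet  # only the label changes
    view                       # 'shared' still refers to the same environment
}
## Decode on the fly: the stored data is never modified
showView <- function(view) {
    s <- substring(view$shared$data, view$start, view$end)
    if (view$alphabet == "RNA") chartr("T", "U", s) else s
}
shared <- new.env()
shared$data <- "TTGAAAA-CTC-N"
dna <- makeView(shared, "DNA", c(1L, 10L), c(9L, 13L))
rna <- switchAlphabet(dna, "RNA")
identical(dna$shared, rna$shared)  # same environment, no data copied
showView(dna)
showView(rna)
@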

\end{document}