\documentclass[10pt]{article}

%\VignetteIndexEntry{TarSeqQC: Targeted Sequencing Experiment Quality Control}
%\VignetteKeyword{targeted sequencing}
%\VignetteKeyword{quality control}
%\VignetteKeyword{exploration tool}
%\VignettePackage{TarSeqQC}

\usepackage{natbib}
\usepackage{authblk}
\usepackage{times}
\usepackage[colorlinks=false, urlcolor=black, pdfborder={0 0 0}]{hyperref}
\usepackage{amsmath}
\usepackage{enumerate}
\usepackage{graphicx}
\usepackage[nogin]{Sweave}


\textwidth=6.5in
\textheight=8.5in
\parskip=.3cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}
\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\software{R}}
\newcommand{\Bioconductor}{\software{Bioconductor}}

\newcommand{\bedfile}{\textit{Bed File}}
\newcommand{\bamfile}{\textit{BAM File}}
\newcommand{\fastafile}{\textit{FASTA File}}
\newcommand{\pileup}{\textit{pileup}}
\newcommand{\targetexperiment}{\Rclass{TargetExperiment}}
\newcommand{\targetexperimentlist}{\Rclass{TargetExperimentList}}
\newcommand{\tarseq}{\Rpackage{TarSeqQC}}


\title{\Rpackage{TarSeqQC}: Targeted Sequencing Experiment Quality Control}

\author[1]{Gabriela A Merino}
\author[1]{Crist\'{o}bal Fresno}
\author[2]{Yanina Murua}
\author[2]{Andrea S Llera}
\author[1]{Elmer A Fern\'{a}ndez}
\affil[1]{CONICET-Universidad Cat\'{o}lica de C\'{o}rdoba, Argentina}
\affil[2]{Laboratory of Molecular and Cellular Therapy, Instituto
Leloir-CONICET, Argentina}
\date{\today}
\begin{document}
\SweaveOpts{concordance=TRUE}
% \SweaveOpts{concordance=FALSE}

\maketitle 
\begin{center}
gmerino@bdmg.com.ar
\end{center}

\vspace{2cm}

\begin{abstract}
Targeted Sequencing experiments are a Next Generation Sequencing application, 
designed to explore a small group of specific genomic regions. The 
\tarseq{}$~$ package models this kind of experiments in R, and its main goal is
to allow the quality control and fast exploration of the 
experiment results. To do this, a new \R{} class, called \targetexperiment,
was implemented. This class is based on the \bedfile, that characterize the 
experiment, the alignment \bamfile{} and the reference genome \fastafile. When 
the constructor is called, coverage and read count information are computed for
the targeted sequences. After that, exploration and quality control could be 
carried out using graphical and numerical tools. Density, bar, read profile and
box plots were implemented to achieve this task. A circular histogram 
plot was also implemented in order to summarize all experiment results. 
Coverage or median count intervals can be defined and explored to further
assist quality control analysis. Library and pool preparation, sequencing 
errors, fragment length or GC content bias could be easily detected. Finally, 
an .xlsx file reporting quality control results can be built. \\
Since its 1.1.8 version, \tarseq{} implements a new class, 
\targetexperimentlist{} in order to allow the comparison and joint analysis 
of several Targeted Sequencing experiments performed using the same \bedfile. 
This class includes
several plots that can help to analyze several samples from the same patient or 
from different patients under the same pathological condition. \\
\end{abstract} 
<<General options for R, echo=false, results=hide>>=
options(prompt="> ", continue="+  ", width=90, useFancyQuotes=FALSE, digits=3)
@

\newpage

\tableofcontents

\newpage

\section{Introduction}
\par

Next Generation Sequencing (NGS) technologies produce a huge volume of 
sequence data at relatively low cost. Among the different NGS applications, 
Targeted Sequencing (TS) allows the exploration of specific genomic regions, 
called \textit{features}, of a small group of genes 
\citep{metzker2010sequencing}. An ordinary application of TS is to detect 
Single Nucleotide Polymorphisms (SNPs) involved in several pathologies. 
Nowadays, \textit{TS cancer panels} are emerging as a 
new screening methodology to explore specific regions 
of a small number of genes known to be related to cancer.
%like cancer. In this case, each feature corresponding 
%with a small group of genes related to a particular pathology. \\ 
In TS, specific regions of a DNA sample are copied and amplified by PCR.
If a target region is too large, several primers can be used to
read it. In addition, if the panel also has a large number of interest genomic
regions, different PCR pools could be required in order to achieve a good 
coverage. All DNA fragments are sequenced in an NGS machine, generating millions
of short sequence reads, though less than if the whole genome was sequenced. 
Reads are then aligned against a reference genome and, after that, a downstream
analysis could be performed. However, prior to this, it is crucial 
to evaluate the run performance, as well as the experiment quality control,
i.e., how well the features were sequenced, which feature and gene coverages 
were achieved, if some problems arise in the global setting or by specific 
PCR pools \citep{metzker2010sequencing}. \\

At present, several open access tools can be used to explore and 
control experiment results\citep{lee2012bioinformatics}. Those tools allow
visualization and some level of read profiles quantification. But, they were
%But they are
%very powerful and computational intensive tools moreover if only a small
%fraction of the genome is handled. 
developed as general purpose tools to cover a wide range of NGS applications,
mainly for whole genome exploration. Consequently, they require a great amount
of computational resources and power. On the other hand, in TS only small group
of regions, the features, are required to be explored and characterized in 
terms of coverage or counts, as well as, the evaluation and comparison of pool 
efficiency. In addition, this analysis should be also performed at a gene level
in order to provide a general results overview. In this scenario, current 
genomic tools have become heavy and coarse for such amount of data. 
Consequently, the availability of light, fast and specific tools for 
TS data handling and visualization is a must in current labs.\\
% On the other hand, in TS it is required
% to characterize and evaluate statistics such as the feature and gene
% achieved coverage. If PCR pools were used, is also necessary an evaluation
% and comparison of pool results.\\

Here we present \tarseq{} \R{} package, an exploration tool for fast 
visualization and quality control of TS experiments. 
Its use is not restricted to TS and can also be used to analyze data from
others NGS applications in which \textit{feature-gene} structure could be 
defined, like exons or isoforms in RNA-seq and amplicons in DNA-seq.\\ 

This vignette intends to guide through to the use of the \tarseq{} \R{} 
\Bioconductor{} package. First, the input data format is described. 
Then, we show how to build an instance of the \targetexperiment{} class. 
After that, we will graphically explore the results and do the quality control
over the sequenced features. Finally, we will build an .xlsx report that 
summarize the analysis above.
\newpage
\section{Preliminaries}
\subsection{Installation}
\tarseq{} is a package for the \R{} computing environment and it is assumed
that you have already installed \R{}. See the R project at 
http://www.r-project.org. To install the latest version of \tarseq{},
you will need to be using the latest version of R. \tarseq{} is part of the 
Bioconductor project at http://www.bioconductor.org. To get the \tarseq{}
package you can use:\\
<<echo=TRUE, eval=FALSE>>=
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("TarSeqQC")
@ 
Like other Bioconductor packages, there are always two available versions of 
\tarseq{}. Most users will use the current official release version, which
will be installed by `BiocManager::install` if you are using the current
version of R. There is also a developmental version of the package that
includes new features due for the next official Bioconductor release.
The developmental version will be installed if you are using the developmental
version of R. Both can be found in the Bioconductor site.\\
\subsection{Citation}
The \tarseq{} R package can be cited using:\\
<<echo=TRUE, eval=TRUE>>=
citation("TarSeqQC")
@ 
\newpage
\section{The \targetexperiment{} class}
\tarseq{} \R{} package is based on the \targetexperiment{} class. The Figure 
\ref{fig:class1} shows the \targetexperiment{} class structure.\\
\vspace{-25pt}
\begin{figure}[!hbp]
    \begin{center}
        \includegraphics[scale=0.5]{class.pdf}
    \end{center}
    \caption{\targetexperiment{} class diagram.}
    \label{fig:class1}
\end{figure}
\par 
The \targetexperiment{} class has nine slots:
\begin{itemize}
    \item \Rfunarg{bedFile}: a \Rclass{GRanges} object that models the 
        \bedfile{}
    \item \Rfunarg{bamFile}: a \Rclass{BamFile} object that is a reference to
        the \bamfile{}.
    \item \Rfunarg{fastaFile}: a \Rclass{FaFile} object that is a reference to 
        the reference sequence file.
    \item \Rfunarg{featurePanel}: a \Rclass{GRanges} object that models the 
        feature panel and related statistics.
    \item \Rfunarg{genePanel}: a \Rclass{GRanges} object that models the 
        analyzed panel and related statistics at a gene level.
    \item \Rfunarg{scanBamP}: a \Rclass{ScanBamParam} containing the 
        information to scan the \bamfile.
    \item \Rfunarg{pileupP}: a \Rclass{PileupParam} containing the information 
        to build the pileup matrix.
    \item \Rfunarg{attribute}: a \Rclass{character} indicating which attribute 
        \textit{coverage} or \textit{medianCounts} will be used to the 
        analysis.  
    \item \Rfunarg{feature}: a \Rclass{character} indicating the name of the 
        analyzed features, e.g.: ``amplicon'', ``exon'', ``transcript''.

\end{itemize}

The next sections will illustrate how the \targetexperiment{} methods can be 
used. For this, the \tarseq{} \R{} package provides a \bedfile, a \bamfile, a 
\fastafile{} and a dataset that stores the \targetexperiment{}
object built with those. This example case is based on a synthetic amplicon
sequencing experiment containing 29 \textit{amplicons} of 8 genes in 4
chromosomes. \\
%\clearpage

\subsection{Input Data}

A TS experiment is characterized by the presence of a \bedfile{} which
defines the \textit{features} that should be sequenced. The \tarseq{} package 
follows this architecture, where the \bedfile{} is the key data of the
experiment. However, \tarseq{} also requires mainly three pieces of information
that should be provided in order to call the \targetexperiment{} constructor. 
The \bedfile{}, the \bamfile, that contains the obtained alignment for
the sequenced reads, and the sequence \fastafile. The complete path to these
files should be defined when the \targetexperiment{} constructor is called. \\
Other parameters can also be specified in the \targetexperiment{} object 
constructor. The \Rfunarg{scanBamP} and \Rfunarg{pileupP} are instances of the 
\Rclass{ScanBamParam} and \Rclass{PileupParam} classes defined in the 
\Rpackage{RSamtools} \R{} \Bioconductor{} package \citep{rsamtools}. These 
parameters specify how to scan the \bamfile{} and how to build the 
corresponding \textit{pileup}, that will be used for exploration and 
quality control. The \Rfunarg{scanBamP} allows the specification of the 
features specified in the \bedfile, according to \cite{rsamtools} 
specifications. The \Rfunarg{pileupP} establishes what information
should be contained in the pileup matrix, for instance, if nucleotides and/or
strand should be distinguished. If these two parameters are not specified, 
the default values of their constructors will be used. \\
Other important parameters that should be specified before to conduct the 
\textit{Quality Control} are \Rfunarg{feature} and \Rfunarg{attribute}. The 
first is a character that determines which kind of features are contained in 
the \bedfile{}. In the example presented here, \textit{amplicon} is the feature
type. The second parameter, \Rfunarg{attribute}, can be \textit{coverage} or 
\textit{medianCounts} defining which measures will be considered in the 
\textit{Quality Control} analysis.

\subsubsection{Bed File}

The \bedfile{} is stored as a \targetexperiment{} slot and it is modeled as a 
\Rclass{GRanges} object (\cite{genomicranges}). The \bedfile{} must be a
tabular file in which each row represents one feature. This file
should contain at least ``chr'', ``start'', ``end'' , ``name'' and ``gene''
named columns. Additional columns like ``strand'' or another experimental
information could be included and would be conserved. For example, in some
experiments, more than one PCR pool is necessary. In this case, the \bedfile{}
must also contain a ``pool'' column specifying the pool in which each feature
was defined. This information is an imperative requisite to evaluate the 
performance of each PCR pool. \\
A \Rclass{GRanges} object represents a collection of genomic features each 
having a single start and end location on the genome \citep{genomicranges}. In 
order to use it to represent the \bedfile, the ``chr'', ``start'' and ``end'' 
mandatory fields will be used to define the ``seqnames'', ``start'' and ``end''
\Rclass{GRanges} slots. The same will occur if the optional field ``strand'' is
included in the \bedfile{}. The ``name'' column will be set to ranges
identifiers. Finally, ``gene'' and additional columns like ``pool'', will be
stored as metadata columns. \\
In order to create a \targetexperiment{} object, the complete route to the 
\bedfile{} and its name must be specified as a \Rclass{character} \R{} object. 
Thus, to use the example \bedfile{} provided by \tarseq: 
<<>>=
bedFile<-system.file("extdata", "mybed.bed", package="TarSeqQC", mustWork=TRUE)
@ 
Note that any experiment, in which can be defined 
\textbf{\textit{feature-gene}} relations, could be analyzed using the
\tarseq{} \R{} \Bioconductor{} package. For instance, if you have an RNA-seq
experiment and you are interested in exploring some genes, you could build your
customized \bedfile{} in which the \textit{feature} could be ``exon'' or 
``transcript''. 

\subsubsection{BAM File}
The \bamfile{} stores the ordered alignment results \citep{li2009sequence}. In
this example case, it corresponds to the amplicon sequencing experiment
alignment. This file will be used to build the \textit{pileup} matrix for the
selected features. Briefly, a \textit{pileup} is a matrix in which each row 
represents a genomic position and have at least three columns: 
``pos'', ``chr'' and ``counts''. The first and second columns specify the 
genomic position and ``counts'' contains the total read counts for this 
position. Pileup matrix could contain four additional columns that store
the read counts for each nucleotide at this position.\\ 
In order to call the \targetexperiment{} constructor, the complete route to the
\bamfile{} and its name must be specified as a \Rclass{character} \R{} object. 
For example, we can define it in order to use \targetexperiment{} external 
data:
<<>>=
bamFile<-system.file("extdata", "mybam.bam", package="TarSeqQC", mustWork=TRUE)
@ 
When the \targetexperiment{} constructor is called the \bamfile{}, will be
stored as a \Rclass{BamFile} object \citep{rsamtools} and this object will be a
\Rclass{TargetExperiment} slot.

\subsubsection{FASTA File}
The \fastafile{} contains the reference sequence previously used to
align the \bamfile{} and will be used to extract the feature sequences. This 
information is useful to compare the pileup results with the reference, in 
order to detect potential \textit{nucleotide variants}. To create a 
\targetexperiment{} object, the full path to the \fastafile{} and its name must
be specified as a character \R{} object. For example:\\
<<>>=
fastaFile<-system.file("extdata", "myfasta.fa", package="TarSeqQC", 
                        mustWork=TRUE)
@ 
The \fastafile{} will be stored as a \Rclass{FaFile} object (\cite{rsamtools})
and this object will be set to a \Rclass{TargetExperiment} slot.

\subsubsection{Additional input data}
The previous files are mandatory to call the \targetexperiment{}
constructor. Additional parameters can be set in order to apply several
methods and perform quality control and results exploration. These parameters
are:
\begin{itemize}
    \item \Rfunarg{scanBamP}: is a \Rclass{ScanBamParam} object, that specifies
    rules to scan a \Rclass{BamFile} object. For example, if you wish only keep
    those reads that were properly paired, or those that have a specific Cigar
    code, \Rfunarg{scanBamP} can be used to specify it. In TS experiments, 
    we want to analyze only the features. The way to specify this is
    using the \Rfunarg{which} parameter in the \textit{scanBamP} constructor. 
    If the \Rfunarg{scanBamP} parameter was not specified in the 
    \targetexperiment{} constructor calling, its default value will be used and
    then, the \Rfunarg{which} parameter will be specified using the \bedfile{}.
    \item \Rfunarg{pileupP}: is a \Rclass{PileupParam} object, that specifies
    rules to build the \pileup{} starting from a \Rclass{BamFile}. You can use
    the \Rfunarg{pileupP} parameter to specify if you want to distinguish 
    between nucleotides and or strands, filter low read quality or low mapping
    quality bases. If the \Rfunarg{pileupP} parameter is not specified, its
    default value will be used. 
    \item \Rfunarg{attribute}: is a \Rclass{character} that specifies which 
    attribute must be used for the results exploration and quality
    control. The user can choice between \textit{medianCounts} or 
    \textit{coverage}. If the \textit{attribute} parameter is not specified in
    the \targetexperiment{} constructor, it will be set to  \Rfunarg{""}. 
    But, prior  to performing some exploration or control, this argument must 
    be set using the \Rmethod{setAttribute()} method. 
    \item \Rfunarg{feature}: is a \Rclass{character} that defines what means a 
    \textit{feature}. In this vignette a little example using a 
    synthetic amplicon targeted sequencing experiment is shown, thus the 
    feature means an \textit{amplicon}. But, the use of \tarseq{} \R{} package
    is not restricted to analyze only this kind of experiments. If you don't
    specify the \Rfunarg{feature} parameter, it will be set to \Rfunarg{""}. 
    But, as in the \Rfunarg{attribute} parameter, it must be set prior to 
    performing some exploration or control. It can be done using the 
    \Rmethod{setFeature()} method. 
    \item \Rfunarg{BPPARAM}: is a \Rclass{BiocParallelParam} instance defining
    the parallel back-end to be used during evaluation (see 
    \citep{biocparallel}). It allows the specification of how many 
    \Rfunarg{workers} (CPUs) will be used, etc.
\end{itemize}

For more information about \Rclass{ScanBamParam} and \Rclass{PileupParam} 
constructors see \Rpackage{Rsamtools} manual.

\subsection{Creating a \targetexperiment{} object}
\subsubsection{Controlling files consistency}
\tarseq{} provides a specific method that allows the inspection of the 
\bedfile{} and \fastafile{} consistency previous to the \targetexperiment{}
constructor call. The \Rfunction{checkBedFasta} method receives the full path
to those files and as it is executed it is printing several messages into the
screen. Duplicated features location or names, incorrect start/end positions 
and inconsistency between the features and the reference genome will be 
analyzed. 
<<library call, echo=FALSE, results=hide, eval=TRUE>>=
suppressMessages(library("TarSeqQC"))
suppressMessages(library("BiocParallel"))
@ 
<<bedFasta control, echo=TRUE, eval=TRUE>>=
checkBedFasta(bedFile , fastaFile)
@ 
\subsubsection{Constructor call}
Once you have defined the input data presented above, the \targetexperiment{} 
constructor could be called using:\\
<<constructor, echo=TRUE, results=hide, eval=FALSE>>=
library("TarSeqQC")
library("BiocParallel")
BPPARAM<-bpparam()
myPanel<-TargetExperiment(bedFile, bamFile, fastaFile, feature="amplicon", 
                        attribute="coverage", BPPARAM=BPPARAM)
@ 
<<loading TargetExperiment object, echo=FALSE>>=
BPPARAM<-bpparam()
data(ampliPanel, package="TarSeqQC")
myPanel<-ampliPanel
rm(ampliPanel)
setBamFile(myPanel)<-system.file("extdata", "mybam.bam", package="TarSeqQC",
                                    mustWork=TRUE)
setFastaFile(myPanel)<-system.file("extdata", "myfasta.fa", 
                                    package="TarSeqQC", mustWork=TRUE)

@
When \Rfunction{TargetExperiment} is called, some \targetexperiment{} methods
are invoked in order to define two of the \targetexperiment{} slots. First, the
\Rfunction{buildFeaturePanel} is internally used in order to build the
\Rfunarg{featurePanel} slot. Then, the \Rfunction{summarizePanel} is invoked in
order to build the \Rfunarg{genePanel} slot, which contains the information 
summarized at a gene level.\\
In the previous example, the \Rfunarg{feature} and 
\Rfunarg{attribute} parameter values were defined. If they aren't specified in
the constructor call, the \targetexperiment{} object can be created, but a 
warning message will be printed. After that, the \Rfunction{setFeature} and 
\Rfunction{setAttribute} methods should be used to set these values. For 
example:\\
<<setters>>=
# set feature slot value
setFeature(myPanel)<-"amplicon"
# set attribute slot value
setAttribute(myPanel)<-"coverage"
@
As mentioned earlier, when the \Rfunarg{scanBamP} and \Rfunarg{pileupP} are 
not specified in the constructor call, they assume their default constructor. 
But, after the constructor call, they could be specified using 
\Rfunarg{setScanBamP} and \Rfunarg{setPileupP} methods. \\
<<setters, eval=FALSE>>=
# set scanBamP slot value
scanBamP<-ScanBamParam()
#set scanBamP which slot
bamWhich(scanBamP)<-getBedFile(myPanel)
setScanBamP(myPanel)<-scanBamP
# set pileupP slot value
setPileupP(myPanel)<-PileupParam(max_depth=1000)
# build the featurePanel again
setFeaturePanel(myPanel)<-buildFeaturePanel(myPanel, BPPARAM)
# build the genePanel again
setGenePanel(myPanel)<-summarizePanel(myPanel, BPPARAM)
@
Note that the previous code specifies that the maximum read depth can be 1000.
If you have some genomic positions that have more than 1000 reads, they will be
truncated to this number. It is also important to note that, if any change in 
the \Rfunarg{scanBamP} and/or \Rfunarg{pileupP} slots was done, the 
\Rfunarg{featurePanel} and the \Rfunarg{genePanel} slots will be set again.\\
The \tarseq{} \R{} package provides a dataset that stores the
\targetexperiment{} object built in the previous steps. To use it, run:
<<loading TargetExperiment object>>=
data(ampliPanel, package="TarSeqQC")
@
The loaded object is called \textit{ampliPanel}. As was mentioned before, some
\targetexperiment() methods need to consult the \bamfile{} and \fastafile, and
for this, the \Rfunarg{bamFile} and \Rfunarg{fastaFile} slots are used. Given 
that \textit{ampliPanel} was built by the package creators, the path file 
routes that have its slots are not the same in which these files are located 
on users' computers. Thus, after \textit{ampliPanel} loading, it is necessary to
re-define the \bamfile{} and \fastafile{} path files, running:
<<re-defininf file paths>>=
# Defining bam file and fasta file names and paths
setBamFile(ampliPanel)<-system.file("extdata", "mybam.bam", 
    package="TarSeqQC", mustWork=TRUE)
setFastaFile(ampliPanel)<-system.file("extdata", "myfasta.fa", 
    package="TarSeqQC", mustWork=TRUE)
@
Note that \Rfunarg{featurePanel} and \Rfunarg{genePanel} do not need to be 
rebuilt. The redefinition of the file names is only necessary in order to use 
\targetexperiment{} methods that query this files.\\

\subsubsection{Early exploration}
The \targetexperiment{} class has typical \Rfunction{show/print} and 
\Rfunction{summary} \R{} methods implemented. In addition, the 
\Rfunction{summaryGeneLev} 
and \Rfunction{summaryFeatureLev} methods allow the summary exploration at 
``gene'' and ``feature'' level. The next example illustrates how these methods
can be called:
<<exploration>>=
# show/print
myPanel
# summary
summary(myPanel)
#summary at feature level
summaryFeatureLev(myPanel)
#summary at gene level
summaryGeneLev(myPanel)
@
Using those methods you can easily find, for example, that amplicons were 
sequenced,in average, at a coverage of 312. It can also be observed that there 
is at least one amplicon that was not read. This is because the minimum value 
of the attribute (\textit{coverage}) is 0. To complement this analysis, the 
attribute distribution can be explored using:\\
<<boxplot and density plot, fig=FALSE, eval=FALSE>>=
g<-plotAttrExpl(myPanel,level="feature",join=TRUE, log=FALSE, color="blue")
x11(type="cairo");
g
@ 
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.3]{attrExpl.pdf}
    \end{center}
    \caption{Attribute distribution and density plots.}
    \label{attrexpl}
\end{figure}
\par
In Figure \ref{attrexpl}, the \Rfunarg{join} parameter was set to 
\Rfunarg{TRUE}. Thus, density and box plots are plotted together. If it is set
to \Rfunarg{FALSE}, the figure will contain the attribute box-plot on the left
and the corresponding attribute density plot on the right. \\
Exploration of any metadata information is also provided. \textit{GC content, 
feature length} distributions and \textit{gene} or \textit{pool} frequencies 
can be explored using the \Rfunction{plotMetaDataExpl} method. Other metadata 
columns, specified in the \textit{bedFile}, can also be analyzed. If the 
analyzed metadata is numeric, then boxplot and density plot is built. Those
can be plotted together, or not, and in log10 scale. On the other hand, if it is
categorical like \textit{gene} or \textit{pool}, a bar frequency (absolute
or in relative percentages) is plotted. The following code allows the 
exploration of feature length distributions and gene frequencies along the 
features.\\
<<length exploration, echo=TRUE, fig=FALSE,eval=FALSE>>=
# explore amplicon length distribution
plotMetaDataExpl(myPanel, "length", log=FALSE, join=FALSE, color=
                    "blueviolet")
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.4]{lengthExploration.pdf}
    \end{center}
    \caption{Amplicon length distribution' plot.}
    \label{fig:lengthplot}
\end{figure}
\par
<<gene exploration, echo=TRUE, fig=FALSE,eval=FALSE>>=
# explore gene's relative frequencies 
plotMetaDataExpl(myPanel, "gene", abs=FALSE)
@
\begin{figure}[!h]
    \begin{center}
        \includegraphics[scale=0.4]{geneExploration.pdf}
    \end{center}
    \caption{Gene's relative frequencies.}
    \label{fig:geneplot}
\end{figure}
\par
Figure \ref{fig:lengthplot} indicates that the mean amplicon length is 74.7 
nucleotides with standard deviations of 20.8. But, as can be observed in the
density plot, the mode is lower than this mean. In addition, half of the 
amplicons have their lengths lower than 71, and the rest between 71 and 125.\\
Figure \ref{fig:geneplot} shows the gene relative frequencies, 
in percentages. As can be viewed, more than the 22\% of the amplicons belong to
`gene8' and approximately 21\% belong to `gene5'. On the other hand, both 
`gene1' and `gene3' have less than 5\% of total amplicons.\\

\subsection{Deep exploration and Quality Control}
\subsubsection{Targeted selection performance}
In the context of TS experiments, it is expected that most of the sequencing 
reads come from the target regions and not from the rest of the genome. If it
does not, it may suggest an unexpected behavior of the used primers library 
that could impact on the read counts of overlapped regions skewing 
interpretation.\\
One way to check this is using the \Rfunction{readFrequencies} method which 
returns a \Rclass{data frame} containing, for each chromosome, the amount 
and the percentage of reads fall in and out features. Therefore, if this graph
shows a high percentage of reads that fall outside of interest features, a 
large part of the sequencing reads will be discarded when feature coverage is 
computed and consequently the power of detection will be lower.\\
After \Rfunction{readFrequencies} calling, the \Rfunction{plotInOutFeatures}
method can be used in order to plot the obtained \Rclass{data frame}. It can be
achieved running the next code:
<<read percentages and plot, eval=FALSE>>=
readFrequencies(myPanel)
plotInOutFeatures(readFrequencies(myPanel))
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.4]{plotInOut.pdf}
    \end{center}
    \caption{Percentages of reads falling in or out of the targeted regions.}
    \label{fig:plotInOut}
\end{figure}
\par
Figure \ref{fig:plotInOut} shows that more than 15% of sequencing reads belong
to genomic regions outside of the expected amplicons (red bars). This result 
is not expected, since it indicates that the used primers were not so 
specifics to capture only the targeted regions. Moreover, TS libraries are 
carefully designed to sequence specific genomic regions and the loss of a 
lot of sequence reads has a direct impact on the achieved coverage of
the targeted regions. For these reasons the early detection of this technical 
effect can avoid false subsequent analysis conclusions.\\
\subsubsection{Panel overview}
Working on a TS experiment, it would be useful simultaneously evaluate the
overall performance of the features. For instance, the user could be interested
in exploring the proportion or amount of features within specific attribute 
intervals. In the example data, five coverage intervals could be defined 
according to the Table \ref{table:featureInterval}. Each user can define its
attribute intervals table and the amount of them according to its needs.\\
\begin{table}[ht]
    \caption{Coverage intervals} % title name of the table
    \centering % centering table
    \begin{tabular}{| l | l | } % creating 10 columns
        \hline
        \textbf{Coverage Interval} & \textbf{Motivation} \\ 
        \hline
        $[0,1)$ & \textit{Not sequenced}\\ [0.5ex]
        \hline
        $[1,50)$ & \textit{Low sequencing coverage}\\ [0.5ex]
        \hline
        $[50,200)$ & \textit{Regular sequencing coverage}\\ [0.5ex]
        \hline
        $[200,500)$ & \textit{Very good sequencing coverage}\\ [0.5ex]
        \hline
        $[500,Inf)$ & \textit{Excellent sequencing coverage}\\ [0.5ex]
        \hline
    \end{tabular}
    \label{table:featureInterval}
\end{table}
In this example, 0 indicates \textit{no reads} and \textit{Inf} refers to
features with coverage higher than 500. To incorporate the coverage intervals
into the analysis it is necessary to specify each interval extreme value like:
<<attributeThres>>=
# definition of the interval extreme values
attributeThres<-c(0,1,50,200,500, Inf)
@
%In order to facilitate the object exploration and help the 
%\textit{Quality Control} process, additional methods were developed. 
A panel results overview is critical in order to compare and integrate those
at the feature, gene, and chromosome level. To help this task, \tarseq package
implements the \Rfunction{plot} method, a graphical tool consisting in a polar
histogram where each gene is represented as a bar. Each bar is colored 
depending on the percentage of features that have their attribute value within 
a particular prefixed interval. In addition,
the bars (genes) can be grouped in chromosomes in order to facilitate coverage
results comparison at this level. The next code builds this plot:
<<plot, fig=FALSE, eval=FALSE>>=
# plot panel overview
plot(myPanel, attributeThres=attributeThres, chrLabels =TRUE)
@
\begin{figure}[!hpt]
    \begin{center}
        \includegraphics[scale=0.7]{plot.pdf}
    \end{center}
    \caption{Panel overview plot.}
    \label{fig:plot}
\end{figure}
\par
In the example presented here, we can easily distinguish that the unique 
amplicon of the ``gene3'' was not sequenced. This is because in Figure
\ref{fig:plot}, the bar corresponding to ``gene3'' is colored in red and
this color is related to the [0,1) coverage interval. In the same plot, can
also be appreciated that this gene has only one amplicon, as depicted in 
parenthesis in the bar label  \textit{``gene3(1)''}. Meanwhile ``gene4'' has
40\% of its amplicons (2) with a coverage between 1 and 50, 20 \% (1 amplicon) 
with coverage between 50 and 200 and the rest with a very good coverage value,
greater than 200. It is notable that this plot provides a fast overview of the
coverage feature performance.\\
It is important to note that a small and simple example is presented here. The 
previous plot could have a greater impact when the panel has more features and
genes. Figure \ref{fig:plotCPP} shows the panel overview for a TS Experiment
based on the \textit{Ion AmpliSeq Cancer Panel Primer Pool} \textregistered{}. 
This is a TS Panel offered by \textit{Life Technologies} (\cite{CPP}) that
allows the inspection
of 190 amplicons. In this case, can be easily observed that \textit{MLH1} and 
\textit{CDKN2A} genes were no sequenced. Can also be appreciated that several
genes like \textit{ALK, NRAS, BRAF, HNF1A}, have uniform coverage values
along their amplicons. On the contrary, \textit{RB1}' and \textit{ATM} genes
have some amplicons with low coverage and some other with a high coverage.
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.9]{plotCPP.pdf}
    \end{center}
    \caption{Cancer Panel Primer Pool overview plot.}
    \label{fig:plotCPP}
\end{figure}
Another alternative to explore attribute (coverage) performance is given by
\Rfunction{plotFeatPerform} that shows a similar plot expanding it along the
x-axis. This method also allows the exploration of attribute values at feature
and gene levels, setting its parameter \Rfunarg{complete} as \Rfunarg{TRUE}. As
result, the generated graph will contain two plots. The upper panel is a bar 
plot at the feature level and the lower, at the gene level. Both graphics 
incorporate the prefixed attribute intervals information and contain a red line
to indicate the average value of the attribute at the corresponding level.
In our example, you could run:\\
<<plotFeatPerform, fig=FALSE, eval=FALSE>>=
# plot panel overview
g<-plotFeatPerform(myPanel, attributeThres, complete=TRUE, log=FALSE, 
    featureLabs=TRUE, sepChr=TRUE, legend=TRUE)
g
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.7]{featPerf.pdf}
    \end{center}
    \caption{Amplicon coverage performance. The upper panel is a bar plot at 
    feature level, and the lower, at gene level.}
    \label{fig:perform}
\end{figure}
Figure \ref{fig:perform} helps the coverage value evaluation for each amplicon
and gene. It is noticeable that when coverage is summarized at gene
level the highest value is lower than 500. However, at amplicon level, the 
highest value is greater than 800. \\
The previous plot is also very useful when TS panels were made-up by several
primer pools combination. For example, the
\textit{Comprehensive Cancer Panel} \textregistered{} is another 
\textit{Life Technologies} panel that allows the exploration of 16000 amplicons
from 409 genes related to
several cancer types using 4 primer pools (\cite{CCP}). In this case, the
\bedfile{} contains a `pool column that stores the pool number for each
feature. This information will be conserved in the \targetexperiment{}
object that was built from this panel.\\
In the \textit{Quality Control} context, it is important to evaluate in
early analysis stages, if some pool effect exists and if all pool results are
comparable. For example, Figure \ref{fig:perfCCP} illustrates the use
of the \Rfunction{plotFeatPerform} in the described case. Now, you can see that
the graphic corresponding to the amplicon level shows a separation between 
amplicons according to its pool value. Note that the same plot at a gene level
is not showed because the \Rfunarg{complete} parameter was set to 
\Rfunarg{FALSE}. It
is important to emphasize that, if correspond, the pool information will be
included in all methods of the \targetexperiment{} class. Thus, for example,
when you call the \Rfunction{summary} function for a \targetexperiment{} object
that has pool information, the output will contain statistic results for the
amplicon level and for each pool separately.
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.3]{perfExplCCP.pdf}
    \end{center}
    \caption{Performance exploration of an Ion AmpliSeq Comprehensive Cancer 
            Panel experiment.}
    \label{fig:perfCCP}
\end{figure}

\subsubsection{Controlling possible attribute bias}
It is known that those DNA fragments having high GC content or high length 
tends to be 'more sequenced' those with lower GC/length values. Then, GC 
content and feature's length can be considered as possible bias sources, in 
particular, when the feature is an exon or a transcript. In order to check it,
\tarseq{} incorporates \Rfunction{biasExploration} that allows the inspection 
of the selected attribute ('coverage' in the example) in terms of groups or 
intervals of bias sources. To do this, the method defines four source groups 
according to the distribution quartiles of the selected source (GC content, 
length or any included in the panel metadata) and then assigns one group to 
each feature. After that boxplot and, optionally, density plot for each group
are plotted in order to evaluate possible bias. If the selected source is a
categorical variable, like pool or gene, each category will be considered as
a group. For instance, the next lines allows the exploration of GC content 
effect:
<<gc exploration, echo=TRUE, eval=FALSE>>=
x11(type="cairo")
biasExploration(myPanel, source="gc", dens=TRUE)
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.5]{gcExploration.pdf}
    \end{center}
    \caption{Exploration of amplicon coverage in each of amplicon GC content's
    quartiles.}
    \label{fig:gcExploration}
\end{figure}

\subsubsection{Controlling low counts features}
Low counts features should be detected in early analysis stages. The 
\Rfunction{summaryIntervals} method builds a frequency table of the fetaures
that have its attribute value between predefined intervals. For example, if you
are interested in exploring the ``coverage'' intervals defined before, you could
do:\\
<<frequency table>>=
# summaryIntervals
summaryIntervals(myPanel, attributeThres)
@
The previous method is also useful when you are interested in quantifying how
many features have its attribute value (\textit{coverage}) lower or higher than
a threshold. In this example, we are interested in knowing how many amplicons 
have shown at least a coverage of 50, because we consider that this is a 
minimum value that we will admit. This is a typical aspect that will be 
analyzed when you do an experiment \textit{Quality Control}. Complementing this
method, the \Rfunction{plotAttrPerform} method allows the graphical exploration
of relative and cumulative feature frequencies in the predefined intervals. The
corresponding function call is:\\
<<coveragePerform,eval=FALSE, fig=FALSE>>=
plotAttrPerform(myPanel, attributeThres)
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.6]{coveragePerform.pdf}
    \end{center}
    \caption{Relative and cumulative amplicon frequencies in the specified
    intervals.}
    \label{fig:covPerf}
\end{figure}
In the cases in which several pools were used, both 
\Rfunction{summaryIntervals} and \Rfunction{plotAttrPerform} allows the
exploration at pool level. Thus, differences among pool can be easily 
detected.\\
Another method that could help this analysis is \Rfunction{getLowCtsFeatures}. 
This method returns a \Rclass{data frame} object containing all the features
that have its attribute value lower than a threshold. The output 
\Rclass{data frame} also contains panel and attribute information for each
feature. For example, to known which are the genes that have a
coverage value lower than 50, you can do:
<<low counts features>>=
getLowCtsFeatures(myPanel, level="gene", threshold=50)
@
In addition, if you want to known which amplicons have a coverage
value lower than 50, you should execute:
<<low counts features>>=
getLowCtsFeatures(myPanel, level="feature", threshold=50)
@
Graphical methods were also implemented. The \Rfunction{plotGeneAttrPerFeat} 
allows the attribute value exploration for all the features of a selected gene.
For instance, if you want to explore the ``gene4'', you should do:
<<geneAttrPerFeat, fig=FALSE, eval=FALSE>>=
g<-plotGeneAttrPerFeat(myPanel, geneID="gene4")
# adjust text size
g<-g+theme(title=element_text(size=16), axis.title=element_text(size=16),
    legend.text=element_text(size=14))
g
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.5]{gene4AttrFeat.pdf}
    \end{center}
    \caption{Performance attribute exploration of the \textit{gene4}.}
    \label{fig:perfgene4}
\end{figure}

The attribute value for each feature contained in the ``gene4'' can be observed
in Figure \ref{fig:perfgene4}. If the selected gene has some features
overlapped, the \Rfunarg{overlap} and \Rfunarg{level} parameters of the 
\Rfunction{plotGeneAttrPerFeat} method could be useful. When the 
\Rfunarg{overlap} parameter is TRUE the method 
determines the overlap between features and defines new features called
`Regions'. For each of those regions, their attribute value is defined as the
mean value of the attributes of the overlapped features. The \Rfunarg{level}
parameter, indicates the level of the output plot of the 
\Rfunction{PlotGeneAttrPerFeat}. The allowed values for \Rfunarg{level} are: 
\Rfunarg{"feature"}, \Rfunarg{"region"} and \Rfunarg{"both"}. The combination 
of these two parameters define the plot behavior:
\begin{itemize}
\item \Rfunarg{overlap}=\Rfunarg{FALSE}: amplicon overlap will be ignored. 
\begin{enumerate}
\item \Rfunarg{level} must be \Rfunarg{"feature"} (default values)
\end{enumerate}
\item \Rfunarg{overlap} = \Rfunarg{TRUE}: Amplicons will be grouped into 
regions according to their overlap
\begin{enumerate}
\item \Rfunarg{level} =\Rfunarg{"feature"}, feature coverage bar plot grouped
by overlapped regions as facets.
\item \Rfunarg{level} =\Rfunarg{"region"}, overlapped region's coverage bar
plot.
\item \Rfunarg{level} =\Rfunarg{"both"}, plot with two paneles: the bar plot at
feature level and at the region level. 
\end{enumerate}
\end{itemize}
For instance, in the last case, \Rfunarg{overlap}=\Rfunarg{TRUE} and
\Rfunarg{level}=\Rfunarg{"both"} the returned plot will consist of two panels.
The top panel is the same that is generated when the \Rfunarg{level} is 
\Rfunarg{"feature"} including ggplot2 facets indicating which features are 
included in each region. The bottom panel is the returned when \Rfunarg{level} 
is \Rfunarg{"region"}. \\
<<geneAttrPerFeat, fig=FALSE, eval=FALSE>>=
g<-plotGeneAttrPerFeat(myPanel, geneID="gene4", overlap = TRUE, level="both")
g
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.5]{geneAttrPerFeatOverlap.pdf}
    \end{center}
    \caption{Performance attribute exploration of the \textit{gene4} at 
    amplicon and overlapped region levels.}
    \label{fig:perfgene4Ov}
\end{figure}
\par
It is worth to note that if a quality control at overlapped regions level is
required, the user could define a new bed file in which the features will be
the regions and run the TarSeqQC over this new bed file. To assist this task,
the package provides the \Rfunction{getOverlappedRegions} method, internally
called by the \Rfunction{plotGeneAttrPerFeat} method. 
\Rfunction{getOverlappedRegions} needs two parameters, a \targetexperiment{} 
object and a \Rclass{logical} object, 
\Rfunarg{collapse} indicating if the overlap information should be collapsed or 
not at the collapsed region level. Thus, if the user wants to define a new 
\bedfile{} based on the overlapped regions, he could run the code below to 
obtain a \Rclass{data.frame} summarizing the feature panel at a region level.
Then, this object can be saved to a \bedfile{} for future use of \tarseq.\\
<<getOverlappedRegions>>=
dfRegions<-getOverlappedRegions(myPanel,collapse=TRUE)
head(dfRegions)
#change the region_id field name to consistency of the new bed file
names(dfRegions)[5]<-"name"
@
\vspace{-20pt}
<<getOverlappedRegions, eval=FALSE>>=
#save the new bed file
write.table(dfRegions[,1:5], file="myRegions.bed", sep="\t", col.names=T,
row.names=F,quote=F)
@
\subsubsection{Read counts exploration}
\textit{Quality Control} in TS experiments implies the analysis of 
coverage/median counts achieved for each feature. But, sometimes,
the exploration of the read profile that was obtained for a particular
genomic region or a feature could be interesting. For this reason, the 
\tarseq{} \R{} package provides methods to help the exploration at a nucleotide
resolution. Those methods are based on the \Rclass{data frame} returned
by the \Rfunction{pileupCounts} method. This contains the read counts 
information for each nuecleotide of the features contained in the \bedfile. 
Note that the columns in the obtained \Rclass{data frame} could change, 
depending on the \Rfunarg{pileupP} parameter definition. In our case we are 
working with its default constructor and only the \Rfunarg{maxdepth} parameter
was modified. For this reason, the resultant \Rclass{data frame} will contain
one column for each nucleotide and one column (``-'') storing deletion counts.\\
\Rfunction{pileupCounts} is a function, not a \targetexperiment{} method, thus
it could be called externally to the class. In order to call it, the 
specification of several parameters is needed:
\begin{itemize}
    \item \Rfunarg{bed}: is a \Rclass{GRanges} object that, at least, should 
    have values in the \Rfunarg{seqnames}, \Rfunarg{start} and \Rfunarg{end} 
    slots.
    \item \Rfunarg{bamFile}: is a \Rclass{character} indicating the full path 
    to the \bamfile.
    \item \Rfunarg{fastaFile}: is a \Rclass{character} indicating the full path
    to the \fastafile.
    \item \Rfunarg{scanBamP}: is a \Rclass{ScanBamParam} object, that specifies
    rules to scan the \Rclass{BamFile}. If it was not specified, its default 
    constructor values will be used and then, the \Rfunarg{which} parameter 
    will be specified using the \Rfunarg{bed} parameter.
    \item \Rfunarg{pileupP}: is a \Rclass{PileupParam} object, that specifies
    rules to build the \pileup{}, starting from the \Rclass{BamFile}. If it was
    not specified, the \Rfunarg{pileupP} parameter will be defined using the 
    constructor default values.
    \item \Rfunarg{BPPARAM}: is a \Rclass{BiocParallelParam} instance defining
    the parallel back-end to be used during evaluation (see 
    \citep{biocparallel}).
\end{itemize}
In order to work with the example data, it is necessary do:
<<pileupCounts, eval=FALSE>>=
# define function parameters
bed<-getBedFile(myPanel)
bamFile<-system.file("extdata", "mybam.bam", package="TarSeqQC", mustWork=TRUE)
fastaFile<-system.file("extdata", "myfasta.fa", package="TarSeqQC", 
                        mustWork=TRUE)
scanBamP<-getScanBamP(myPanel)
pileupP<-getPileupP(myPanel)
#call pileupCounts function
myCounts<-pileupCounts(bed=bed, bamFile=bamFile, fastaFile=fastaFile, 
                        scanBamP=scanBamP, pileupP=pileupP, BPPARAM=BPPARAM)
@
<<pileupCounts, eval=TRUE, echo=FALSE>>=
data("myCounts", package="TarSeqQC")
@
<<pileupCounts, eval=TRUE, echo=TRUE>>=
head(myCounts)
@
The obtained \Rclass{data frame} contains the \textit{pileup} information. It 
can be used to build a \textit{read profile} plot where the x-axis 
represents the genomic position and the y-axis, the obtained read counts. The 
\Rfunction{plotRegion} allows exploration of the read profile for a specific 
genomic region. The \Rfunction{getRegion} method extracts the information for 
a genomic region. For example, the code below 
returns a \Rclass{data frame} with location information of ``gene7'' amplicons:
<<getRegion>>=
#complete information for gene7
getRegion(myPanel, level="gene", ID="gene7", collapse=FALSE)
#summarized information for gene7
getRegion(myPanel, level="gene", ID="gene7", collapse=TRUE)
@
Then, the previous information can be used to specify a genomic region and plot
its read count profile using the \Rfunction{plotRegion} method. Since a large
number of little changes at nucleotide level could be coused by the sequencing
process, the method provides a way to filter out those noisy variants of the 
read profile. The \Rfunarg{minAAF} specifies the minimum alternative allele 
frequency need to call a SNP and its default value is 0.05. The other parameter
is called \Rfunarg{minRD} and specifies the minimum read depth of the
alternative allele (defult value = 10). For each position of the specified 
region, the method 
computes a threshold that the SNPs should be exceed to be plotted. To compute
this threshold the \Rfunarg{minAAF} is traduced at the read count scale by
multplication of it with the total counts of the genomic position. Then, the
threshold is defined as the maximum value between \Rfunarg{minAAF} (in read
counts scale) and \Rfunarg{minRD}. To avoid these filters both should be set to
0. For instance, the call of the \Rfunction{plotRegion} avoiding these filters
is:
<<plotRegion,fig=FALSE, eval=FALSE, results=hide>>=
g<-plotRegion(myPanel, region=c(4500,6800), seqname="chr10", SNPs=TRUE, 
    minAAF = 0, minRD = 0, xlab="", title="gene7 amplicons",size=0.5)
x11(type="cairo")
g
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.35]{gene7Region.pdf}
    \end{center}
    \caption{Read counts profile for the gene7 genomic region.}
    \label{fig:regiongene7}
\end{figure}

Another method, \Rfunction{plotFeature}, allows exploring the read profile 
of a particular feature. In this case,it is only necessary to specify the 
feature name that should be explored. This method internally calls the 
\Rfunction{plotRegion}, thus it also could use the \Rfunarg{minAAF} and
\Rfunarg{minRD} parameters. For instance, the R code below illustrates the 
way to explore the ``AMPL20'' amplicon  of the ``gene7'', considering only
those genomics variations exceed \Rfunarg{minAAF} =0.05 and \Rfunarg{minRD}=10,
which are the default values of these parameters.\\
<<plotFeature, fig=FALSE, eval=FALSE, results=hide>>=
g<-plotFeature(myPanel, featureID="AMPL20")
x11(type="cairo")
g
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.35]{featuregene7.pdf}
    \end{center}
    \caption{Read counts profile for the "20" gene7 amplicon.}
    \label{fig:featgene7}
\end{figure}
\par
As can be seen in both Figure \ref{fig:regiongene7} and Figure 
\ref{fig:featgene7}, the gray shadow corresponds to the total counts that were 
obtained at each genomic position insight the selected amplicon. The violet 
line indicates read counts matching with the reference sequence. In order 
to distinguish how many read counts could correspond to a genomic variation, it
is crucial that the \Rfunarg{pileupP} definition contains the 
\Rfunarg{distinguish nucleotide} parameter as TRUE, which is its default 
value. In the previous plots, the green, blue, red and brown lines illustrate 
the read counts that do not match with the reference and inform about a 
possible nucleotide variation. In the Figure \ref{fig:featgene7} can be 
appreciated that the 
selected amplicon shows a variation changing the reference  nucleotide for a 
``T''. \\
The proportion of read counts that match and no match against the 
reference can be displayed using the \Rfunction{plotNtdPercentage} method as:
<<plotNtd, fig=FALSE, eval=FALSE, results=hide>>=
plotNtdPercentage(myPanel, featureID="AMPL20")
@
\begin{figure}[!h]
    \begin{center}
        \includegraphics[scale=0.35]{NTDPercAMPL20.pdf}
    \end{center}
    \caption{Nucleotide percentages for each genomic position on the 
            ``AMPL20'' gene7 amplicon.}
    \label{fig:ntd}
\end{figure}
\par 
In Figure \ref{fig:ntd} can be observed that between 4910 and 
4920 positions there is a nucleotide showing differences between reference and
read sequences. This information can be also extracted using the previous 
read counts \Rclass{data frame}, \Rfunarg{myCounts} and the 
\Rfunarg{featurePanel} slot of the \targetexperiment
object. To do this only the feature name is necessary:
<<featureInfo>>=
getFeaturePanel(myPanel)["AMPL20"]
@
Using this information, you could select a subset in the \Rfunarg{myCounts} 
object containing only those nucleotide rows corresponding to the feature:
<<readCountsFeat<>>=
featureCounts<-myCounts[myCounts[, "seqnames"] =="chr10" & 
                        myCounts[,"pos"] >= 4866 & myCounts[,"pos"] <= 4928,]
@
Then, the position having the lowest value in the ``='' column can be found. 
The position with the minimum value of read counts matching against the 
reference is the same position that has the higher variation:
<<readCountsFeat<>>=
featureCounts[which.min(featureCounts[,"="]),]
@
As can be observed, in the position 4912 the reference genome indicates that 
there should be a ``G'', and the read counts indicate that in this position 
there is a ``T'', showing a possible nucleotide variation.
Remember that the presented analysis is only an exploratory tool. Thus,
the possible nucleotide variations detected in this stage should be confirmed 
or rejected in more specific downstream analysis. In the same way, a variant 
identified in variant analysis can be easily explored and confirmed using 
our tool.
\subsection{Quality Control Report}
The \tarseq{} \R{} package provides a method that generates an .xlsx report in 
which \textit{Quality Control} relevant information is contained. This file has
three sheets. In the first, a summary is presented, containing the results of 
\Rfunction{summary} and \Rfunction{summaryIntervals} methods. This sheet 
also includes a plot characterizing the experiment. Any 
graphic could be chosen, but if its name is not specified, the method calls 
the \tarseq{} \Rfunction{plot} method to build it. The second and third sheets
store the panel information at a gene and a feature level respectively. Only
the information corresponding to the selected attribute will be stored. Then,
if only the report generation is desired, the \Rfunction{buildReport} method
can be called after the object construction. In the present example, the image 
file that should be included in the report and the report creation is
specified, using the code below:
<<buildReport, eval=FALSE>>=
imageFile<-system.file("extdata", "plot.pdf", package="TarSeqQC",
                        mustWork=TRUE)
buildReport(myPanel, attributeThres, imageFile ,file="Results.xlsx")
@
\section{The \targetexperimentlist{} class}
The \targetexperimentlist{} class was built in order to allow the joint 
analysis of several \targetexperiment{} objects. Figure 
\ref{fig:class} shows the \targetexperimentlist{} class structure.\\
\begin{figure}[!hbp]
    \begin{center}
        \includegraphics[scale=0.4]{TEList.pdf}
    \end{center}
    \caption{\targetexperimentlist{} class diagram.}
    \label{fig:class}
\end{figure}
\par 
The \targetexperimentlist{} class has four slots:
\begin{itemize}
    \item \Rfunarg{bedFile}: a \Rclass{GRanges} object that models the 
        \bedfile{}
    \item \Rfunarg{panels}: a \Rclass{GRanges} object that contains information 
        related to several feature panels and related statistics.
    \item \Rfunarg{attribute}: a \Rclass{character} indicating which attribute 
        \textit{coverage} or \textit{medianCounts} will be used to the 
        analysis.  
    \item \Rfunarg{feature}: a \Rclass{character} indicating the name of the 
        analyzed features, e.g.: ``amplicon'', ``exon'', ``transcript''.
\end{itemize}
\targetexperimentlist{} has implemented several methods that are an extension of
previously described methods for \targetexperiment{} class. The next sections 
will illustrate the employment of \targetexperimentlist{} methods using a
\targetexperimentlist{} object provided with the package. This dataset contains
information of two synthetic amplicon sequencing experiments using 29 
\textit{amplicons} of 8 genes in 4 chromosomes and sequenced using two PCR 
pools.\\
\subsection{Input Data}
The \targetexperimentlist{} allows the joint analysis of several TS experiments
performed using the same \bedfile{} that can be involved in several
situations such as a disease characterization study, patient monitoring or
system calibration process.\\
As a first step, one \targetexperiment{} object for each of the TS experiments
should be built. Then, a \Rclass{list} object, called \Rfunarg{panelList}, 
should be defined where each of its elements will be a \targetexperiment{} 
object. For instance, having defined two \targetexperiment{} objects, 
\Rfunarg{te1} and \Rfunarg{te2}, the \Rclass{list} must be defined as:
<<echo=TRUE,eval=FALSE>>=
panelList<-list(panel1=te1, panel2=te2)  
@ 
After that, the \targetexperimentlist{} constructor can be called passing three 
parameters: \Rfunarg{panelList}, \Rfunarg{feature} and \Rfunarg{attribute}. The
las two are the same previously used in panels definitions. It is worth 
highlighting that in this case \Rfunarg{attribute} cannot be specified after
object definition. When the constructor is called, the resultant object only 
conserves information of the selected attribute. For instance, if the
\Rfunarg{attribute} is equal to `coverage', the `medianCounts' information 
will not be stored in the \targetexperimentlist{} object. Thus, if the user 
wants to change the selected \Rfunarg{attribute}, the \targetexperimentlist{} 
constructor must be called again.
\subsection{Creating a \targetexperimentlist{} object}
\subsubsection{Constructor call}
\targetexperimentlist{} constructor can be now called using:
<<echo=TRUE,eval=FALSE>>=
TEList<-TargetExperimentList(TEList=ampliList, feature="amplicon", 
    attribute="coverage")
@
This object is provided with the \tarseq{} package and can be loaded doing:
<<echo=TRUE,eval=TRUE>>=
data(TEList, package = "TarSeqQC")
@
\subsubsection{Early exploration}
As \targetexperiment{}, the \targetexperimentlist{} class has also implemented
typical \R{} methods such as \Rfunction{show}, \Rfunction{print}, and 
\Rfunction{summary}. They can be used as:\\
<<echo=TRUE,eval=TRUE>>=
# show/print
TEList
# summary
summary(TEList)
@
The \Rfunction{summary} method returns a list where each element is a statistic
summary table computed in each feature panel. In particular, if the TS involves
several PCR primer pools, then the method will display statistics summaries 
for each panel/sample and for each pool as can be seen in the example presented
here. This basic exploration allows the identification of the panel 2 as a 
poor performance panel in wich the mean coverage was just 74 whereas, in panel
1, amplicons achieved a mean coverage of 316. It is also observed that 
amplicons in pool 1 achieved higher mean coverage in contrast of amplicons
in pool 2 in both panels.\\
A simple way to observe the described situation is provided by the previously 
used \Rfunction{plotAttrExpl} method but now in the \targetexperimentlist{} 
object. Now, the use of the method is:\\
<<eval=FALSE>>=
x11(type="cairo")
plotAttrExpl(TEList, dens=TRUE, join=FALSE, log=FALSE, pool=TRUE)
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.6]{attrPerfPool.pdf}
    \end{center}
    \caption{Attribute distribution and density plots for the two studied panels
    and separated by PCR pools.}
    \label{fig:attrExplList}
\end{figure}
The previous method provides several options allowing different configurations
of the displayed plot. The parameter \Rfunarg{dens} is a logical
that indicates if density plot should be included and, if it is set to 
\Rfunarg{TRUE}, then the \Rfunarg{join} flag indicates if density and box-plots
should be plot together or not. The \Rfunarg{pool} parameter is also a logical
and if it indicates PCR pool presence (\Rfunarg{pool} = \Rfunarg{TRUE}), then
the plot are built for each pool separately using the \Rpackage{ggplot2} 
facets facility. The last case is the showed in Figure \ref{fig:attrExplList}.\\
In order to allow the same exploration but at a pool level, 
\targetexperimentlist{} class implements the \Rfunction{plotPoolPerformance}
method. This has the same parameters specifications that the previous method 
excepting, obviously, the \Rfunarg{pool} flag.\\
<<eval=FALSE>>=
x11(type="cairo");
plotPoolPerformance(TEList, dens=TRUE, join=FALSE, log=FALSE)
@
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.6]{poolPerformDens.pdf}
    \end{center}
    \caption{Attribute distribution and density plots for the two PCR pools.}
    \label{fig:poolPerf}
\end{figure}
\subsection{Comparative analysis and Quality Control}
As in the case of the \targetexperiment{} class, attribute intervals such the 
listed in Table \ref{table:featureInterval} can be included in the panel 
comparative analysis. For instance, the two previous \targetexperimentlist{} 
methods could accept a parameter \Rfunarg{attributeThres} specifying the
attribute intervals. If this parameter is specified, then the plots will be 
colored according to the interval in which fall the corresponding median 
value. The two previous figures were built without using this parameter and
consequently, the plots shows one color for each sample. If in the corresponding
methods calling \Rfunarg{attributeThres} is added, then the resultant plots are
showed in Figure  \ref{fig:attrPerfThres} and Figure \ref{fig:poolPerfThres}.\\
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.6]{attrPerfPoolThres.pdf}
    \end{center}
    \caption{Attribute distribution and density plots for the two PCR pools.}
    \label{fig:attrPerfThres}
\end{figure}
\begin{figure}[!htp]
    \begin{center}
        \includegraphics[scale=0.5]{poolPerformThres.pdf}
    \end{center}
    \caption{Attribute distribution and density plots for the two PCR pools.}
    \label{fig:poolPerfThres}
\end{figure}
\par
The \Rmethod{plot} method was also implemented for \targetexperimentlist{}
class. In this case, the resultant \Rpackage{ggplot2} graph is a heatmap where
rows represent features, columns represent samples and each cell is colored
according to the attribute value for the feature in the corresponding sample.
The method also allows the incorporation of pool information using 
\Rpackage{ggplot2} facets facilities. In the example case the follow lines 
generates this plot:\\
<<eval=FALSE>>=
plot(TEList, attributeThres = attributeThres, sampleLabs = TRUE,
    featureLabs = TRUE)
@
\begin{figure}[!htp]
    \begin{center}
    \includegraphics[scale=0.7]{plotTEList.pdf}
    \end{center}
    \caption{Panels overview: amplicons in rows and samples in columns. Colors
        according to prefixed coverage intervals}
    \label{fig:plotTEList}
\end{figure}
The displayed heatmap confirms the behavior previously observed in 
\Rfunction{summary} results and in Figure \ref{fig:attrExplList} where panel 2
shows a boxplot with lower median compared with panel 1. In panel 2, more than
50 \% of the amplicons have a coverage 
lower than 50 related to \textit{low sequencing coverage} interval. Thus, if
downstream analysis involves nucleotide variant identification, this panel
should be discarded or re-sequenced in order to achieve higher coverage values
that provide more credibility in variant identification. \\
When \Rfunarg{featureLabs} is set to `TRUE', the \Rfunction{plot} method allows
the identification of features with low attribute value by simple inspection
of x-axis ticks. If user wants to know not only who are the features with low 
attribute value but also what attribute value achieved each of these, thus
\Rfunction{getLowCtsFeatures} can be used again as:\\
<<split=FALSE>>=
getLowCtsFeatures(TEList, level="feature", threshold=50)
@
The returned object is a \Rclass{data.frame} containing those features or genes
(depending on the value of the \Rfunarg{level} parameter) that have attribute 
values lower than a threshold in at least one of the analyzed samples. For 
instance, the first amplicon appearing is `AMPL4' having 0 coverage in both 
panels and the second amplicon is `AMPL1' having coverage lower than 50 only
in the second panel (0). \\
A more detailed view of the achieved attribute values per features is given for
the \Rfunction{plotGlobalAttrExpl} \targetexperimentlist{} method.
<<eval=FALSE>>=
x11(type="cairo")
plotGlobalAttrExpl(TEList, attributeThres=attributeThres, dens=FALSE,
    pool=TRUE, featureLabs=FALSE)
@
\begin{figure}[!htp]
    \begin{center}
    \includegraphics[scale=0.7]{GlobalAttrPool.pdf}
    \end{center}
    \caption{Amplicon performance. Each box-plot represents an amplicon, the
    values are the coverage achieved in the samples and its color is according
    to prefixed coverage intervals. Each facet represents a different PCR pool}
    \label{fig:global}
\end{figure}
\par
Figure \ref{fig:global} shows the performance for each feature, in terms of
achieved attribute, and allows the separation of them according to the PCR 
pool. Thus, the identification of poor performance pools is also allowed. In 
this case, both pool performances are similar and `AMPL4' and `AMPL7' show the
worst performance.\\
\section{Troubleshoot}
\Rfunction{TargetExperiment} constructor in older TarSeqQC versions could 
result in an error message if 
the analyzed BAM file contains at least one unmapped read. For this, the
BAM file section specify that the sorted alignment results should be used. 
Thus, unmapped reads need to be filtered before to theTarSeqQC use. To do this,
the user could use the \Rfunarg{scanBamP}
parameter in the \Rfunction{TargetExperiment} constructor. The 
\Rfunarg{scanBamP} is a \Rclass{ScanBamParam} object that has several
attributes. One of these is the \Rfunarg{flag} which is a \Rclass{scanBamFlag}
that specifies the typical samtools flags used to scan a BAM file. Among the
different options offered by the \Rfunction{scanBamFlag} constructor, the user
can find the \Rfunarg{isUnmappedQuery}. This is particularly useful to specify
if the unmapped reads should be considered when the BAM file is scanned. Thus, 
if the user's BAM file could have any unmapped reads, we suggest the use of this
parameter to ensure that those reads will not be considered when the
\Rfunction{TargetExperiment} constructor is called. For instance, if it is 
suspected that the example BAM file contains some unmapped reads, the \R{} code
to successfully call the \Rfunction{TargetExperiment} constructor :
<<filtering unmapped reads, eval=FALSE>>=
bedFile<-system.file("extdata", "mybed.bed", package="TarSeqQC", mustWork=TRUE)
fastaFile<-system.file("extdata", "myfasta.fa", package="TarSeqQC",
    mustWork=TRUE)
bamFile<-system.file("extdata", "mybam.bam", package="TarSeqQC", mustWork=TRUE)
flag<-scanBamFlag(isUnmappedQuery = FALSE)
scanBamP<-ScanBamParam(flag=flag)
myPanel<-TargetExperiment(bedFile, bamFile, fastaFile, feature="amplicon", 
                        attribute="coverage", scanBamP=scanBamP,BPPARAM=BPPARAM)

@
Remember that all \targetexperiment{} methods that need 
read count information at a nucleotide level work over the \bedfile, \bamfile{}
and the \fastafile. For this reason, in order to use some of them, please make 
sure that the corresponding \targetexperiment{} slots have the file names well 
defined. For example, if the \tarseq{} example data is loading as follows:
<<loading TarSeqQC example data>>=
data(ampliPanel, package="TarSeqQC")
ampliPanel
@
and you want to explore a feature using the \Rfunction{plotFeature} method, the
execution will cause an error because the method cannot find the files. \\
\begin{Schunk}
\begin{Soutput}
plotFeature(ampliPanel, featureID="AMPL1")
[1] "The index of your BAM file doesn't exist"
[1] "Building BAM file index"
open: No such file or directory
Error in FUN(X[[i]], ...) : failed to open SAM/BAM file
file: './mybam.bam'
\end{Soutput}
\end{Schunk}
To solve the previous error, the next code should be executed before:
<<using TarSeqQC example data, eval=FALSE>>=

setBamFile(ampliPanel)<-system.file("extdata", "mybam.bam", package="TarSeqQC",
                                    mustWork=TRUE)
setFastaFile(ampliPanel)<-system.file("extdata", "myfasta.fa", 
                                    package="TarSeqQC", mustWork=TRUE)
@
and then:
<<buildFeaturePanel using TarSeqQC example data, eval=FALSE>>=
plotFeature(ampliPanel, featureID="AMPL1")
@
\clearpage
\section*{Session Info}
<<Session Info, echo=true>>=
sessionInfo()
@
\clearpage
\bibliographystyle{apalike} 
\bibliography{TarSeqQCvignette}
\end{document}