\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{color}
\usepackage{graphicx}
\usepackage{geometry}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage[absolute,verbose,overlay]{textpos}
\usepackage{amssymb}
\usepackage{tikz} 
\usepackage{colortbl}
\usepackage[figurewithin=section]{caption}
\usepackage{amsmath}
\usepackage{listings}
\usepackage{booktabs}
\usepackage{xr}
\usepackage{pdflscape}
%\VignetteIndexEntry{Using BBCAnalyzer}
\begin{document}
\SweaveOpts{concordance=TRUE}
\begin{center}
\large BBCAnalyzer

an R/Bioconductor package for visualizing base counts
\bigskip

\normalsize
Sarah Sandmann

June 10th, 2015
\end{center}
\bigskip

\tableofcontents

\section{Introduction}
The analysis of alignment data is an essential step in the process of analyzing sequencing data. Mutations like single nucleotide base changes, deletions and insertions may be identified in this step.

Usually, the available programs -- like GATK \cite{gatk} or SAMtools \cite{samtools} -- take a bam file, containing the alignment data in a compressed form, as input and return a vcf file containing the called variants and a selection of parameters characterizing them. The overall depth as well as the allele frequencies are returned as two of these parameters.

However, the programs do usually not report the unfiltered number of reads covering a certain mutated base, but they apply a set of partially complex filtration steps. Thresholds -- internally or user-defined -- are usually used to reject additional variants with a minor ratio. Furthermore, to our knowledge it is not possible to perform a comparable analysis of positions where no variant is called.

The integrative genomics viewer (IGV) \cite{igv} provides a possibility for investigating the different number of bases, deletions and insertions at any position in the genome. Yet, it is time consuming to load different samples into the program and to look at the different positions of interest. Moreover, there is no way of automatically summing up and visualizing the base counts in IGV. 

With regards to the comparison of different sequencing techniques, it appears useful to have a tool that is able to visualize the background at a selection of locations where e.g. one technique calls a variant but another technique does not. Furthermore, it seems helpful to have a possibility for quickly analyzing positions of expected mutations, which have not been called.

This package provides a possibility for visualizing the number of counted bases, deletions and insertions at any given position in any genome in comparison to the reference bases. Additionally,
\medskip

\begin{itemize}
\item markers for the relative base frequencies, 
\item the mean qualities of the detected bases,
\item known mutations or polymorphisms (e.g. based on dbSNP \cite{dbsnp}) and
\item called variants in the data
\end{itemize}
\medskip

may equally be included into the plots.

\subsection{Loading the package}
The package can be downloaded and installed with
\medskip

\texttt{$>$ biocLite("BBCAnalyzer")}.


After installation, the package can be loaded into R by typing
\medskip

\texttt{$>$ library(BBCAnalyzer)}
\medskip

into the R console.

\emph{BBCAnalyzer} requires the R-packages \emph{VariantAnnotation}, \emph{Rsamtools} and \emph{BSgenome.Hsapiens.UCSC.hg19}. All of them are loaded automatically when using \emph{BBCAnalyzer}.

\section{Analyzing the bases and visualizing the results}
\emph{BBCAnalyzer} performs an analysis of the bases, deletions and inserts at defined positions in sequence alignment data and visualizes the results. The process consists of different steps. These steps are highly dependent on each others output, which is why it is necessary to run the function \texttt{analyzeBases}, performing the whole process of analyzing the data and visualizing the results, at first. If any changes concerning the plotting of the results are desired, the function \texttt{analyzeBasesPlotOnly} may be used, taking amongst others the return value of \texttt{analyzeBases} as an obligatory input.

\subsection{Preparation}
For the correct functioning of \emph{BBCAnalyzer} in the first place, various input files are necessary:
\medskip

\begin{itemize}
\item The names of the samples to be analyzed have to be provided by a file. There has to be one sample name per line without the ``.bam''-suffix.
\end{itemize}
\medskip

\noindent Example file \emph{SampleNames.txt}:
\begin{lstlisting}
Example_IonTorrent
Example_454
\end{lstlisting}
\medskip

\begin{itemize}
\item The bam- and the corresponding bai files of the samples to be analyzed have to be provided in a folder. The names of the files have to match the names provided by the sample names file.
\end{itemize}
\medskip

\noindent Example files (\emph{Example\_IonTorrent.bam}, \emph{Example\_IonTorrent.bai}, \emph{Example\_454.bam} and \emph{Example\_454.bai}).
\medskip

\begin{itemize}
\item The target regions have to be provided by a file. The file may either contain regions (chromosome, tab, first base of a region, tab, last base of a region) or positions (chromosome, tab, position). A mixture of both is not supported. Yet, a region may cover only one base, i.e. the first and last base of a region may be identical.
\end{itemize}
\medskip

\noindent Example file \emph{targetRegions.txt}:
\begin{lstlisting}
4   106157698
20  31024054
21	36206893
2	25469913
21	44514970
7	148543694
7	148543695
\end{lstlisting}
\bigskip

\noindent Additionally to the necessary input, optional input may be defined:

\begin{itemize}
\item If vcf files shall be considered, the corresponding files of the samples to be analyzed have to be provided in a folder. There has to be one file per sample. The names of the files have to match the names provided by the sample names file.
\end{itemize}
\medskip

\noindent Selected parts of the file \emph{Example\_IonTorrent.vcf}:
\begin{lstlisting}
2	25469913	rs2276599	C	T	376.77	DP=38;	
GT:AD:DP:GQ:PL	0/1:18,20:38:99:405,0,340
4	106157698	.	T	C	1519.77	DP=216;
GT:AD:DP:GQ:PL	0/1:125,91:216:99:1548,0,2554
21	44514969	.	A	ATG	1961.73	DP=190;
GT:AD:DP:GQ:PL	0/1:84,92:176:99:1999,0,1942
\end{lstlisting}
\medskip 

\noindent Selected parts of the file \emph{Example\_454.vcf}:
\begin{lstlisting}
2	25469913	rs2276599	C	T	964.77	DP=60;
GT:AD:DP:GQ:PL	0/1:24,35:59:99:993,0,590
4	106157698	.	T	C	3066.77	DP=425;
GT:AD:DP:GQ:PL	0/1:251,143:394:99:3095,0,7734
7	148543693	.	TAA	T,TA	2517.73	DP=127;
GT:AD:DP:GQ:PL	1/2:4,94,28:126:99:2555,374,232,1047,0,788
20	31024054	.	A	G	244.77	DP=96;
GT:AD:DP:GQ:PL	0/1:74,22:96:99:273,0,1940
21	36206893	.	G	A	162.77	DP=249;
GT:AD:DP:GQ:PL	0/1:214,34:248:99:191,0,5561
\end{lstlisting}
\medskip

\begin{itemize}
\item A tabix file containing e.g. known polymorphisms or mutations (dbSNP \cite{dbsnp}) may be provided.
\end{itemize}

\subsection{Analyzing the bases}
The analysis of the bases, which actually only serves as a preparation for the final plots, is performed in different steps:
\bigskip

1) Determine target

\begin{itemize}
\item If the target regions file contains regions to be analyzed, the different positions covered by the regions are determined.
\item If the file already contains single positions, the program directly proceeds with the next step.
\item It is not necessary that the regions or positions are ordered.
\item If a known insert is supposed to be analyzed, the position of the base succeeding the insert has to be given.
\end{itemize}
\medskip

2) Analyze reads

\begin{itemize}
\item The reads at every targeted position get analyzed. By the help of the CIGAR string the bases, deletions and inserts are determined. The output is saved as $[$Sample$]$.bases.txt.
\item For every base -- also the inserted ones -- the base quality is determined (``-1'' in case of a deletion). The output is saved as $[$Sample$]$.quality.txt.
\item Reads with a mapping quality below a user defined mapping quality threshold get excluded from the analysis. Instead of a base, ``MQ'' is noted in the base file. Instead of a quality value, ``-2'' is noted in the quality file.
\item The function copes with uncovered positions (``NotCovered'' in the base- and the quality file) and insertions $>$1bp (repeated analysis of the position).
\end{itemize}
\medskip

3) Analyze frequency

\begin{itemize}
\item The number of detected bases, deletions and inserts at every position is summed up. Additionally, the mean quality of the detected bases -- including the inserts and for the inserted bases only -- is calculated.
\item Bases with a quality below a user defined base quality threshold are excluded and counted separately. 
\item The output is saved as $[$Sample$]$.frequency.txt. 
\item If the analysis shall consider vcf files as well, the alternate alleles and the genotypes -- as far as they are available for the positions analyzed -- are written out as well. Furthermore, for every called variant it is noted whether it is an insert.
\item The function copes with up to two different alternate alleles per position. 
\end{itemize}
\medskip

4) Report variants

\begin{itemize}
\item The ratios of the detected bases, deletions and insertions (additionally) at every position are determined. According to the determined ratios, up to six different calls get reported. If the user defines a frequency threshold, minor variants with ratios below this threshold do not get reported. The output is saved as $[$Sample$]$.calling.txt.
\item The function copes with insertions $>$1bp even if the position at which an insert is detected is not covered by all samples being analyzed.
\item If the analysis shall consider vcf files as well, the call -- taking the reference allele, the alternate allele(s) and the genotype into account -- is written out as well. For every called variant it is noted whether it is an insert and whether the genotype is heterozygous.
\end{itemize}
\medskip

A list of lists is returned containing all the information that is generated in this analysis step. The list with the bases ($[$Sample$]$.bases.txt) is available as the first element of the list. The corresponding qualities ($[$Sample$]$.qualities.txt) are available as the second element of the list. The summed up number of bases ($[$Sample$]$.frequency.txt) are available as the third element of the list. The calculated relative frequencies and the potential calls ($[$Sample$]$.calling.txt) are available as the fourth element.
\bigskip

\noindent First lines of the output file \emph{Example\_IonTorrent.bases.txt} (/first list of the first list):
\begin{lstlisting}
4;106157698	20;31024054	21;36206893	2;25469913	
21;44514970	21;44514970.1	7;148543694	7;148543695
C	A	G	T	T	T	A	A
T	A	G	C	T	T	A	A
T	A	G	C	T	T	A	A
T	A	A	C	T	T	A	A
T	A	G	C	I, T	I, G	A	A
T	A	G	C	T	T	A	A
T	A	G	T	T	T	A	A
C	A	G	T	T	T	A	A
T	A	G	T	T	T	A	A
T	A	G	C	I, T	I, G	A	A
\end{lstlisting}

\noindent First lines of the output file \emph{Example\_IonTorrent.quality.txt} (/first list of the second list):
\begin{lstlisting}
4;106157698	20;31024054	21;36206893	2;25469913	
21;44514970	21;44514970.1	7;148543694	7;148543695
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
66	66	66	66	66	66	66	66
\end{lstlisting}

\noindent First lines of the output file \emph{Example\_454.bases.txt} (/second list of the first list):
\begin{lstlisting}
4;106157698	20;31024054	21;36206893	2;25469913	
21;44514970	7;148543694	7;148543695
T	A	G	T	NotCovered	D	D
T	G	G	T	NA	D	A
C	G	G	C	NA	D	A
T	G	G	C	NA	D	D
C	A	G	C	NA	D	D
T	G	G	C	NA	D	D
C	A	G	C	NA	D	A
T	G	G	C	NA	D	D
T	A	G	C	NA	D	A
T	A	G	C	NA	D	A
\end{lstlisting}

\noindent First lines of the output file \emph{Example\_454.quality.txt} (/second list of the second list):
\begin{lstlisting}
4;106157698	20;31024054	21;36206893	2;25469913	
21;44514970	7;148543694	7;148543695
66	59	66	66	NotCovered	-1	-1
66	58	66	66	NA	-1	66
66	58	66	66	NA	-1	66
66	58	66	66	NA	-1	-1
66	56	66	66	NA	-1	-1
66	58	66	66	NA	-1	-1
66	58	66	66	NA	-1	66
66	54	66	66	NA	-1	-1
66	59	66	66	NA	-1	66
66	59	66	66	NA	-1	66
\end{lstlisting}

\begin{landscape}
\noindent Output file \emph{Example\_IonTorrent.frequency.txt} (mean qualities reduced to three decimal places) (/first list of the third list):
\begin{lstlisting}
chr	pos	ref	a	a_qual	c	c_qual	g	g_qual	t	t_qual	Del	Ins	
Ins_a	Ins_a_qual	Ins_c	Ins_c_qual	Ins_g	Ins_g_qual	Ins_t	Ins_t_qual	excluded
chr4	106157698	T	NA	NA	128	61	NA	NA	164	61.25	2	NA	
NA	NA	NA	NA	NA	NA	NA	NA	50
chr20	31024054	A	760	60.496	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	25
chr21	36206893	G	50	64.24	NA	NA	282	62.982NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	32
chr2	25469913	C	NA	NA	274	64.818	NA	NA	83	64.301	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	18
chr21	44514970	T	NA	NA	NA	NA	NA	NA	268	62.149	1	122	
NA	NA	NA	NA	NA	NA	122	62.377	48
chr21	44514970	T	NA	NA	NA	NA	121	61.669	61.959	1	121	
NA	NA	NA	NA	121	61.669	NA	NA	49
chr7	148543694	A	303	64.419	1	58	NA	NA	3	55.333	26	1	
NA	NA	1	58	NA	NA	NA	NA	55
chr7	148543695	A	331	63.912	NA	NA	NA	NA	NA	NA	7	NA	
NA	NA	NA	NA	NA	NA	NA	NA	49
\end{lstlisting}
\end{landscape}

\begin{landscape}
\noindent Output file \emph{Example\_IonTorrent.frequency.txt} (mean qualities reduced to three decimal places) (/second list of the third list):
\begin{lstlisting}
chr	pos	ref	a	a_qual	c	c_qual	g	g_qual	t	t_qual	Del	Ins	
Ins_a	Ins_a_qual	Ins_c	Ins_c_qual	Ins_g	Ins_g_qual	Ins_t	Ins_t_qual	excluded
chr4	106157698	T	NA	NA	383	60.687	NA	NA	580	61.445	NA	18	
NA	NA	18	56.278	NA	NA	NA	NA	70
chr20	31024054	A	181	63.105	NA	NA	70	63.2	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA
chr21	36206893	G	121	62.785	NA	NA	646	58.661	NA	NA	22	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA
chr2	25469913	C	NA	NA	109	64.450	NA	NA	115	64.687	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	1
chr21	44514970	T	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA
chr7	148543694	A	16	58.4375	NA	NA	NA	NA	NA	NA	439	1	
1	53	NA	NA	NA	NA	NA	NA	NA
chr7	148543695	A	123	58.561	NA	NA	NA	NA	NA	NA	332	NA	
NA	NA	NA	NA	NA	NA	NA	NA	NA
\end{lstlisting}
\end{landscape}
\newpage

\begin{landscape}
\noindent Output file \emph{Example\_IonTorrent.calling.txt} (mean qualities reduced to three decimal places) (/first list of the fourth list):
\begin{lstlisting}
chr	pos	ref	a_freq	c_freq	g_freq	t_freq	del_freq	ins_freq	call1	call2	
call3	call4	call5	call6
chr4	106157698	T	NA	0.435	NA	0.558	0.007	NA	T	C	
NA	NA	NA	NA
chr20	31024054	A	1	NA	NA	NA	NA	NA	A	NA	
NA	NA	NA	NA
chr21	36206893	G	0.151	NA	0.850	NA	NA	NA	G	A	
NA	NA	NA	NA
chr2	25469913	C	NA	0.768	NA	0.232	NA	NA	C	T	
NA	NA	NA	NA
chr21	44514970	T	NA	NA	NA	0.996	0.004	0.454	T	Ins	
NA	NA	NA	NA
chr21	44514970	T	NA	NA	0.451	0.545	0.004	0.451	T	G	
Ins	NA	NA	NA
chr7	148543694	A	0.910	0.003	NA	0.009	0.078	0.003	A	Del	
NA	NA	NA	NA
chr7	148543695	A	0.979	NA	NA	NA	0.021	NA	A	Del	
NA	NA	NA	NA
\end{lstlisting}
\end{landscape}
\newpage

\begin{landscape}
\noindent Output file \emph{Example\_454.calling.txt} (mean qualities reduced to three decimal places) (/second list of the fourth list):
\begin{lstlisting}
chr	pos	ref	a_freq	c_freq	g_freq	t_freq	del_freq	ins_freq	call1	call2	
call3	call4	call5	call6
chr4	106157698	T	NA	0.398	NA	0.6023	NA	0.019	T	C	
Ins	NA	NA	NA
chr20	31024054	A	0.721	NA	0.279	NA	NA	NA	A	G	
NA	NA	NA	NA
chr21	36206893	G	0.153	NA	0.819	NA	0.028	NA	G	A	
Del	NA	NA	NA
chr2	25469913	C	NA	0.487	NA	0.513	NA	NA	T	C	
NA	NA	NA	NA
chr21	44514970	T	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA
chr21	44514970	T	NA	NA	NA	NA	NA	NA	NA	NA	
NA	NA	NA	NA
chr7	148543694	A	0.035	NA	NA	NA	0.965	0.002	Del	A	
NA	NA	NA	NA
chr7	148543695	A	0.270	NA	NA	NA	0.730	NA	Del	A	
NA	NA	NA	NA
\end{lstlisting}
\end{landscape}

\subsection{Visualizing the results}
The function -- which may be called separately as well using the function \texttt{analyzeBasesPlotOnly} -- provides different possibilities for visualizing the results:

\begin{itemize}
\item The absolute or the relative number of detected bases, deletions and inserts may be plotted
\item One plot per sample or one plot per targeted position may be output. The files are either saved as $[$Sample$]$.png or chr$[$number$]$;$[$position$]$.png.
\end{itemize}
\medskip

\noindent Apart from these differences, all plots share some general characteristics:

\begin{itemize}
\item The function always produces barplots.
\item The bars are colored according to the base at the corresponding position (adenine: green; cytosine: blue; guanine: yellow; thymine: red; deletion: black; insertion: lilac edge).
\item The color may additionally contain information on the mean quality of the base if bounds for the lower- and upper mean quality are set. In this case, the color of the bars is darker if the mean quality of the base is closer compared to the upper quality bound. The color of the bar is lighter if the mean quality of the base is closer compared to the lower quality bound.
\item The reference bases are plotted on the negative y-axis below each position.
\item A tabix file containing known variants or mutations may be provided. In this case, more than one reference base is plotted at the corresponding position.
\item Every targeted position is labelled according to the chromosome and the position.
\item For each position to be analyzed, lines may be drawn representing relative frequencies (defined by user).
\item The function copes with different inserted bases at one position (stacked bars) and inserts$>$1bp, even if these are not covered by all samples.
\item If the analysis shall consider vcf files as well, the expected number of detected bases, deletions and inserts -- according to the vcf file -- is added to the plot using dashed lines.
\end{itemize}
\bigskip

Exemplary call:

\begin{lstlisting}
##Example 1:
#No vcf files, use known polymorphisms (dbSNP), mark 1% 
threshold, plot relative frequencies, plot per sample.
\end{lstlisting}

<<Example1>>=
library("BSgenome.Hsapiens.UCSC.hg19")
ref_genome<-BSgenome.Hsapiens.UCSC.hg19
library("BBCAnalyzer")

output<-analyzeBases(sample_names=system.file("extdata","SampleNames.txt",package="BBCAnalyzer"),
            bam_input=system.file("extdata",package="BBCAnalyzer"),
            target_regions=system.file("extdata","targetRegions.txt",package="BBCAnalyzer"),
            vcf_input="",
            output=system.file("extdata",package="BBCAnalyzer"),
            output_pictures=system.file("extdata",package="BBCAnalyzer"),
            known_file=system.file("extdata","dbsnp_138.part.vcf.gz",package="BBCAnalyzer"),
            genome=ref_genome,
            MQ_threshold=60,
            BQ_threshold=50,
            frequency_threshold=0.01,
            qual_lower_bound=58,
            qual_upper_bound=63,
            marks=c(0.01),
            relative=TRUE,
            per_sample=TRUE)
@
\bigskip

Exemplary output:
\begin{figure}[ht!]
\includegraphics[width=15cm]{./Example_IonTorrent.png}
\caption{Output file Example\_IonTorrent.png: Validated SNV at chr4;106157698 (only little number of deleted bases), reference base at chr20;31024054, possible SNV at chr21;36206893, known SNP at chr2;25469913; inserted TG at chr21;44514970, possible deletion at chr7;148543694, possible deletion at chr7;148543695; marked 1\% threshold.}
\end{figure}

\begin{figure}[ht!]
\includegraphics[width=15cm]{./Example_454.png}
\caption{Output file Example\_454.png: Validated SNV at chr4;106157698 (only a low frequency of inserted A's), likely SNV at chr20;31024054 (high quality of alternate allele G), possible SNV at chr21;36206893 (low quality and frequency of alternate allele A), known SNP at chr2;25469913; not covered position at chr21;44514970, likely deletion at chr7;148543694, likely deletion at chr7;148543695; marked 1\% threshold.}
\end{figure}
\newpage

Exemplary call:

\begin{lstlisting}
##Example 2:
#Use vcf files, use known polymorphisms (dbSNP), mark 20%
, 40%, 60%, 80% and 100%, plot number of reads, plot 
per target.
\end{lstlisting}

<<Example2>>=
library("BSgenome.Hsapiens.UCSC.hg19")
ref_genome<-BSgenome.Hsapiens.UCSC.hg19
library("BBCAnalyzer")

output<-analyzeBases(sample_names=system.file("extdata","SampleNames.txt",package="BBCAnalyzer"),
            bam_input=system.file("extdata",package="BBCAnalyzer"),
            target_regions=system.file("extdata","targetRegions2.txt",package="BBCAnalyzer"),
            vcf_input=system.file("extdata",package="BBCAnalyzer"),
            output=system.file("extdata",package="BBCAnalyzer"),
            output_pictures=system.file("extdata",package="BBCAnalyzer"),
            known_file=system.file("extdata","dbsnp_138.part.vcf.gz",package="BBCAnalyzer"),            
            genome=ref_genome,
            MQ_threshold=60,
            BQ_threshold=50,
            frequency_threshold=0.01,
            qual_lower_bound=58,
            qual_upper_bound=63,
            marks=c(0.2,0.4,0.6,0.8,1),
            relative=FALSE,
            per_sample=FALSE)
@


Exemplary output:

\begin{figure}[ht!]
\includegraphics[width=15cm]{./chr21;36206893.png}
\caption{Output file chr21;36206893.png: possible SNV according to Example\_IonTorrent (not called, homozygous); likely SNV according to Example\_454 (called, heterozygous, great deviation from expected number of reads); marked 20\%, 40\%, 60\%, 80\% and 100\%.}
\end{figure}

\begin{figure}[ht!]
\includegraphics[width=15cm]{./chr7;148543695.png}
\caption{Output file chr7;148543695.png: unlikely deletion according to Example\_IonTorrent (not called, homozygous); likely deletion according to Example\_454 (called, heterozygous, deviation from expected number of reads); marked 20\%, 40\%, 60\%, 80\% and 100\%.}
\end{figure}

\begin{figure}[ht!]
\includegraphics[width=15cm]{./chr21;44514970.png}
\caption{Output file chr21;44514970.png: likely insertion according to Example\_IonTorrent (called, heterozygous, small deviation from expected number of reads); not covered position in Example\_454; marked 20\%, 40\%, 60\%, 80\% and 100\%.}
\end{figure}
\newpage

\section{Runtime}
The runtime of \emph{BBCAnalyzer} depends on the number of samples and the number of positions that get analyzed. Furthermore, the number of reads at the positions being analyzed influences the runtime as well.

The following test case analyzes one position for one sample. 344 reads cover the position being analyzed. A vcf file and a file containing known polymorphisms is not considered (vcf file: increases the runtime in this example by 1.625 seconds; file containing known polymorphisms: increases runtime in this example by 2.273 seconds; both: increases runtime in this example by 2.690 seconds). 
\bigskip

\noindent Test file \emph{SampleNames\_small.txt}:
\begin{lstlisting}
Example_IonTorrent
\end{lstlisting}
\medskip

\noindent Test files \emph{Example\_IonTorrent\_small.bam} and \emph{Example\_IonTorrent\_small.bai}.
\medskip

\noindent Test file \emph{targetRegions\_small.txt}:
\begin{lstlisting}
4	106157698
\end{lstlisting}

One barplot per sample is created. The relative frequencies are plotted. The whole pipeline (\texttt{analyzeBases}) consumes 40.496 seconds. The plotting itself (\texttt{analyzeBasesPlotOnly}) consumes 0.109 seconds.
\bigskip

Exemplary call:

\begin{lstlisting}
##Example 3:
#No vcf files, no file containing known polymorphisms, 
mark 1% threshold, plot relative frequencies, plot per sample.
\end{lstlisting}

<<Example3>>=
library("BSgenome.Hsapiens.UCSC.hg19")
ref_genome<-BSgenome.Hsapiens.UCSC.hg19
library("BBCAnalyzer")

output<-analyzeBases(
            sample_names=system.file("extdata","SampleNames_small.txt",package="BBCAnalyzer"),
            bam_input=system.file("extdata",package="BBCAnalyzer"),
            target_regions=system.file("extdata","targetRegions_small.txt",package="BBCAnalyzer"),
            vcf_input="",
            output=system.file("extdata",package="BBCAnalyzer"),
            output_pictures=system.file("extdata",package="BBCAnalyzer"),
            known_file="", 
            genome=ref_genome,
            MQ_threshold=60,
            BQ_threshold=50,
            frequency_threshold=0.01,
            qual_lower_bound=58,
            qual_upper_bound=63,
            marks=c(0.01),
            relative=TRUE,
            per_sample=TRUE)
@
\bigskip

Exemplary output:
\begin{figure}[ht!]
\includegraphics[width=15cm]{./Example_IonTorrent_small.png}
\caption{Output file Example\_IonTorrent\_small.png: Validated SNV at chr4;106157698, relative frequencies suggest heterozygous mutation, relatively low mean quality of detected cytosines and adenines ; marked 1\% threshold.}
\end{figure}
\newpage

\bibliographystyle{plain}
\bibliography{bibpaper}

\end{document}