%\VignetteEngine{knitr}
%\VignetteIndexEntry{metaX tutorial}
%\VignetteKeywords{Metabolomics, Quality Control,LC/GC-MS, Statistical analysis}
%\VignettePackage{metaX}

\documentclass[12pt]{article}

<<style, eval=TRUE, echo=FALSE, results='asis'>>=
BiocStyle::latex()
@


\bioctitle{A short tutorial on using \Biocpkg{metaX} for high-throughput mass
  spectrometry-based metabolomic data analysis}

\author{Bo Wen}


\begin{document}

\maketitle

\tableofcontents

<<env, echo=FALSE,warning=FALSE,message=FALSE>>=
suppressPackageStartupMessages(library("metaX"))
#suppressPackageStartupMessages(library("R.utils"))
@


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Introduction}\label{sec:intro} 

The \Biocpkg{metaX} package provides an integrated pipeline for mass
spectrometry-based metabolomic data analysis. It covers peak detection, data 
preprocessing, normalization, missing value imputation, univariate statistical 
analysis, multivariate statistical analysis such as PCA and PLS-DA, metabolite 
identification, pathway analysis, power analysis, feature selection and 
modeling, data quality assessment, and HTML-based report generation. This 
document describes how to use the functions included in the R package 
\Biocpkg{metaX}.


\section{Example data}\label{sec:data} 
We use a dataset from \cite{saghatelian2004assignment}, which can be accessed 
through the \Biocpkg{faahKO} package. The samples in this dataset fall into two 
groups, knockout (KO) and wild type (WT), each of which includes six samples.
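
As a quick check of the group sizes, the raw files shipped with 
\Biocpkg{faahKO} can be counted per subdirectory. This is only a sketch using 
base R; it assumes that \Biocpkg{faahKO} is installed and that its raw files are 
stored under \texttt{cdf/KO} and \texttt{cdf/WT}, as the paths used later in 
this document suggest.

<<groupSizes, eval=FALSE>>=
## list all raw data files shipped with the faahKO package
cdfFiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)
## the name of the parent directory is taken as the group label
groupLabel <- basename(dirname(cdfFiles))
## number of files per group
table(groupLabel)
@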


\section{Using metaX}\label{sec:use}

\subsection{Data import and parameter setting}
The first step in the \Biocpkg{metaX} pipeline is to define a sample list file, 
which provides the file name (sample), batch number (batch), sample class 
(class) and injection order (order) of each sample. An example sample list file 
is shown below:

<<samList, eval=TRUE>>=
sampleListFile <- system.file("extdata/faahKO_sampleList.txt", 
                              package = "metaX")
samList <- read.delim(sampleListFile)
print(samList)
@
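
For a new experiment, a sample list of this kind can be written with base R. 
The following is a minimal sketch, not part of \Biocpkg{metaX}: the sample 
names, the class labels and the output file name are hypothetical, and the 
column order is an assumption based on the description above. QC samples 
receive a missing class, which is written to the file as \texttt{NA}.

<<ownSampleList, eval=FALSE>>=
## hypothetical samples: two QC injections bracketing two groups (A and B)
mySamples <- data.frame(
    sample = c("qc01", "caseA1", "caseA2", "ctrlB1", "ctrlB2", "qc02"),
    batch  = 1,
    class  = c(NA, "A", "A", "B", "B", NA),  # QC samples have class NA
    order  = 1:6,                            # injection order
    stringsAsFactors = FALSE
)
## write a tab-delimited file (hypothetical file name in a temp directory)
myListFile <- file.path(tempdir(), "my_sampleList.txt")
write.table(mySamples, file = myListFile, sep = "\t",
            quote = FALSE, row.names = FALSE)
## read the file back to check its structure
str(read.delim(myListFile))
@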

Please note that if the sample list file contains quality control (QC) samples, 
their value in the class column must be \texttt{NA}. In addition to the sample 
list file, the user also needs to provide either the MS data (mzML, mzXML or CDF 
format) or a peak list file generated by \Biocpkg{xcms}, MZmine 
\cite{katajamaa2006mzmine} or other peak picking software. If the user provides 
MS data, \Biocpkg{metaX} uses \Biocpkg{xcms} to perform peak picking. In this 
case, the MS data must be placed in two subdirectories of a single folder, as 
shown below:

<<exampleDir, eval=TRUE>>=
list.files(system.file("cdf", package = "faahKO"),
           recursive = TRUE,full.names = TRUE)
@

The \Biocpkg{metaX} package uses a metaXpara-class object to manage the file 
path information and other parameters for data processing. The input file paths 
can be set as follows:

<<msFilePath, eval=TRUE>>=
## create a metaXpara-class object
#library("metaX")
para <- new("metaXpara")
## set the MS data path
dir.case(para) <- system.file("cdf/KO", package = "faahKO")
dir.ctrl(para) <- system.file("cdf/WT", package = "faahKO")

## set the sample list file path
sampleListFile(para) <- sampleListFile
@

Usually, the user also needs to set several other parameters for data analysis:

\begin{enumerate}
\item Peak picking. If the user wants to use \Biocpkg{metaX} to do the peak 
picking, several parameters related to peak picking must be set.

<<setParameter,eval=TRUE>>=
## set parameters for peak picking
xcmsSet.peakwidth(para) <- c(20,50)
xcmsSet.snthresh(para) <- 10
xcmsSet.prefilter(para) <- c(3,100)
xcmsSet.noise(para) <- 0
xcmsSet.nSlaves(para) <- 4
@
For the complete list of parameters, please see the help page of metaXpara-class. 

\item Missing value imputation. Missing values are common in a typical 
quantitative metabolomics dataset. \Biocpkg{metaX} provides several methods 
that perform missing value imputation automatically: Probabilistic PCA (PPCA), 
Bayesian PCA (BPCA), k-nearest neighbor (KNN), missForest and Singular Value 
Decomposition Imputation (SVDImpute).

<<missValueImputeMethod1,eval=TRUE>>=
## bpca, svdImpute, knn, rf
missValueImputeMethod(para) <- "knn"
@

\item Normalization. Several methods are currently implemented for data 
normalization, such as QC-RLSC, sum, VSN, probabilistic quotient normalization 
(PQN), quantiles and robust quantiles. 


\item Comparison groups. The comparison groups can be set as follows:

<<ratioPairs,eval=TRUE>>=
## set the comparison groups
ratioPairs(para) <- "KO:WT"
@

If multiple comparison groups are needed in a single analysis, the user can set 
\texttt{para@ratioPairs} to a string such as \texttt{A:B;C:B;D:B}, in which the 
comparison groups are separated by semicolons.


\item Output parameters. The user can set the output directory and the prefix of 
the output files as follows:

<<output,eval=TRUE>>=
## set the output parameters
outdir(para) <- "test"
prefix(para) <- "metaX"
@


\end{enumerate}
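
When several groups are compared against the same reference, the comparison 
string described in the list above can be assembled programmatically. The 
following base R sketch uses the hypothetical group labels A, B, C and D from 
that description; splitting the string back into pairs is shown only to 
illustrate the format and is not necessarily how \Biocpkg{metaX} parses it 
internally.

<<ratioPairsString, eval=FALSE>>=
## hypothetical group labels: A, C and D are each compared against B
treatments <- c("A", "C", "D")
reference  <- "B"
## build "A:B;C:B;D:B": pairs joined by ":" and separated by ";"
pairString <- paste(paste(treatments, reference, sep = ":"),
                    collapse = ";")
pairString
## split the string back into individual pairs to inspect the format
strsplit(strsplit(pairString, ";")[[1]], ":")
## the string could then be assigned with: ratioPairs(para) <- pairString
@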


\subsection{Run metaX}
The function \texttt{metaXpipe} automates the whole data analysis process. 

<<rawPeaks,eval=TRUE,echo=FALSE,results='hide',warning=FALSE,message=FALSE>>=
library(faahKO)
xset <- group(faahko)
xset <- retcor(xset)
xset <- group(xset)
xset <- fillPeaks(xset)
peaksData <- as.data.frame(groupval(xset,"medret",value="into"))
peaksData$name <- row.names(peaksData)
rawPeaks(para) <- peaksData
@

<<metaXpipe,eval=FALSE,warning=FALSE,message=FALSE>>=
plsdaPara <- new("plsDAPara")
res <- metaXpipe(para = para,plsdaPara = plsdaPara, 
               cvFilter = 0.2, remveOutlier = TRUE)
@


After the analysis has completed, the file \texttt{index.html} in the output 
directory can be opened in a web browser to view the generated report.
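
The output can also be inspected from within R. This sketch uses only base R 
and the \texttt{utils} function \texttt{browseURL}; it assumes that the output 
directory is \texttt{test}, as set above, and that the pipeline has already been 
run.

<<openReport, eval=FALSE>>=
## list everything written to the output directory
list.files("test", recursive = TRUE)
## open the HTML report in the default web browser, if it exists
reportFile <- file.path("test", "index.html")
if (file.exists(reportFile)) {
    browseURL(reportFile)
}
@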





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section*{Session information}\label{sec:sessionInfo} 

All software and respective versions used to produce this document are listed below.

<<sessioninfo, results='asis', echo=FALSE>>=
toLatex(sessionInfo())
@

\bibliography{metaX}

\end{document}