%\VignetteEngine{knitr}
%\VignetteIndexEntry{metaX tutorial}
%\VignetteKeywords{Metabolomics, Quality Control, LC/GC-MS, Statistical analysis}
%\VignettePackage{metaX}
\documentclass[12pt]{article}

<<>>=
BiocStyle::latex()
@

\bioctitle{A short tutorial on using \Biocpkg{metaX} for high-throughput mass spectrometry-based metabolomic data analysis}
\author{Bo Wen}

\begin{document}
\maketitle
\tableofcontents

<<>>=
suppressPackageStartupMessages(library("metaX"))
#suppressPackageStartupMessages(library("R.utils"))
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{sec:intro}

The \Biocpkg{metaX} package provides an integrated pipeline for mass spectrometry-based metabolomic data analysis. It covers peak detection, data preprocessing, normalization, missing value imputation, univariate statistical analysis, multivariate statistical analysis such as PCA and PLS-DA, metabolite identification, pathway analysis, power analysis, feature selection and modeling, data quality assessment and HTML-based report generation. This document describes how to use the functions included in the R package \Biocpkg{metaX}.

\section{Example data}\label{sec:data}

We use a dataset from \cite{saghatelian2004assignment}, which can be accessed through the \Biocpkg{faahKO} package. The samples in this dataset fall into two groups, knockout (KO) and wild type (WT), each of which includes six samples.

\section{Using metaX}\label{sec:use}

\subsection{Data import and parameter setting}

The first step in the \Biocpkg{metaX} pipeline is to define a sample list file that provides the file names (sample), batch number (batch), sample class (class) and sample injection order (order).
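A sample list of this shape could be assembled in R along the following lines. This is only a minimal sketch and not part of \Biocpkg{metaX}: the sample names and the output file name are hypothetical, and it assumes a tab-delimited file with the four column names given above, which is the layout that \texttt{read.delim()} reads by default.

<<eval=FALSE>>=
## Hypothetical sample names; replace them with the names of your own MS files.
samList <- data.frame(
    sample = c("qc01", "ko01", "wt01", "qc02"),
    batch  = 1,                       # all samples measured in a single batch
    class  = c(NA, "KO", "WT", NA),   # QC samples are written as NA in the class column
    order  = 1:4,                     # injection order
    stringsAsFactors = FALSE
)
## Write a tab-delimited text file (hypothetical file name in a temporary directory).
mySampleListFile <- file.path(tempdir(), "my_sampleList.txt")
write.table(samList, file = mySampleListFile, sep = "\t",
            quote = FALSE, row.names = FALSE)
@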
An example sample list file is shown below:

<<>>=
sampleListFile <- system.file("extdata/faahKO_sampleList.txt",
                              package = "metaX")
samList <- read.delim(sampleListFile)
print(samList)
@

Please note that if the sample list file contains quality control (QC) samples, their value in the class column must be "NA".

In addition to the sample list file, the user also needs to provide the MS data (mzML, mzXML or CDF format) or a peak list file generated by \Biocpkg{XCMS}, MZmine \cite{katajamaa2006mzmine} or other peak-picking software. If the user provides MS data, \Biocpkg{metaX} uses \Biocpkg{XCMS} to perform peak picking. In this case, the MS data must be placed in two subdirectories of a single folder, as shown below:

<<>>=
list.files(system.file("cdf", package = "faahKO"),
           recursive = TRUE, full.names = TRUE)
@

The \Biocpkg{metaX} package uses a metaXpara-class object to manage the file paths and other parameters for data processing. The input file paths can be set as follows:

<<>>=
## create a metaXpara-class object
#library("metaX")
para <- new("metaXpara")
## set the MS data path
dir.case(para) <- system.file("cdf/KO", package = "faahKO")
dir.ctrl(para) <- system.file("cdf/WT", package = "faahKO")
## set the sample list file path
sampleListFile(para) <- sampleListFile
@

Usually, the user also needs to set several other parameters for data analysis:
\begin{enumerate}
\item Peak picking. If the user wants \Biocpkg{metaX} to do the peak picking, several parameters related to peak picking must be set.

<<>>=
## set parameters for peak picking
xcmsSet.peakwidth(para) <- c(20,50)
xcmsSet.snthresh(para) <- 10
xcmsSet.prefilter(para) <- c(3,100)
xcmsSet.noise(para) <- 0
xcmsSet.nSlaves(para) <- 4
@

For the complete list of parameters, please see the help page of metaXpara-class.
\item Missing value imputation. Missing values are common in a typical quantitative metabolomics dataset.
\Biocpkg{metaX} provides several methods to handle missing values. Currently, users can automatically impute missing values by Probabilistic PCA (PPCA), Bayesian PCA (BPCA), k-nearest neighbor (KNN), missForest and Singular Value Decomposition Imputation (SVDImpute).

<<>>=
## bpca, svdImpute, knn, rf
missValueImputeMethod(para) <- "knn"
@

\item Normalization. Currently, we implemented several methods to perform data normalization, such as QC-RLSC, sum, VSN, probabilistic quotient normalization (PQN), quantiles and robust quantiles.

<<>>=
## bpca, svdImpute, knn, rf
missValueImputeMethod(para) <- "knn"
@

\item Set the comparison groups. The comparison groups can be set as follows:

<<>>=
## set the comparison groups
ratioPairs(para) <- "KO:WT"
@

If multiple comparison groups must be set in a single analysis, the user can set "para@ratioPairs" to a string like "A:B;C:B;D:B", in which the comparison groups are separated by semicolons.
\item Output parameters. The user can set the output directory and the prefix of the output files as below:

<<>>=
## set the output parameters
outdir(para) <- "test"
prefix(para) <- "metaX"
@

\end{enumerate}

\subsection{Run metaX}

The function metaXpipe automates the whole data analysis process.

<<>>=
library(faahKO)
xset <- group(faahko)
xset <- retcor(xset)
xset <- group(xset)
xset <- fillPeaks(xset)
peaksData <- as.data.frame(groupval(xset,"medret",value="into"))
peaksData$name <- row.names(peaksData)
rawPeaks(para) <- peaksData
@

<<>>=
plsdaPara <- new("plsDAPara")
res <- metaXpipe(para = para,plsdaPara = plsdaPara,
                 cvFilter = 0.2, remveOutlier = TRUE)
@

After the analysis has completed, the file "index.html" in the output directory can be opened in a web browser to view the generated report.
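The report could also be opened directly from R. The lines below are a minimal sketch, assuming the output directory "test" set earlier and that "index.html" sits at the top level of that directory; \texttt{browseURL()} comes from the base R \texttt{utils} package, not from \Biocpkg{metaX}.

<<eval=FALSE>>=
## Path to the report, assuming outdir(para) was set to "test" as above.
reportFile <- file.path("test", "index.html")
if (file.exists(reportFile)) {
    ## Open the report in the default web browser.
    browseURL(reportFile)
} else {
    message("Report not found: ", reportFile)
}
@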
%## set parameters for output
%para@outdir <- "./"
%para@prefix <- "test"
%
%## set comparison group
%para@ratioPairs <- "KO:WT"

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Session information}\label{sec:sessionInfo}

All software and respective versions used to produce this document are listed below.

<<>>=
toLatex(sessionInfo())
@

\bibliography{metaX}

\end{document}