%\VignetteIndexEntry{Molecule Identification with CAMERA} %\VignetteKeywords{CAMERA}\\ %\VignettePackage{CAMERA}\\ \documentclass[a4paper,12pt]{article} \usepackage{hyperref} \usepackage[table]{xcolor} \setlength{\parindent}{0cm} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \newcommand{\Rfunarg}[1]{{\textit{#1}}} \newcommand{\denovo}{{\em de-Novo{}}} \usepackage{varioref} \labelformat{figure}{\figurename~#1} \labelformat{table}{\tablename~#1} \begin{document} \title{LC-MS Peak Annotation and Identification with \Rpackage{CAMERA}} \author{Carsten Kuhl, Ralf Tautenhahn and Steffen Neumann} \maketitle \section{Introduction} %{{{ The R-package \Rpackage{CAMERA} is a {\bf C}ollection of {\bf A}lgorithms for {\bf ME}tabolite p{\bf R}ofile {\bf A}nnotation. It's primarily used for annotation of LC-MS data. Therefore it interacts directly with processed peak data from \Rpackage{xcms}. It includes algorithms for annotation of isotope peaks, adducts and fragments in peak lists generated by \Rpackage{xcms}. Additional methods cluster mass signals that originate from a single metabolite, based on rules for mass differences and peak shape comparison \cite{annobird07}. Based on this annotation results, the molecular composition can be calculated if the mass spectrometer has a high-enough accuracy for both the mass and the isotope pattern intensities. %}}} \section{Short Background} %{{{ For soft ionisation methods such as LC/ESI-MS, different adducts (e.g. $[M+K]^+$, $[M+Na]^+ $) and fragments (e.g. $[M-C_3H_9N]^+$, $[M+H-H_20]^+ $) occur. Depending on the molecule having an intrinsic charge, $[M]^+$ may be observed as well. In most cases a substance generates a bulk of different ions. There interpretation is time consuming, especially if substances co-elute. Therefore deconvolution, which separates the different substances and discovery of the ion species is necessary. The working with CAMERA to solves these problems is demonstrated in the next chapters. %}}} \section{Processing with \Rpackage{CAMERA}} \subsection{Preprocessing with \Rpackage{xcms}} \label{preprocess} CAMERA needs as input an \Rclass{xcmsSet} object that is processed with your favorite parameters. For an example see below: \begin{verbatim} library(CAMERA) #Single sample example file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file,method="centWave",ppm=30,peakwidth=c(5,10)) #Multiple sample library(faahKO) filepath <- system.file("cdf", package = "faahKO") xsg <- group(faahko) xsg <- fillPeaks(xsg) \end{verbatim} After the preprocessing we create an CAMERA object, which is called xsAnnotate or in short xsa. \begin{verbatim} library(CAMERA) xsa <- xsAnnotate(xs) \end{verbatim} Depending on your analysis the upcoming workflows may differ at this point and we start with the description of the annotation workflow. Afterwards we demonstrate the wrapper functions and how to interpret the results. \subsection{Annotation Workflow} The CAMERA annotation procedure can be split into two parts: We want to answer the questions which peaks occur from the same molecule and secondly compute its exact mass. Therefore CAMERA annotation workflow contains four different functions: \begin{enumerate} \item peak grouping after retention time (\Rfunction{groupFWHM}) \item peak group verification with peakshape correlation (\Rfunction{groupCorr}) \end{enumerate} Both methods separate peaks into different groups, which we define as "pseudospectra". Those pseudospectra can consists from one up to 100 ions, depending on the molecules amount and ionizability. Afterwards the exposure of the ion species can be performed with: \begin{enumerate} \item annotation of possible isotopes (\Rfunction{findIsotopes}) \item annotation of adducts and calculating hypothetical masses for the group (\Rfunction{findAdducts}) \end{enumerate} This workflow results in a data-frame similar to a \Rpackage{xcms} peak table, that can be easily stored in a comma separated table (Excel-readable). The next section shows some practical examples. \subsubsection{Working with single sample} Let's come to the practical work. Here we have a single sample file either in positive or negative ionization mode. The \Rclass{xcmsSet} was created as shown in section \ref{preprocess}. \begin{verbatim} # Create an xsAnnotate object an <- xsAnnotate(xs) # Group after RT anF <- groupFWHM(an, perfwhm = 0.6) # Annotate isotopes anI <- findIsotopes(anF, mzabs = 0.01) # Verify grouping anIC <- groupCorr(anI, cor_eic_th = 0.75) #Annotate adducts anFA <- findAdducts(anIC, polarity="positive") \end{verbatim} In the above example, we create the pseudospectra with the retentiontime information. The \Rfunarg{perfwhm} parameter defines the window width, which is used for matching. Lower it for a smaller windows or set it to a higher value, if the retention time varies. This step generate 14 pseudospectra. Afterwards we annotate isotopic peaks, with \Rfunarg{mzabs} as allowed m/z error. In this example we find 36 isotope peaks, with means the number of $[M+1],[M+2],\ldots$ ions. This isotope informations are useful in the next step, where for every peak in one pseudospectra a pairwise EIC correlation is performed. If the correlation value between two peaks is higher than the threshold \Rfunarg{cor\_eic\_th} it will stay in the group, otherwise both are separated. If the peaks are annotated isotope ions, they will not be divided. This seperates our 14 pseudospectra into 35. Additionally we could set an constraint that primary adducts e.g. $[M+H]$ and $[M+Na]$ stay together, even if the correlation value is small. To do so call the function with an polarity parameter like in \Rfunction{findAdducts} (groupCorr (..., polarity="positive"), which only seperates the 14 into 32 groups. The difference is only small, three single ions were not sorted out, but as shown in \ref{fig:groupCorr} it is important. In the upper example an annotation was possible in contrast to the lower one, because the $[M+Na]$ was put into his own group. So we strongly encourage the use of the polarity mode. <>= library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave",ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findIsotopes(an) anC <- groupCorr(an, cor_eic_th = 0.75) anCp<- groupCorr(an, cor_eic_th = 0.75, polarity="positive") an1 <- findAdducts(anC, polarity="positive") an2 <- findAdducts(anCp, polarity="positive") par(mfrow=c(2,1)) plotEICs(an1, pspec=8, maxlabel=5) plotEICs(an2, pspec=8, maxlabel=5) @ \begin{figure} \begin{center} \includegraphics{CAMERA-groupCorr} \end{center} \caption{\label{fig:GroupCorr} Example of using the polarity parameter} \end{figure} After the second pseudogroup creating step we now finally do a complete annotation of adducts. Therefore, the \Rfunarg{polarity} parameter must be set. Finally, export the results with: \begin{verbatim} peaklist <- getPeaklist(xsa) write.csv(peaklist, file='xsannotated.csv') \end{verbatim} where file is the ouput filename. That's so far as for the simple sample approach. Please note that every method has additional parameters, that are not explicitly mentioned here. Also if your analysis doesn't need annotations, only a seperation into groups, then simply stop after groupCorr. The grouping results are stored in \Rfunarg{object@pspectra}, which is a list, that saves as element the peakindexes for every group. \begin{verbatim} #xsa is here the result from findAdducts peak.idx <- xsa@pspectra[[1]] #print the indexes of all peaks from pseudospectrum 1 cat(peak.idx) \end{verbatim} \subsubsection{Working with multiple samples} In this case we have a multiple samples experiment like replicates of one probe or a wildtype vs. mutant experiment. As in the previous example, we start with the already processed xcmsSet-object. Note: If you want an \Rfunction{diffreport} later on, make sure you run \Rfunction{fillPeaks} on your xcmsSet before. As test dataset we use here the faahKO. For more information about the dataset see http://dx.doi.org/10.1021/bi0480335. CAMERA contains different approaches with multiple sample analysis. Here we only show the most common way, for the other strategies see the xsAnnotate manpage, especially the parameter sample. \begin{verbatim} #Create an xsAnnotate object xsa <- xsAnnotate(xsg) #Group after RT value of the xcms grouped peak xsaF <- groupFWHM(xsa, perfwhm=0.6) #Verify grouping xsaC <- groupCorr(xsaF) #Annotate isotopes, could be done before groupCorr xsaFI <- findIsotopes(xsaC) #Annotate adducts xsaFA <- findAdducts(xsaFI, polarity="positive") #Get final peaktable and store on harddrive write.csv(getPeaklist(xsaFA),file="result_CAMERA.csv") \end{verbatim} \subsection{Interpretation of the Results} \label{CAMERAresults} \begin{table}[ht] \begin{center} \begin{tabular}{|c|cc|c|c|l|}\hline id & mz & rt & isotopes & adduct & pc \\ \hline 65 & 176.04 & 280.09 & & & \\ \rowcolor{blue!20} 76 & 136.05 & 280.43 &[14][M+1]1+ & & 5\\ \rowcolor{blue!20} 77 & 135.05 & 280.43 &[14][M]1+ & & 5\\ \rowcolor{blue!20} 74 & 153.06 & 280.43 & &[M+H]+ 152.05437 & 5\\ \rowcolor{blue!20} 75 & 175.04 & 280.43 & &[M+Na]+ 152.05437 & 5\\ \rowcolor{blue!20} 73 & 197.02 & 280.76 & &[M+2Na-H]+ 152.05437 & 5\\ 78 & 377.74 & 286.15 & & & \\ 79 & 732.5 & 286.49 & & & \\ \rowcolor{red!20} 83 & 488.32 & 286.82 & &[M+Na]+ 465.33205 & 7\\ \rowcolor{red!20} 82 & 466.34 & 286.82 & &[M+H]+ 465.33205 & 7\\ ...&&&&&\\ \hline \end{tabular} \end{center} \caption{{\footnotesize{Example of annotation result for one sample. Columns with intensity values are omitted. blue-line: annotated group 5, red-line: annotated group 7}}} \label{tab:int} \end{table} \ref{tab:int} shows an cutout example of an annotation result. The columns with the intensity values are omitted and the rows are ordered by their rt values for better readability. The column \textit{pc} shows the result of the peak correlation based annotation (independent of the annotations \textit{iso} and \textit{adduct}). Peaks with the same label are supposed to belong to the same spectrum. The column \textit{adduct} shows the annotation hypotheses for the ions species. The value after the brackets is the estimated molecular mass. The column \textit{isotopes} contains the annotated isotopes for a monoisotopic peak. The values in the first square brackets denote the isotope-group-id(column \textit{id}), the second is the isotope annotation and the number after the brackets is the charge of the isotope. \section{Wrapper functions and combination with diffreport} \subsubsection{The function \Rfunction{annotate}} \Rfunction{Annotate} is a wrapper function for the annotation workflow. It's similar to \Rfunction{annnoteDiffreport}, but doesn't combine the results with those from the diffreport. With the parameter \Rfunarg{pval\_list} a handmade preselection of pseudospectra is possible. A "quick"\ mode is also available, that runs only \Rfunction{groupFWHM} and \Rfunction{findIsotopes}. The normal mode runs \Rfunction{groupFWHM}, \Rfunction{findIsotopes}, \Rfunction{group\-Corr} and \Rfunction{findAdducts} in order as mentioned. Every parameter of these functions also work with \Rfunction{annotate}. As a small example: \begin{verbatim} #A full annotation run xsa <- annotate(xs, perfwhm=0.7, cor_eic_th=0.75, ppm=10, polarity="positive") #Generate result peaklist <- getPeaklist(xsa) #Save results write.csv(peaklist,file="results.csv") \end{verbatim} Similar to the previous example, the grouping is followed by an annotation. But in contrast we now have additional summaries respectively analysis functions. For a comparison and statistical analysis between different sample classes, \Rpackage{xcms} contains the \Rfunction{diffreport} function. CAMERA can use this method for a better representation. \begin{verbatim} #Run fillPeaks on xcmsSet xsg.fill <- fillPeaks(xsg) #Make a diffreport with CAMERA result diffreport <- annotateDiffreport(xsg.fill) #Save on harddrive write.csv(diffreport, file="diffreport.csv") \end{verbatim} The \Rfunction{annotateDiffreport} is a wrapper for the \Rpackage{xcms} \Rfunction{diffreport} function and combines the results from CAMERA. The resulting table has three different columns, see section \ref{CAMERAresults}. For a speed up it's possible to preselect pseudospectra or make an automatic selection based on the diffreport result. As an example that selects only groups with a fold change higher than 4. \begin{verbatim} #Example 1 with creating list of interest from grouped xcmsSet diffreport <- annotateDiffreport(xsg.fill, quick=TRUE) #Save results write.csv(diffreport, file="diffreport.csv") #Look into the table and select interesting pseudospectra #e.g. pseudospectra 10,11 and 30 psg_list <- c(10,11,30) diffreport.annotated <- annotateDiffreport(xsg.fill, psg_list=psg_list, polarity="positive") #Example 2 with automatic selection diffreport.annotated <- annotateDiffreport(xsg.fill, fc_th=4, polarity="positive") \end{verbatim} Both examples generate a data-frame, identical to the normal diffreport result, but now with three additional result columns from CAMERA. In example 1 we perform a quick-run, that means we only generate the xsAnnotate object und call groupFWHM and findIsotopes. From these results we preselect 3 pseudospectra (10,11,30), taken from the column pc. In the next run, we run annotateDiffreport again with our list as parameter. An annotation will only be done for these three groups. In example 2 we perform an automatic preselection, where the \Rfunarg{fc\_th} parameter defines a threshold for selecting groups, which contains ions with a fold change higher than four. For other pseudospectra, no adduct annotation will be calculated. The fold change value is taken from the diffreport result. For other parameters see the manpage of \Rfunction{annotateDiffreport}. \subsection{Visualisation of the Results} For a graphical presentation of the annotation result CAMERA provides the function \Rfunction{plotEICs} to visualize the raw data and the function \Rfunction{plotPsSpectrum} to plot all peaks of a speudospectrum. The next examples shows the use of both functions. <>= library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave",ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findAdducts(an, polarity="positive") plotEICs(an, pspec=2, maxlabel=5) @ \begin{figure} \begin{center} \includegraphics{CAMERA-EICPspec1} \end{center} \caption{\label{EICpspec1} EICs.} \end{figure} \ref{EICpspec1} displays the EICs of all peaks from one pseudospectrum. With this plot you can manual check if the grouping makes sense. In the other figure \ref{EICpspec2} you see a typical $m/z$ plot, with labelled, annotated peaks. <>= plotPsSpectrum(an, pspec=2, maxlabel=5) @ \begin{figure} \begin{center} \includegraphics{CAMERA-Pspec1} \end{center} \caption{\label{EICpspec2} Spectra.} \end{figure} % \section{Function Overview} This section contains for every CAMERA function a small introduction with an example. See the manpages for further informations. \subsection{Function annotate} \label{fct:annotate} Wrapper function for the whole annotation workflow. Returns a xsAnnotate object. It handles also all functions parameters. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) xsa <- annotate(xs) \end{verbatim} \subsection{Function annotateDiffreport} \label{fct:annotateDiffreport} Wrapper function for the xcms diffreport and the annotate function. Returns a diffreport with the results from CAMERAs annotation progress. It handles also all functions parameters. \begin{verbatim} library(CAMERA) library(faahKO) xs.grp <- group(faahko) xs.fill <- fillPeaks(xs.grp) diffreport <- annotateDiffreport(xs.fill) write.csv(diffreport, file="...") \end{verbatim} The combination of the diffreport and the CAMERA result can also be done without these functions. Therefore diffreports \Rfunarg{sortpval} argument must set to FALSE. After the combination the sorting after the pvalue can be make up. \begin{verbatim} diffrep <- diffreport(...) xsa.peaklist <- getPeaklist(xsa) diffrep.new <- cbind(diffrep, xsa.peaklist[, c("isotopes", "adduct", "pcgroup")]) #Sort after pvalue diffrep.new <- diffrep.new[order(diffrep.new[,"pvalue"]),] \end{verbatim} \subsection{Function findAdducts} \label{sec:findAdducts} After the peak grouping into pseudospectra with \Rfunction{groupCorr} or \Rfunction{groupFWHM} the resulting \Rclass{xsAnnotate} can be processed with \Rfunction{findAdducts}. For every pseudospectra all possible adducts are calculated based on a provided rule table. As default CAMERA calculate its own table, which contains every possible combination from the standard ions H, Na, K, NH4 and CL, depending on your ionization mode. Additional CAMERA contains four precalculated rule tables: primary\_adducts\_pos, primary\_adducts\_neg, extended\_adducts\_pos, extended\_adducts\_neg Those can be applied as shown in the example. For creating your own rule table, see \ref{sev:ruletable}. \begin{verbatim} file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findIsotopes(an) # optional but recommended. an <- findAdducts(an, polarity="positive") #With provided rule table file <- system.file('rules/primary_adducts_pos.csv', package = "CAMERA") rules <- read.csv(file) an <- findAdducts(an, polarity="positive", rules=rules) \end{verbatim} \subsection{Function findIsotopes} \label{sec:findIsotopes} For a provided xsAnnotate, CAMERA can identify isotope peaks within every pseudospectra. The function \Rfunction{findIsotopes} takes as parameter \Rfunarg{maxcharge} and \Rfunarg{maxiso} that controls the maximum number of the expected isotopes within one cluster and their charges. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findIsotopes(an) \end{verbatim} \subsection{Function findNeutralLoss} \label{sec:findNeutralLoss} A common strategy to identify interesting compounds is the screening after specific neutral losses. CAMERA adopts this strategy and provides with \Rfunction{findNeutralLoss} and \Rfunction{findNeutralLossSpecs} an interface for scanning every pseudospectrum for neutral losses. The difference between both methods is in the results. \Rfunction{findNeutralLossSpecs} returns an artificial xcmsSet containing the peaks, which have the neutral loss. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) xs.pseudo <- findNeutralLoss(an,mzdiff=18.01,mzabs=0.01) #Searches for Peaks with water loss xs.pseudo@peaks #show Hits \end{verbatim} \subsection{Function findNeutralLossSpecs} \label{sec:findNeutralLossSpecs} \Rfunction{findNeutralLossSpecs} returns a boolean vector for every pseudospectrum, where a hit is marked with TRUE. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) hits <- findNeutralLossSpecs(an,mzdiff=18.01,mzabs=0.01) #Searches for pseudspecta with water loss \end{verbatim} \subsection{Function getIsotopeCluster} \label{sec:getIsotopeCluster} This method extracts the isotope annotation from a xsAnnotate object. The order of the resulting list correspond to those from the whole peaklist. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findIsotopes(an) isolist <- getIsotopeCluster(an) isolist[[10]] #get IsotopeCluster 10 \end{verbatim} See the manpage for an example interaction with \Rpackage{Rdisop} to calculate the molecular composition. \subsection{Function getPeaklist} \label{sec:getPeaklist} This function returns a peaklist containing all information from an xsAnnotate object. This peaklist can be directly saved as an csv file. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) xsa <- xsAnnotate(xs) xsa <- groupFWHM(xsa) xsa <- findIsotopes(xsa) xsa <- findAdducts(xsa, polarity="positive") peaklist <- getPeaklist(xsa) write.csv(peaklist,file="...") \end{verbatim} \subsection{Function getpspectra} \label{sec:getpspectra} This function returns for a provided pseudospectrum index its peaktable and CAMERA's annotation information. This peaktable can be directly saved as an csv file. Note: The indixes for the isotopes, are those from the whole peaklist. See \ref{sec:getPeaklist}. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) xsa <- xsAnnotate(xs) xsa <- groupFWHM(xsa) psp.peaks <- getpspectra(xsa, 1) psp.peaks \end{verbatim} \subsection{Function groupCorr} \label{sec:groupCorr} This function calculates the pearson correlation coefficient based on the peak shapes of every peak in the pseudospectrum to separate co-eluted substances. It's recommended to use groupFWHM before, otherwise the runtime is very long!! \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) xsa <- xsAnnotate(xs) xsa <- groupFWHM(xsa) xsa <- groupCorr(xsa) \end{verbatim} \subsection{Function groupFWHM} \label{sec:groupFWHM} For grouping peaks into pseudospectra, this function uses the retention time information. Every peaks that falls into a defined window are considered as one group. The window is defined as a percentage of the peak FWHM around the RT\_med value. \begin{verbatim} library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave", ppm=30, peakwidth=c(5,10)) xsa <- xsAnnotate(xsa) xsa <- groupFWHM(xsa) \end{verbatim} \subsection{Function plotEICs} \label{sec:plotEICs} This method returns a batch plot including the extracted ion chromatograms to the current graphics device for a provided pseudospectrum. \begin{verbatim} #Plot all peak EICs of pseudospectrum 1 plotEICs(xsa,1) \end{verbatim} \subsection{Function plotPsSpectrum} \label{sec:plotPsSpectrum} This method plots the spectrum of a pseudospectrum, with labelling the most intense peaks. \begin{verbatim} #Plot the spectrum of pseudospectrum 1 and highlight the #annotation and mz labels of the 10 strongest peaks plotPsSpectrum(xsa,1,maxlabel=10) \end{verbatim} \begin{thebibliography}{t1} \bibitem{annobird07} Ralf Tautenhahn, Christoph B\"ottcher, Steffen Neumann : Annotation of LC/ESI--MS Mass Signals, BIRD 2007 Proc. of BIRD 2007 -- 1st International Conference on Bioinformatics Research and Development, 2007. \url{http://www.springerlink.com/content/473l404001787974/} and \url{http://msbi.ipb-halle.de/~rtautenh/bird07.pdf} %\bibitem{disopwabi06} B\"ocker \end{thebibliography} \end{document}