---
title: "ADAM: Activity and Diversity Analysis Module"
author: "AndrÃ© L. Molan, Giordano B. S. Seco, Agnes A. S. Takeda, Jose L. Rybarczyk-Filho"
date: "`r doc_date()`"
package: "`r pkg_ver('ADAM')`"
bibliography: bibliography.bib
fig_caption: yes
output: 
    BiocStyle::html_document:
        css: custom.css
vignette: >
    %\VignetteIndexEntry{"Using ADAM"}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

# Overview
*ADAM* is a GSEA R package created to group a set of genes from 
comparative samples (control *versus* experiment) according to their 
respective functions (Gene Ontology and KEGG pathways as default) and show 
their significance by calculating p-values referring to gene diversity and 
activity (@Castro2009). Each group of genes is called GFAG 
(Group of Functionally Associated Genes). The package has support for many 
species in regards to the genes and their respective functions.

In the package's analysis, all genes present in the expression data are 
grouped by their respective functions according to the domains described by 
AnalysisDomain argument. The relationship between genes and functions are made
based on the species annotation package. If there is no annotation package, 
a three column file (gene, function and function description) must be 
provided. For each GFAG, gene diversity and activity in each sample are 
calculated. As the package always compare two samples (control *versus* 
experiment), relative gene diversity and activity for each GFAG are 
calculated. Using bootstrap method, for each GFAG, according to relative gene
diversity and activity, two p-values are calculated. The p-values are then 
corrected, by using the 
correction method defined by *PCorrectionMethod* argument, generating a q-value 
(@molan2018). The significative GFAGs will be those whose q-value stay under 
the cutoff set by *PCorrection* argument. Optionally, it's possible to run 
Wilcoxon ranck sum test and/or Fisher's exact test (@fontoura2016). These tests
also provide a corrected p-value, and siginificative groups can be seen through 
them.

# GFAGAnalysis
*GFAGAnalysis* function provides a complete analysis, using all available
arguments. As an example, lets consider a gene expression set of 
*Aedes aegypti*:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
suppressMessages(library(ADAM))
```

```{r eval=TRUE, out.height.px=6, out.width.px=6}
data("ExpressionAedes")
head(ExpressionAedes)
```

The first column refers to the gene names, while the others are the expression
obtained by a specific experiment (in this case, RNA-seq). ADAM
always need two samples (control *versus* experiment). This way, we must 
select two sample columns from the expression data for each comparison:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
Comparison <- c("control1,experiment1","control2,experiment2")
```

Each GFAG has a number of genes associated to it. This way, the analysis can 
consider all GFAGs or just those with a certain number of genes:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
Minimum <- 3
Maximum <- 20
```

The p-values for each GFAG is calculated through the bootstrap method, which
demands a seed for generating random numbers and a number of bootstraps steps
(the number of bootstraps should be a value that ensures the p-value 
precision):

```{r eval=TRUE, out.height.px=6, out.width.px=6}
SeedBootstrap <- 1049
StepsBootstrap <- 1000
```

The p-values will be corrected by a specific method with a certain cutoff 
value:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
CutoffValue <- 0.05
MethodCorrection <- "fdr"
```

In order to group the genes according to their biological functions, it's 
necessary an annotation package or a file relating genes and functions. In this
case, *Aedes aegypti* doesn't have an annotation package. This way, we build our
own file:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
data("KeggPathwaysAedes")
head(KeggPathwaysAedes)
```

It's necessary to inform which function domain and gene nomenclature are being 
used. As *Aedes agypti* doesn't have an annotation package, the domain will be 
"own" and the nomenclature "gene":

```{r eval=TRUE, out.height.px=6, out.width.px=6}
Domain <- "own"
Nomenclature <- "geneStableID"
```

Wilcoxon rank sum test and Fisher's exact test will be run:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
Wilcoxon <- TRUE
Fisher <- TRUE
```

As all arguments were defined, then we can run GFAGAnalysis function:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
ResultAnalysis <- suppressMessages(GFAGAnalysis(ComparisonID = Comparison, 
                            ExpressionData = ExpressionAedes,
                            MinGene = Minimum,
                            MaxGene = Maximum,
                            SeedNumber = SeedBootstrap, 
                            BootstrapNumber = StepsBootstrap,
                            PCorrection = CutoffValue,
                            DBSpecies = KeggPathwaysAedes, 
                            PCorrectionMethod = MethodCorrection,
                            WilcoxonTest = Wilcoxon,
                            FisherTest = Fisher,
                            AnalysisDomain = Domain, 
                            GeneIdentifier = Nomenclature))
```

In the example above, we used the function *supressMessages* just to stop 
showing messages during the *GFAGAnalysis* function execution. The output of
*GFAGAnalysis* will be a *list* with two elements. The first corresponds to a
*data frame* showing genes and their respective functions:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
head(ResultAnalysis[[1]])
```

The second element of the output list result corresponds to data frames
according to the argument ComparisonID:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
DT::datatable(as.data.frame(ResultAnalysis[[2]][1]), width = 800,
            options = list(scrollX = TRUE))
DT::datatable(as.data.frame(ResultAnalysis[[2]][2]), width = 800, 
            options = list(scrollX = TRUE))
```

The data frames corresponding to the second element of the list have the 
following columns:

* <div style="text-align: justify"> **ID** - A code identifying the GFAG (GO
term, KEGG pathway or one according to users annotations).</div> 
* <div style="text-align: justify"> **Description** - Description of the 
GFAG.</div> 
* <div style="text-align: justify"> **Raw_Number_Genes** -  Total number of 
genes related to the GFAG.</div>
* <div style="text-align: justify"> **Sample_Number_Genes** - Number of genes, 
present in the sample, related to the GFAG. </div> 
* <div style="text-align: justify"> **H_** - Two columns. GFAG gene diversity
of each sample (control *versus* experiment).</div> 
* <div style="text-align: justify"> **N_** - Two columns. GFAG gene activity 
of each sample (control *versus* experiment).</div> 
* <div style="text-align: justify"> **h** - Relative gene diversity. </div> 
* <div style="text-align: justify"> **n** - Relative gene activity. </div> 
* <div style="text-align: justify"> **pValue_h** - GFAG p-value related to 
gene diversity.</div> 
* <div style="text-align: justify"> **pValue_n** - GFAG p-value related to 
gene activity.</div> 
* <div style="text-align: justify"> **qValue_h** - GFAG corrected p-value 
related to gene diversity.</div> 
* <div style="text-align: justify"> **qValue_n** - GFAG corrected p-value 
related to gene activity.</div> 
* <div style="text-align: justify"> **Significance_h** - GFAG significance
related to gene diversity. "significative" means the q-value is lower than
the cutoff set by PCorrection argument, while "not-significative" means the
opposite.</div> 
* <div style="text-align: justify"> **Significance_n** - GFAG significance
related to gene activity. "significative" means the q-value is lower than
the cutoff set by PCorrection argument, while "not-significative" means the
opposite.</div> 
* <div style="text-align: justify"> **Wilcox_pvalue** - GFAG p-value generated
by Wilcoxon rank test.</div>
* <div style="text-align: justify"> **Wilcox_qvalue** - Wilcoxon GFAG 
corrected p-value.</div>
* <div style="text-align: justify"> **Wilcox_significance** - GFAG 
significance related Wilcoxon test. "significative" means the q-value is lower
than the cutoff set by PCorrection argument, while "not-significative" means 
the opposite.</div> 
* <div style="text-align: justify"> **Fisher_pvalue** - GFAG p-value generated
by Fisher's exact test.</div>
* <div style="text-align: justify"> **Fisher_qvalue** - Fisher GFAG corrected
p-value.</div>
* <div style="text-align: justify"> **Fisher_significance** - GFAG 
significance related to Fisher's exact test. "significative" means the q-value
is lower than the cutoff set by PCorrection argument, while 
"not-significative" means the opposite.</div> 

# ADAnalysis
*ADAnalysis* function provides a partial analysis, where is calculated just
gene diversity and activity of each GFAG with no signicance by bootstrap, 
Wilcoxon or Fisher. As an example, lets consider the same gene expression set
of *Aedes aegypti* previously used in *GFAGAnalysis* funcion example:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
suppressMessages(library(ADAM))
data("ExpressionAedes")
data("KeggPathwaysAedes")
```

As ADAM always need two samples (control *versus* experiment), let's select 
two sample columns from the expression data and define minimum and maximum 
number of genes per GFAG:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
Comparison <- c("control1,experiment1")
Minimum <- 3
Maximum <- 100
```

*Aedes aegypti* doesn't have an annotation package. This way, we build our own
file:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
SpeciesID <- "KeggPathwaysAedes"
```

It's necessary to inform which function domain and gene nomenclature are being 
used. *Aedes agypti* doesn't have an annotation package. So the domain will be 
"own" and the nomenclature "geneStableID":

```{r eval=TRUE, out.height.px=6, out.width.px=6}
Domain <- "own"
Nomenclature <- "geneStableID"
```

As all arguments were defined, then we can run ADAnalysis function:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
ResultAnalysis <- suppressMessages(ADAnalysis(ComparisonID = Comparison, 
                            ExpressionData = ExpressionAedes,
                            MinGene = Minimum,
                            MaxGene = Maximum,
                            DBSpecies = KeggPathwaysAedes, 
                            AnalysisDomain = Domain, 
                            GeneIdentifier = Nomenclature))
```

In the example above, we used the function *supressMessages* just to stop 
showing messages during the *ADAnalysis* function execution. The output of
*ADAnalysis* will be a *list* with two elements. The first corresponds to a
*data frame* showing genes and their respective functions:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
head(ResultAnalysis[[1]])
```

The second element of the output list result corresponds to data frames
according to the argument ComparisonID:

```{r eval=TRUE, out.height.px=6, out.width.px=6}
DT::datatable(as.data.frame(ResultAnalysis[[2]][1]), width = 800, 
            options = list(scrollX = TRUE))
```

The data frames corresponding to the second element of the list have the 
following columns:

* <div style="text-align: justify"> **ID** - A code identifying the GFAG (GO
term, KEGG pathway or one according to users annotations).</div> 
* <div style="text-align: justify"> **Description** - Description of the 
GFAG.</div> 
* <div style="text-align: justify"> **Raw_Number_Genes** - Total number of genes
related to the GFAG. </div>
* <div style="text-align: justify"> **Sample_Number_Genes** - Number of genes,
present in the sample, related to the GFAG. </div> 
* <div style="text-align: justify"> **H_** - Two columns. GFAG gene diversity
of each sample (control *versus* experiment).</div> 
* <div style="text-align: justify"> **N_** - Two columns. GFAG gene activity 
of each sample (control *versus* experiment).</div> 
* <div style="text-align: justify"> **h** - Relative gene diversity. </div> 
* <div style="text-align: justify"> **n** - Relative gene activity. </div> 


# Session information

```{r, eval=TRUE, out.height.px=6, out.width.px=6, label='Session information', , echo=FALSE}
sessionInfo()
```

# References