---
title: "fCI"
author: "Shaojun Tang"
output: 
  BiocStyle::html_document:
        toc: true
---
<!--
%% \VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{fCI}
-->


# `fCI`

`fCI` (f-divergence Cutoff Index), identifies DEGs by computing the difference between the distribution of fold-changes for the control-control 
and remaining (non-differential) case-control gene expression ratio data.:

`fCI` provides several
advantages compared to existing methods. Firstly, it performed equally well or 
better in finding DEGs in diverse data types (both discrete and continuous data)
from various omics technologies compared to methods that were specifically 
designed for the experiments. Secondly, it fulfills an urgent need in the 
omics research arena. The increasingly common proteogenomic approaches enabled
by rapidly decreasing sequencing costs facilitates the collection of 
multi-dimensional (i.e. proteogenomics) experiments, for which no efficient 
tools have been developed to find co-regulation and dependences of DEGs
between treatment conditions or developmental stages. Thirdly, fCI does not 
rely on statistical methods that require sufficiently large numbers of 
replicates to evaluate DEGs. Instead fCI can effectively identify changes in 
samples with very few or no replicates.

* **REQUIRED** (Equivalent to an `ERROR` in `R CMD check`). This means the package
  is missing something critical and it cannot be accepted into Bioconductor
  until the issue is fixed. 
* **RECOMMENDED** (Equivalent to a `WARNING` in `R CMD check`). We strongly
  encourage that these issues be fixed. In the weeks leading up to a
  Bioconductor release we will ask package authors to fix these issues.
* **CONSIDER**: Not necessarily something bad, just something we wanted to point
  out. package authors don't need to take action on these, but they can.
  These are analagous   to `NOTE`s in `R CMD check`, though unlike 
  `NOTE`s, there is no   expectation that they will increase in severity
  over time.



# Using `fCI`

`fCI` can also be run interactively:

```{r}
library(fCI)
suppressWarnings(library(fCI))
suppressWarnings(library(psych))
```



# Installing `fCI`

`fCI` should be installed as follows:

```{r eval=FALSE}
source("http://bioconductor.org/biocLite.R")
biocLite("fCI")
library(fCI)
```


## Dependency Checks

* **Checking if other packages can import this one...**
  Checks to make sure that there will be no import problems
  if another package imports your package. (`REQUIRED`)
* **Checking to see if we understand object initialization....**
  Reports if it can't figure out how objects were
  initialized (`CONSIDER`).


# Examples on how to use `fCI`

`fCI` can be used as the follows:


## Environment Setup:
* **Load the example sample data** provided by fCI package under data/.
 This data contains gene/protein expression values with columns representing
samples/lanes/replicates, and rows representing genes. 
* **In this example**, a total of 3 control replicates (column 1 through 3) and 
3 case replicates (columns 4 through 6) is shown.


## Load the data
```{r, results='asis'}
fci.data=data.frame(matrix(sample(3:100, 1043*6, replace=TRUE), 1043,6))
```

## Finding Differentially Expressed Genes:
* To find differentially expressed genes** using column 1 to 3 as controls and 
4 to 6 as case replicates, execute the following function call :

```{r}
 suppressWarnings(library(gtools))
 targets=fCI.call.by.index(c(1,2,3), c(4,5,6), fci.data)
 head(targets)
```

## Result Interpretation
* The output will be the genes (ids) that are differentially expressed and 
its frequency (a ratio tells how often each gene is shown to be differentially 
expressed in all fCI combinations). 
*In general, the higher the frequency (the ratio) tells us how likely a gene
is differentially expressed
* For example,  A ratio of 75% means the gene under study is shown to be
a dysregulated target in 3 out of 4 fCI pairwise analysis.

*As fCI is coded using object oritented programming, all computations are 
based on object manipulation. 
*You will be able to perform very versatile analysis by changing the software
parameters and altering the options in computing the dysresgulated genes.

## Illustrate the fCI analysis in details
* First, create an object -- fci, which holds the gene expression levels, 
default model parameters.

```{r}
  fci=new("NPCI")
```

* next, you need to provide gene/protein expression values for subsequent 
analysis. method 1): assign an existing data frame (of gene expression values) 
in R console directly to the object, for example : 

```{r eval=FALSE}
fci@sample.data.normalized=fci.data**
```

* or simply provide the pathname for the file (tab-delimited file with 
rownames) that contains the expression values. 
```{r eval=FALSE}
**fci@sample.data.file="c://home//fci_data.txt"**
``` 
  
## Assign data to fCI object

assign the built-in dataset to the object
```{r}
if(dim(fci.data)[1]>0){
  fci@sample.data.normalized=fci.data
}
```
   
after you assign the data to the object, you should initialize the object  
fci (which will remove genes with zero expression values and intialize the
object parameters) and then assign the fci's control replicates (fci@wt.index)
with two sample column ids (i.e. 1 & 2) for constructing empirical null 
distribution, and assign fci's control-case replicates (fci@df.index) with two 
sample column ids (i.e. 1 & 5) for control-case distribution. 

```{r}
if(dim(fci.data)[1]>0){
    fci=initialize(fci)
    fci@wt.index=c(1,2)
    fci@df.index=c(1,4)
}
```

next, you will perform the formal fCI analysis by calling the following 
functions to the object.

```{r}
if(dim(fci.data)[1]>0){
fci =populate(fci )
fci =compute(fci )
fci =summarize(fci )
}
```

after the divergence was computed, fCI will generate the final results and 
saved to the 'result' field, which included the total number of 
differentially expressed genes, optimal fold cutoff, and divergence value.
In addition,you will be able to see the differentially expressed gene ids 
(indicated by row numbers) and figures about the empirical-null vs
control-case distribution.


since fCI uses object oriented programming, you can easily change the field
s of the objects, such as fold cutoff list, the column ids for emprical null 
distributions (and/or control-case distribution) to evaluate differential
expression on the samples of interest. After changes are made, rerun step 6 to
compute new analysis results. 

```{r}
if(dim(fci.data)[1]>0){
fci@fold.cutoff.list=list(seq(from=1.2, to=5, by=0.2))
fci@wt.index=c(2,3)
fci@df.index=c(2,5)
}
```

## multi-dimensional data
if the dataset contains multiple control replicates and 
case replicates, you don't need to form the these 
combinations and perform fCI individually. Instead, you 
could invoke fci on a top-level function that will 
automatically perform the analysis on given control
replicate column ids and case/experimental column ids.
Given the dataset that contains 3 control replicates 
and 3 case replicates, fci will form possible 3 
control-control (empirical null) distributions, namely 1-2,
1-3 and 2-3 to construct empirical null ratio distribution,
and 9 control-case ratio distribution, namely 1-4, 1-5,
1-6, 2-4, 2-5, 2-6, 3-4, 3-5, and 3-6. Then fci will pick
one distribution from the control-control, and one distribution 
fro control-case (i.e., 1-2 & 1-5) to form a valid fci analysis. 
There are a total of 3*9=27 unique fci analysis for this example.

```{r}
fci=new("NPCI")

filename=""
if(file.exists("../inst/extdata/Supp_Dataset_part_2.txt")){
  filename="../inst/extdata/Supp_Dataset_part_2.txt"
}else if(file.exists("../../inst/extdata/Supp_Dataset_part_2.txt")){
  filename="../../inst/extdata/Supp_Dataset_part_2.txt"
}

if(nchar(filename)>3){
  fci=find.fci.targets(fci, c(1,2,3), c(4,5,6), 
                       filename,
                       use.normalization=FALSE)
  result=show.targets(fci)
  head(result, 20)
}
```

results from analysis on step will be identical to results 
shown in step 1). However, you have more flexibility 
changing the parameters, and printing the figures for 
sample of interest.

## Normalization 
fCI enables users lot of flexibilities to perform 
differential expression. For example, the users could
choose to normalize the each replicate's gene expression 
based on total-library normalization, trimed-normalization,
median normalization and so on. In addition, fCI enables 
two fundamental options for divergence estimation. The 
first method is the helliger distance estimation (the 
default option) which assumes the log-ratio expression 
values to follow the gaussian distribution. The second
method is the cross-entropy which relax the condition. 


```{r eval=FALSE}
library(fCI)
fci=normalization(fci)
```
## identify a specific sample for differential expression

```{r}
if(dim(fci@sample.data.normalized)[1]>100 & 
     dim(fci@sample.data.normalized)[2]>3){
fci@wt.index=c(1,2)
fci@df.index=c(1,4)
fci@method.option=1
fci =populate(fci )
fci =compute(fci )
fci =summarize(fci )
}
```

## Time-course analysis
besides performing differential expression analysis using
transcriptomic and/or proteomic data, fCI enables the users 
to perform jointly analysis using multi-dimensional data. 
By multi-dimensional, we refer to data that has been 
generated for multiple related samples, i.e. time course,
different tissues, cell types or in cases where both 
transcriptomic and proteomic data are available.

```{r}
if(nchar(filename)>3){
  fci=new("NPCI")
  fci=find.fci.targets(fci, c(1,2,3), c(4,5,6), 
             "../inst/extdata/Supp_Dataset_part_2.txt", 
             use.normalization=FALSE)
  result=show.targets(fci)
  head(result, 20)
}
```