---
title: "QUBIC Tutorial"
author:
- Yu Zhang
- Juan Xie
- Qin Ma
classoption: hyperref,
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{QUBIC Tutorial}
output:
  pdf_document:
    fig_caption: yes
    highlight: tango
  html_document:
    theme: flatly
    highlight: tango
  word_document:
    fig_caption: yes
    fig_height: 5
    fig_width: 5
    highlight: tango
    md_extensions: -autolink_bare_uris
bibliography: bibliography.bib
---

Gene expression data is very important in experimental molecular biology [@brazma00], especially for cancer study [@fehrmann15]. The large-scale microarray data and RNA-seq data provide good opportunity to do the gene co-expression analyses and identify co-expressed gene modules; and the effective and efficient algorithms are needed to implement such analysis. Substantial efforts have been made in this field, such as @cheng00, Plaid [@lazzeroni02], Bayesian Biclustering [BCC, @gu08], among them Cheng and Church and Plaid has the R package implementation. It is worth noting that our in-house biclustering algorithm, QUBIC [@li09], is reviewed as one of the best programs in terms of their prediction performance on benchmark datasets. Most importantly, it is reviewed as the best one for large-scale real biological data [@eren12].

Until now, QUBIC has been cited over 110 times (via [Google Scholar](https://scholar.google.com/scholar?cites=15047943471221701463)) and its web server, QServer [@zhou12], was developed in 2012 to facilitate the users without comprehensive computational background [@zhou12]. In the past five years, the cost of RNA-sequencing decreased dramatically, and the amount of gene expression data keeps increasing. Upon requests from users and collaborators, we developed this R package of QUBIC to void submitting large data to a webserver.

The unique features of our R package include (1) updated and more stable back-end resource code (re-written by C++), which has better memory control and is more efficient than the one published in 2009. For an input dataset in Arabidopsis, with 25,698 genes and 208 samples, we observed more than 40% time saving; and (2) comprehensive functions and examples, including discretize function, heatmap drawing and network analysis. 


# Other languages

If R is not your thing, there is also [a C version of `QUBIC`](https://github.com/maqin2001/QUBIC).

# Help 

If you are having trouble with this R package, contact [the maintainer, Yu Zhang](mailto:zy26@jlu.edu.cn). 

# Install and load

Stable version from BioConductor


```r
source("https://bioconductor.org/biocLite.R")
biocLite("QUBIC")
```

Or development version from GitHub


```r
install.packages("devtools")
devtools::install_github("zy26/QUBIC")
```

Load `QUBIC`


```{r, message=FALSE}
library("QUBIC")
```

Functions
======================

There are six functions provided by QUBIC package. 

* ```qudiscretize()```creates a discrete matrix for a given gene expression matrix;
* ```BCQU()```performs a qualitative biclustering for real matrix;
* ```BCQUD()```performs a qualitative biclustering for discretized matrix;
* ```quheatmap()```can draw heatmap for singe bicluster or overlapped biclusters;
* ```qunetwork()``` can automatically create co-expression networks based on the identified biclusters by QUBIC;
* ```qunet2xml()```can convert the constructed co-expression networks into XGMML format for further network analysis in Cytoscape, Biomax and JNets. 
The following examples illustrate how these functions work.

# Example of a random matrix with two diferent embedded biclusters

```{r random, cahce = TRUE, message = FALSE, fig.cap = 'Heatmap for two overlapped biclusters in the simulated matrix'}
library(QUBIC)
set.seed(1)
# Create a random matrix
test <- matrix(rnorm(10000), 100, 100)
colnames(test) <- paste("cond", 1:100, sep = "_")
rownames(test) <- paste("gene", 1:100, sep = "_")

# Discretization
matrix1 <- test[1:7, 1:4]
matrix1

matrix2 <- qudiscretize(matrix1)
matrix2

# Fill bicluster blocks
t1 <- runif(10, 0.8, 1)
t2 <- runif(10, 0.8, 1) * (-1)
t3 <- runif(10, 0.8, 1) * sample(c(-1, 1), 10, replace = TRUE)
test[11:20, 11:20] <- t(rep(t1, 10) * rnorm(100, 3, 0.3))
test[31:40, 31:40] <- t(rep(t2, 10) * rnorm(100, 3, 0.3))
test[51:60, 51:60] <- t(rep(t3, 10) * rnorm(100, 3, 0.3))

# QUBIC
res <- biclust::biclust(test, method = BCQU())
summary(res)

# Show heatmap
hmcols <- colorRampPalette(rev(c("#D73027", "#FC8D59", "#FEE090", "#FFFFBF", 
    "#E0F3F8", "#91BFDB", "#4575B4")))(100)
# Specify colors

par(mar = c(4, 5, 3, 5) + 0.1)
quheatmap(test, res, number = c(1, 3), col = hmcols, showlabel = TRUE)

```

# Example of BicatYeast

```{r yeast, cache = TRUE}
library(QUBIC)
data(BicatYeast)

# Discretization
matrix1 <- BicatYeast[1:7, 1:4]
matrix1

matrix2 <- qudiscretize(matrix1)
matrix2

# QUBIC
x <- BicatYeast
system.time(res <- biclust::biclust(x, method = BCQU()))

summary(res)
```

We can draw heatmap for single bicluster.

```{r yeast heatmap, cache = TRUE, message = FALSE, fig.cap = 'Heatmap for the second bicluster identified in the BicatYeast data. The bicluster consists of 53 genes and 5 conditions', fig.show = 'asis'}

# Draw heatmap for the 2th bicluster identified in BicatYeast data

library(RColorBrewer)
paleta <- colorRampPalette(rev(brewer.pal(11, "RdYlBu")))(11)
par(mar = c(5, 4, 3, 5) + 0.1, mgp = c(0, 1, 0), cex.lab = 1.1, cex.axis = 0.5, 
    cex.main = 1.1)
quheatmap(x, res, number = 2, showlabel = TRUE, col = paleta)

```

We can draw heatmap for overlapped biclusters.

```{r yeast heatmap2, cache = TRUE, message = FALSE, fig.cap = 'Heatmap for the second and third biclusters identified in the BicatYeast data. Bicluster #2 (topleft) consists of 53 genes and 5 conditions, and bicluster #3 (bottom right) consists of 37 genes and 7 conditions.', fig.show = 'asis'}

# Draw for the 2th and 3th biclusters identified in BicatYeast data

par(mar = c(5, 5, 5, 5), cex.lab = 1.1, cex.axis = 0.5, cex.main = 1.1)
paleta <- colorRampPalette(rev(brewer.pal(11, "RdYlBu")))(11)
quheatmap(x, res, number = c(2, 3), showlabel = TRUE, col = paleta)

```

We can draw network for single bicluster.

```{r yeast network1, cache = TRUE, message = FALSE, fig.cap = 'Network for the second bicluster identified in the BicatYeast data.', fig.show = 'asis'}

# Construct the network for the 2th identified bicluster in BicatYeast
net <- qunetwork(x, res, number = 2, group = 2, method = "spearman")
if (requireNamespace("qgraph", quietly = TRUE))
    qgraph::qgraph(net[[1]], groups = net[[2]], layout = "spring", minimum = 0.6, 
        color = cbind(rainbow(length(net[[2]]) -  1), "gray"), edge.label = FALSE)

```

We can also draw network for overlapped biclusters.

```{r yeast network2, cache = TRUE, message = FALSE, fig.cap = 'Network for the second and third biclusters identified in the BicatYeast data.', fig.show = 'asis'}

net <- qunetwork(x, res, number = c(2, 3), group = c(2, 3), method = "spearman")
if (requireNamespace("qgraph", quietly = TRUE))
    qgraph::qgraph(net[[1]], groups = net[[2]], layout = "spring", minimum = 0.6, 
        legend.cex = 0.5, color = c("red", "blue", "gold", "gray"), edge.label = FALSE)

```

```r
# Output overlapping heatmap XML, could be used in other software such
# as Cytoscape, Biomax or JNets
sink('tempnetworkresult.gr')
qunet2xml(net, minimum = 0.6, color = cbind(rainbow(length(net[[2]]) - 1), "gray"))
sink()
# We can use Cytoscape, Biomax or JNets open file named 'tempnetworkresult.gr'
```

# Example of E.coli data

The E.coli data consists of 4,297 genes and 466 conditions.

```{r ecoli, cache = TRUE}
library(QUBIC)

# Load E.coli data
if (requireNamespace("QUBICdata", quietly = TRUE)) {
    data("ecoli", package = "QUBICdata")
} else {
    # Fake ecoli, all result below 
    ecoli <- BicatYeast
    warning("All results below are BicatYeast, not ecoli")
}

# Discretization
matrix1 <- ecoli[1:7, 1:4]
matrix1

matrix2 <- qudiscretize(matrix1)
matrix2

# QUBIC
res <- biclust::biclust(ecoli, method = BCQU(), r = 1, q = 0.06, c = 0.95, o = 100, 
    f = 0.25, k = max(ncol(ecoli)%/%20, 2))
system.time(res <- biclust::biclust(ecoli, method = BCQU(), r = 1, q = 0.06, c = 0.95, 
    o = 100, f = 0.25, k = max(ncol(ecoli)%/%20, 2)))
summary(res)

```

```{r ecoli heatmap, cache = TRUE, message = FALSE, fig.cap = 'Heatmap for the fifth bicluster identified in the E.coli data. The bicluster consists of 103 genes and 38 conditions', fig.show = 'asis'}

# Draw heatmap for the 5th bicluster identified in E.coli data

library(RColorBrewer)
paleta <- colorRampPalette(rev(brewer.pal(11, "RdYlBu")))(11)
par(mar = c(5, 4, 3, 5) + 0.1, mgp = c(0, 1, 0), cex.lab = 1.1, cex.axis = 0.5, 
    cex.main = 1.1)
quheatmap(ecoli, res, number = 5, showlabel = TRUE, col = paleta)

```

```{r ecoli heatmap2, cache = TRUE, message = FALSE, fig.cap = 'Heatmap for the fourth and eighth biclusters identified in the E.coli data.Bicluster #4 (topleft) consists of 108 genes and 44 conditions, and bicluster #8 (bottom right) consists of 26 genes and 33 conditions', fig.show = 'asis'}

library(RColorBrewer)
paleta <- colorRampPalette(rev(brewer.pal(11, "RdYlBu")))(11)
par(mar = c(5, 4, 3, 5), cex.lab = 1.1, cex.axis = 0.5, cex.main = 1.1)
quheatmap(ecoli, res, number = c(4, 8), showlabel = TRUE, col = paleta)

```

```{r ecoli network, cache = TRUE, message = FALSE, fig.cap = 'Network for the fifth bicluster identified in the E.coli data.', fig.show = 'asis' }

# construct the network for the 5th identified bicluster in E.coli data
net <- qunetwork(ecoli, res, number = 5, group = 5, method = "spearman")
if (requireNamespace("qgraph", quietly = TRUE))
    qgraph::qgraph(net[[1]], groups = net[[2]], layout = "spring", minimum = 0.6, 
        color = cbind(rainbow(length(net[[2]]) - 1), "gray"), edge.label = FALSE)


```

```{r ecoli network2, cache = TRUE ,message = FALSE, fig.cap = 'Network for the fourth and eighth biclusters identified in the E.coli data.', fig.show = 'asis'}

# construct the network for the 4th and 8th identified bicluster in E.coli data
net <- qunetwork(ecoli, res, number = c(4, 8), group = c(4, 8), method = "spearman")
if (requireNamespace("qgraph", quietly = TRUE))
    qgraph::qgraph(net[[1]], groups = net[[2]], legend.cex = 0.5, layout = "spring", 
        minimum = 0.6, color = c("red", "blue", "gold", "gray"), edge.label = FALSE)

```

# References