% \VignetteIndexEntry{The Iterative Signature Algorithm for Gene Expression Data}
\documentclass{article}
\usepackage{ragged2e}
\usepackage{url}

\newcommand{\Rfunction}[1]{\texttt{#1()}}
\newcommand{\Rpackage}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{\texttt{#1}}
\newcommand{\Rargument}[1]{\textsl{#1}}
\newcommand{\filename}[1]{\texttt{#1}}
\newcommand{\variable}[1]{\texttt{#1}}

\SweaveOpts{cache=TRUE}
\SweaveOpts{prefix.string=plot}

\setlength{\parindent}{2em}

\begin{document}

\title{The Iterative Signature Algorithm for Gene Expression Data}
\author{G\'abor Cs\'ardi}
\maketitle

\tableofcontents

\RaggedRight

<<set width,echo=FALSE,print=FALSE>>=
options(width=60)
options(continue=" ")
try(X11.options(type="xlib"), silent=TRUE)
@ 

\section{Introduction}

The Iterative Signature Algorithm (ISA)~\cite{sa,isa,isamod} is a
biclustering method. The input of a biclustering method is a matrix
and the output is a set of biclusters that fulfill some criteria. A
bicluster is a block of the potentially reordered input matrix. 

Most commonly, biclustering algorithms are used on microarray expression data,
to find gene sets that are coexpressed across a subset of the original
samples. In the ISA papers the biclusters are called
transription modules (TM), we will often refer them under this name in
the following.

This tutorial specifically deals with the modular analysis of gene
expression data. Section~\ref{sec:isa} gives a short summary of how
ISA works. If you need more information of the underlying math
or want to apply it to other data, then please see the referenced
papers, the vignette titled ``The Iterative Signature Algorithm'' in
the \Rpackage{isa2} R package, or the ISA homepage at
\url{http://www.unil.ch/cbg/homepage/software.html}.

\section{Preparing the data}

\subsection{Loading the data}

First, we load the required packages and the data to analyze. ISA is
implemented in the \Rpackage{eisa} and \Rpackage{isa2} packages, see
Section~\ref{sec:isapackages} for a more elaborated summary about the
two packages. It is enough to load the \Rpackage{eisa} package,
\Rpackage{isa2} and other required packages are loaded automatically:
<<load the packages,cache=FALSE>>=
library(eisa)
@

In this tutorial we will use the data in the \Rpackage{ALL}
package. 
<<load the data,cache=FALSE>>=
library(ALL)
library(hgu95av2.db)
data(ALL)
@
This is a data set from a clinical trial in acute
lymphoblastic leukemia and it contains \Sexpr{ncol(ALL)} samples
altogether.

\section{Simple ISA runs}

The simplest way to run ISA is to choose the two threshold parameters
and then call the \Rfunction{ISA} function on the
\Rclass{ExpressionSet} object. The threshold parameters tune the size
of the modules, less stringent (i.e. smaller) values result bigger,
less correlated modules. The optimal values depend on your data and
some experimentation is needed to determine them.

Since running ISA might take a couple of minutes and the results 
depend on the random number generator used, the ISA run is commented
out from the next code block, and we just load a precomputed set of
modules that is distributed with the \Rpackage{eisa} package.
<<simple ISA,keep.source=TRUE>>=
thr.gene <- 2.7
thr.cond <- 1.4
set.seed(1) # to get the same results, always
# modules <- ISA(ALL, thr.gene=thr.gene, thr.cond=thr.cond)
data(ALLModulesSmall)
modules <- ALLModulesSmall
@ 

This first applies a non-specific filter to the data set and then runs
ISA from \Sexpr{formals(eisa:::ISA)$no.seeds} % $
random seeds (the default). See Section~\ref{sec:detailed-isa} if the
default parameters are not appropriate for you and need more control.

\section{Inspect the result}

The \Rfunction{ISA} function returns an \Rclass{ISAModules} object. By
typing in its name we can get a brief summary of the results:
<<type in name of ISAModules object>>=
modules
@

There are various other \Rclass{ISAModules} methods that help to
access the modules themselves and the ISA parameters that were used
for the run. 

Calling \Rfunction{length} on \variable{modules} returns the number of
ISA modules in the set, \Rfunction{dim} gives the dimension of the
input expression matrix: the number of features (after the filtering)
and the number of samples:
<<accessors 1>>=
length(modules)
dim(modules)
@ 

Functions \Rfunction{featureNames} and \Rfunction{sampleNames} return
the names of the features and samples, just like the functions with
the same name for an \Rclass{ExpressionSet}:
<<accessors 2>>=
featureNames(modules)[1:5]
sampleNames(modules)[1:5]
@ 

The \Rfunction{getNoFeatures} function returns a numeric vector, the
number of features (probesets in our case) in each module. Similarly,
\Rfunction{getNoSamples} returns a numeric vector, the number of
samples in each module. \Rfunction{pData} returns the
phenotype data of the expression set as a data frame. The
\Rfunction{getOrganism} function returns the scientific name of the
organism under study, \Rfunction{annotation} the name of the chip. For
the former the appropriate annotation package must be installed.
<<number of features and samples>>=
getNoFeatures(modules)
getNoSamples(modules)
colnames(pData(modules))
getOrganism(modules)
annotation(modules)
@ 

The double bracket indexing operator (`\verb+[[+') can be used to
select some modules from the complete set, the result is another,
smaller \Rclass{ISAModules} object. The following selects the first
five modules.
<<indexing>>=
modules[[1:5]]
@ 

The single bracket indexing operator can be used to restrict an
\Rclass{ISAModules} object to a subset of features and/or
samples. E.g. selecting all features that map to a gene on
chromosome~1 can be done with 
<<indexing 2>>=
chr <- get(paste(annotation(modules), sep="", "CHR"))
chr1features <- sapply(mget(featureNames(modules), chr), 
                 function(x) "1" %in% x)
modules[chr1features,]
@

Similarly, selecting all B-cell samples can be performed with
<<indexing 3>>=
modules[ ,grep("^B", pData(modules)$BT)]
@ 

\Rfunction{getFeatureNames} lists the probesets (more precisely,
the feature names coming from the \Rclass{ExpressionSet} object) in
the modules. It returns a list, here we just print the first entry.
<<list genes in modules>>=
getFeatureNames(modules)[[1]]
@

The \Rfunction{getSampleNames} function does the same for the samples.
Again, the sample names are taken from the \Rclass{ExpressionSet}
object that was passed to \Rfunction{ISA}:
<<list conditions in modules>>=
getSampleNames(modules)[[1]]
@ 

ISA biclustering is not binary, every feature (and
similarly, every sample) has a score between -1 and 1; the further the
score is from zero the stronger the association between the feature
(or sample) and the module. If two features both have scores with the
same sign, then they are correlated, if the sign of their scores are
opposite, then they are anti-correlated. You can query the scores of
the features with the \Rfunction{getFeatureScores} function, and
similarly, the \Rfunction{getSampleScores} function queries the sample
scores. You can supply the modules you want to query as an optional
argument:
<<query scores>>=
getFeatureScores(modules, 3)
getSampleScores(modules, 3)
@ 

You can also query the scores in a matrix form, that is probably
better if you need many or all of them at the same time. The
\Rfunction{getFeatureMatrix} and \Rfunction{getSampleMatrix} functions
are defined for this. The probes/samples that are not included in a
module will have a zero score by definition.
<<query all scores>>=
dim(getFeatureMatrix(modules))
dim(getSampleMatrix(modules))
@

Objects from the \Rclass{ISAModules} class store various information
about the ISA run and the convergence of the seeds. Information
associated with the individual seeds can be queried with the
\Rfunction{seedData} function, it returns a data frame, with as many
rows as the number of seeds and various seed-level information,
e.g. the number of iterations required for the seed to converge. See
the manual page of \Rfunction{ISA} for details.
<<seed data>>=
seedData(modules)
@

The \Rfunction{runData} function returns additional information about
the ISA run, see the \Rfunction{ISA} manual page for details.
<<run data>>=
runData(modules)
@ 

\section{Enrichment calculations}

The \Rpackage{eisa} package provides some functions to perform
enrichment tests for the gene sets corresponding to the ISA modules
against various databases. These tests are usually simplified and less
sophisticated versions than the ones in the \Rpackage{Category},
\Rpackage{GOstats} or \Rpackage{topGO} packages, but they are much
faster and this is important if we need to perform them for many
modules.

\subsection{Gene Ontology}

To perform enrichment analysis against the Gene Ontology database, all
you have to do is to supply your \Rclass{ISAModules} object to the
\Rfunction{ISAGO} function.
<<GO enrichment>>=
GO <- ISAGO(modules)
@ 

The \Rfunction{ISAGO} function requires the annotation package of the
chip, e.g. for the ALL data, the \Rpackage{hgu95av2.db} package is
required.

The \variable{GO} object is a list with three elements, these
correspond to the GO ontologies, they are: biological function,
cellular component and molecular function, in this order.
<<list GO result>>=
GO
@

We can see the number of categories tested, this is different for each
ontology, as they have different number of terms. The gene universe
size is also different, because it contains only genes that have at
least one annotation in the given category.

For extracting the results themselves, the \Rfunction{summary}
function can be used, this converts them to a simple data frame. A
$p$-value limit can be supplied to \Rfunction{summary}. Note, that
since \Rfunction{ISAGO} calculates enrichment for many gene sets
(i.e. for all biclusters), \Rfunction{summary} returns a list of data
frames, one for each bicluster. A table for the first module:
<<GO summary>>=
summary(GO$BP, p=0.001)[[1]][,-6]
@ 

We omitted the sixth column of the result, because it is very wide and
would look bad in this vignette. This column is called
\variable{drive} and lists the Entrez IDs of the genes that are in the
intersection of the bicluster and the GO category; or in other
words, the genes that drive the enrichment.
These genes can also be obtained with the
\Rfunction{geneIdsByCategory} function. The following returns the
genes in the first module and the third GO BP category. (The GO
categories are ordered according to the enrichment $p$-values, just
like in the output of \Rfunction{summary}.)
<<GO gene ids by category>>=
geneIdsByCategory(GO$BP)[[1]][[3]]
@ 

You can use the \Rpackage{GO.db} package to obtain more information
about the enriched GO categories.
<<GO info,cache=FALSE>>=
sigCategories(GO$BP)[[1]]
library(GO.db)
mget(na.omit(sigCategories(GO$BP)[[1]][1:3]), GOTERM)
@ 

In addition, the following functions are implemented to work on the
objects returned by \Rfunction{ISAGO}: \Rfunction{htmlReport},
\Rfunction{pvalues}, \Rfunction{geneCounts}, \Rfunction{oddsRatios},
\Rfunction{expectedCounts}, \Rfunction{universeCounts},
\Rfunction{universeMappedCount}, \Rfunction{geneMappedCount},
\Rfunction{geneIdUniverse}. These functions do essentially the same as
they counterparts for \Rclass{GOHyperGResult} objects, see the
documentation of the \Rpackage{GOstats} package. The only difference
is, that since here we are testing a list of gene sets (=biclusters),
they calculate the results for all gene sets and usually return lists.

\subsubsection{Multiple testing correction}

By default, the \Rfunction{ISAGO} function performs multiple testing
correction using the Holm method, this can be changed via the
\texttt{correction} and \texttt{correction.method} arguments. See the
manual page of the \Rfunction{ISAGO} function for details, and also
the \Rfunction{p.adjust} function for the possible multiple testing
correction schemes.

%\subsection{Transcription regulation from the DBD database}

%TODO

\section{How ISA works}%
\label{sec:isa}

\subsection{ISA iteration}

ISA works in an iterative way. For an $E (m\times n)$
input matrix it starts from a seed vector $r_0$, which is
typically a sparse 0/1 vector of length $m$. The non-zero elements in
the seed vector define a set of genes in $E$. Then the transposed of
$E$, $E'$ is multiplied by $r_0$ and the result is thresholded.

The thresholding is an important step of the ISA, without
thresholding ISA would be equivalent to a (not too effective)
numerical singular value decomposition (SVD) algorithm. Currently
thresholding is done by calculating the mean and standard deviation
of the vector and keeping only elements that are further than a
given number of standard deviations from the mean. Using the
``direction'' parameter, one can keep values that are
(a) significantly higher (``up''); (b) lower (``down'') than the
mean; or (c) both (``updown'').

The thresholded vector $c_0$ is the (sample)
\emph{signature} of $r_0$. Then the (gene) signature of
$c_0$ is calculated, $E$ is multiplied by $c_0$ and then thresholded
to get $r_1$.

This iteration is performed until it converges, i.e. $r_{i-1}$
and $r_i$ are \emph{close}, and $c_{i-1}$ and
$c_i$ are also close. The convergence criteria,
i.e. what \emph{close} means, is by default defined by high Pearson
correlation.

It is very possible that the ISA finds the same module more than once;
two or more seeds might converge to the same module. The function
\Rfunction{ISAUnique} eliminates every module from the result of 
\Rfunction{ISAIterate} that is very similar (in terms of
Pearson correlation) to the one that was already found before.

It might be also apparent from the description of ISA, that the
biclusters are soft, i.e. they might have an overlap in their genes,
samples, or both. It is also possible that some genes and/or
samples of the input matrix are not found to be part of any ISA
biclusters. Depending on the stringency parameters in the
thresholding (i.e. how far the values should be from the mean), it
might even happen that ISA does not find any biclusters.

\subsection{Parameters}

The two main parameters of ISA are the two thresholds (one for the
genes and one for the samples). They basically define the stringency of
the modules. If the gene threshold is high, then the modules will have
very similar genes. If it is mild, then modules will be bigger, with
less similar genes than in the first case. The same applies to the
sample threshold and the samples of the modules.

\subsection{Random seeding and smart seeding}

By default (i.e. if the \Rfunction{ISA} function is used) the ISA is
performed from random sparse starting seeds, generated by the
\Rfunction{generate.seeds} function. This way the algorithm is 
completely unsupervised, but also stochastic: it might give different
results for different runs.

It is possible to use non-random seeds as well. If you have some
knowledge about the data or are interested in a particular subset of
genes/samples, then you can feed in your seeds into the
\Rfunction{ISAIterate} function directly. In this case the
algorithm is deterministic, for the same seed you will always get the
same results. Using smart (i.e. non-random) seeds can be considered as
a semi-supervised approach. We show an example of using smart seeds in
Section~\ref{sec:detailed-isa}. 

\subsection{Normalization}

Using in silico data we observed that ISA has the best performance if the
input matrix is normalized (see \Rfunction{ISANormalize}). The
normalization produces two matrices: $E_r$ and
$E_c$. $E_r$ is calculated by transposing $E$ and
centering and scaling its expression values for each sample (see the
\Rfunction{scale} R function). $E_c$ is calculated by centering and
scaling the genes of $E$. $E_r$ is used to calculate the sample
signature of genes and $E_c$ is used to calculate the gene signature
of the samples.

It is possible to use another normalization, or not to use
normalization at all; the user has to construct an
\Rclass{ISAExpressionSet} object containing the three matrices
corresponding to the raw data, the gene-wise normalized data and the
sample-wise normalized data. This object can be passed to the
\Rfunction{ISAIterate} function. The matrices are not required to be
different, the user can supply the raw data matrix three times, if
desired.

\subsection{Gene and sample scores}

In addition to finding biclusters in the input matrix, the ISA also
assigns scores to the genes and samples, separately for each
module. The scores are between minus one and one and they are by
definition zero for the genes/samples that are not included in the
module. For the non-zero entries, the further the score of a
gene/samples is from zero, the stronger the association between the
gene/sample and the module. If the signs of two genes/samples are the
same, then they are correlated, if they have opposite signs, then they
are anti-correlated. 

\section{Bicluster coherence and robustness measures}

\subsection{Coherence}

Madeira and Oliviera\cite{madeira04} define various coherence scores
for biclusters, these measure how well the rows and or columns are
correlated. It is possible to use these measures for ISA as well,
after converting the output of ISA to a \Rclass{biclust} object. 
We use the \texttt{Bc} object that was created in Section~\ref{sec:biclust}. 
Here are the measures for the first bicluster:
<<coherence>>=
library(biclust)
Bc <- as(modules, "Biclust")
constantVariance(exprs(ALL), Bc, number=1)
additiveVariance(exprs(ALL), Bc, number=1)
multiplicativeVariance(exprs(ALL), Bc, number=1)
signVariance(exprs(ALL), Bc, number=1)
@

You can use \Rfunction{sapply} to perform the calculation for many or
all modules, e.g. for this data set `constant variance' and `additive
variance' are not the same:
<<coherence all>>=
cv <- sapply(seq_len(Bc@Number), 
             function(x) constantVariance(exprs(ALL), Bc, number=x))
av <- sapply(seq_len(Bc@Number), 
             function(x) additiveVariance(exprs(ALL), Bc, number=x))
cor(av, cv)
@
Please see the manual pages of these functions and the paper cited
above for more details.

\subsection{Robustness}

The \Rpackage{eisa} package uses a measure that is related to
coherence; it is called robustness. Robustness is a generalization of
the singular value of a matrix. If there were no thresholding during
the ISA iteration, then ISA would be equivalent to a numerical method
for singular value decomposition and robustness would be the
same the principal singular value of the input matrix. 

If the \Rfunction{ISA} function was used to find the transcription
modules, then the robustness measure is used automatically to filter the results.
This is done by first scrambling the input matrix and then running ISA
on it. As ISA is an unsupervised algorithm it usually finds some
(although less and smaller) modules even in such a scrambled data
set. Then the robustness scores are calculated for the proper and the
scrambled modules and only (proper) modules that have a higher score
than the highest scrambled module are kept. The robustness scores are
stored in the seed data during this process, so you can check them
later:
<<robustness>>=
seedData(modules)$rob
@

\section{The \Rpackage{isa2} and \Rpackage{eisa} packages}%
\label{sec:isapackages}

ISA and its companion functions for visualization, functional
enrichment calculation, etc. are distributed in two separate R packages,
\Rpackage{isa2} and \Rpackage{eisa}.

\Rpackage{isa2} contains the implementation of ISA itself, and
\Rpackage{eisa} specifically deals with supplying expression data to
\Rpackage{isa2} and visualizing the results.

If you analyze gene expression data, then we suggest using the
interface provided in the \Rpackage{eisa} package. For other data, use
the \Rpackage{isa2} package directly.

\section{Finer control over ISA parameters}%
\label{sec:detailed-isa}

The \Rfunction{ISA} function takes care of all steps performed in
a modular study, and for each step it uses parameters, that work reasonably
well. In some cases, however, one wants to access these steps
individually, to use custom parameters instead of the defaults.

In this section, we will still use the acute lymphoblastic leukemia
gene expression data from the \Rpackage{ALL} package. 

\subsection{Non-specific filtering}

The first step of the analysis typically involves non-specific
filtering of the probesets. The aim is to eliminate the probesets that
do not show variation across the samples, as they only contribute
noise to the data. 

By default (i.e. if the \Rfunction{ISA} function is called) this is
performed using the \Rpackage{genefilter} package, and the default
filter is based on the inter-quantile ratio of the probesets'
expression values, a robust measure of variance. 

If other filters are desired, then these can be implemented by using
the functions of the \Rpackage{genefilter} package directly.
Possible filtering techniques include using the AffyMetrix
present/absent calls produced by the \Rfunction{mas5calls} function of
the \Rpackage{affy} package, but this requires the raw data, so in
this vignette we use a simple method based on variance and minimum
expression value: only probesets that have a variance of at least
\variable{varLimit} and that have at least \variable{kLimit} samples
with expression values over \variable{ALimit} are kept.
<<filtering>>=
library(genefilter)
varLimit <- 0.5
kLimit <- 4
ALimit <- 5
flist <- filterfun(function(x) var(x)>varLimit, kOverA(kLimit,ALimit))
ALL.filt <- ALL[genefilter(ALL, flist), ]
@ 

The original expression set had \Sexpr{nrow(ALL)} features, the
filtered one has only \Sexpr{nrow(ALL.filt)}.


\subsection{Entrez Id matching}

In this step we match the probesets to Entrez identifiers and remove
the ones that don't map to any Entrez gene.
<<Entrez,cache=FALSE>>=
ann <- annotation(ALL.filt)
library(paste(ann, sep=".", "db"), character.only=TRUE)
ENTREZ <- get( paste(ann, sep="", "ENTREZID") )
EntrezIds <- mget(featureNames(ALL.filt), ENTREZ)
keep <- sapply(EntrezIds, function(x) length(x) >= 1 && !is.na(x))
ALL.filt.2 <- ALL.filt[keep,]
@ 

To reduce ambiguity in the interpretation of the results, we might
also want to keep only single probeset for each Entrez gene. The
following code snipplet keeps the probeset with the highest variance.
<<Entrez unique>>=
vari <- apply(exprs(ALL.filt.2), 1, var)
larg <- findLargest(featureNames(ALL.filt.2), vari, data=annotation(ALL.filt.2))
ALL.filt.3 <- ALL.filt.2[larg,]
@ 

\subsection{Normalizing the data}

The ISA works best, if the expression matrix is scaled and
centered. In fact, the two sub-steps of an ISA step require expression
matrices that are normalized differently. The
\Rfunction{ISANormalize} function can be used to calculate the
normalized expression matrices; it returns an \Rclass{ISAExpressionSet}
object. This is a subclass of \Rclass{ExpressionSet}, and contains
three expression matrices: the original raw expression, the row-wise
(=gene-wise) normalized and the column-wise (=sample-wise) normalized
expression matrix. The normalized expression matrices can be queried
with the \Rfunction{featExprs} and \Rfunction{sampExprs} functions.
<<normed>>=
ALL.normed <- ISANormalize(ALL.filt.3)
ls(assayData(ALL.normed))
dim(featExprs(ALL.normed))
dim(sampExprs(ALL.normed))
@ 

\subsection{Generating starting seeds for the ISA}

The ISA is an iterative algorithm that starts with a set of input
seeds. An input seed is basically a set of probesets and the ISA stepwise
refines this set by 1) including other probesets in the set that are
coexpressed with the input probesets and 2) removing probesets from it
that are not coexpressed with the rest of the input set.

The \Rfunction{generate.seeds} function generates a set of random
seeds (i.e. a set of random gene sets). See its documentation if you
need to change the sparsity of the seeds.

<<seeds>>=
set.seed(3)
random.seeds <- generate.seeds(length=nrow(ALL.normed), count=100)
@ 

In addition to random seeds, it is possible to start the ISA iteration
from ``educated'' seeds, i.e. gene sets the user is interested in, or
a set of samples that are supposed to have coexpressed genes. We
create another set of starting seeds here, based on the type of acute
lymphoblastic leukemia: ``B'', ``B1'', ``B2'', ``B3'', ``B4'' or
``T'', ``T1'', ``T2'', ``T3'' and ``T4''.
<<smart seeds>>=
type <- as.character(pData(ALL.normed)$BT)
ss1 <- ifelse(grepl("^B", type), -1, 1)
ss2 <- ifelse(grepl("^B1", type), 1, 0)
ss3 <- ifelse(grepl("^B2", type), 1, 0)
ss4 <- ifelse(grepl("^B3", type), 1, 0)
ss5 <- ifelse(grepl("^B4", type), 1, 0)
ss6 <- ifelse(grepl("^T1", type), 1, 0)
ss7 <- ifelse(grepl("^T2", type), 1, 0)
ss8 <- ifelse(grepl("^T3", type), 1, 0)
ss9 <- ifelse(grepl("^T4", type), 1, 0)
smart.seeds <- cbind(ss1, ss2, ss3, ss4, ss5, ss6, ss7, ss8, ss9)
@

The \variable{ss1} seed includes all samples, but their sign is
opposite for B-cell leukemia samples and T-cell samples. This way ISA
is looking for sets of genes that are differently regulated in these
two groups of samples. \variable{ss2} contains only B1 type samples,
so here we look for genes that are specific to this variant of the
disease. The other seeds are similar, for the other subtypes.

\subsection{Performing the ISA iteration}

We perform the ISA iterations for our two sets of seeds separately.
The two threshold parameters we use here were chosen after some
experimentation; these result modules of the ``right'' size.
<<iteration>>=
modules1 <- ISAIterate(ALL.normed, feature.seeds=random.seeds, 
                        thr.feat=1.5, thr.samp=1.8, convergence="cor")
modules2 <- ISAIterate(ALL.normed, sample.seeds=smart.seeds,
                        thr.feat=1.5, thr.samp=1.8, convergence="cor")
@

\subsection{Dropping non-unique modules}

\Rfunction{ISAIterate} returns the same number of modules as the
number of input seeds; these are, however, not always meaningful,
the input seeds can converge to an all-zero
vector, or occasionally they may not converge at all. It is also
possible that two or more input seeds converge to the same module. 

The \Rfunction{ISAUnique} function eliminates the all-zero or
non-convergent input seeds and keeps only one instance of the
duplicated ones.
<<unique>>=
modules1.unique <- ISAUnique(ALL.normed, modules1)
modules2.unique <- ISAUnique(ALL.normed, modules2)
length(modules1.unique)
length(modules2.unique)
@ 

\Sexpr{length(modules1.unique)} modules were kept for the first set of
seeds and \Sexpr{length(modules2.unique)} for the second set.

\subsection{Dropping non-robust modules}

The \Rfunction{ISAFilterRobust} function filters a set of modules by
running ISA with the same parameters on the scrambled data set and then
calculating a robustness score, both for the real modules and the ones
from the scrambled data. The highest robustness score obtained from
the scrambled data is used as a threshold to filter the real modules.
<<robust>>=
modules1.robust <- ISAFilterRobust(ALL.normed, modules1.unique)
modules2.robust <- ISAFilterRobust(ALL.normed, modules2.unique)
length(modules1.robust)
length(modules2.robust)
@

We still have \Sexpr{length(modules1.robust)} modules for
the first set of seeds and \Sexpr{length(modules2.unique)} for the
second set.

\section{More information}

For more information about the ISA, please see the references
below. The ISA homepage at
\url{http://www.unil.ch/cbg/homepage/software.html} has example data
sets, and all ISA related tutorials and papers.

\section{Session information}

The version number of R and packages loaded for generating this
vignette were:

<<sessioninfo,results=tex,echo=FALSE>>=
toLatex(sessionInfo())
@ 

\bibliographystyle{apalike}
\bibliography{EISA}

\end{document}