\name{IntervalTree-class}
\docType{class}
\alias{IntervalTree-class}

% constructor
\alias{IntervalTree}

% coercion
\alias{coerce,IRanges,IntervalTree-method}
\alias{coerce,Ranges,IntervalTree-method}
\alias{coerce,IntervalTree,IRanges-method}

% accessors
\alias{length,IntervalTree-method}

% overlap computation
\alias{overlap}
\alias{overlap,IntervalTree,Ranges-method}
\alias{overlap,Ranges,Ranges-method}
\alias{overlap,Ranges,missing-method}
\alias{\%in\%,Ranges,Ranges-method}

\title{Interval Search Trees}
\description{
  An \code{IntervalTree} instance is an external
  representation of ranges (i.e. it is derived from
  \code{\linkS4class{XRanges}}) that is optimized for overlap queries.
}
\details{
  A common type of query that arises when working with intervals is finding
  which intervals in one set overlap those in another. An efficient
  family of algorithms for answering such queries is known as the
  Interval Tree. This class makes use of the augmented tree algorithm
  from the reference below, but heavily adapts it for the use case of
  large, sorted query sets.

  The usual workflow is to create an \code{IntervalTree} using the
  constructor described below and then perform overlap queries using the
  \code{overlap} method. The results of the query are returned as a
  \code{\linkS4class{RangesMatching}} object. 
}

\section{Constructor}{
  \describe{
    \item{}{IntervalTree(ranges): Creates an \code{IntervalTree} from the
      ranges in \code{ranges}, an object coercible to
      \code{IntervalTree}, such as an \code{\linkS4class{IRanges}} instance.
    }
  }
}

\section{Finding Overlaps}{
  This main purpose of the interval tree is to optimize the search for
  ranges overlapping those in a query set. The interface for this
  operation is the \code{\link{overlap}} function.

  \describe{
    \item{}{
      \code{overlap(object, query = object, maxgap = 0, multiple = TRUE)}:

      Find the intervals in \code{query}, a \code{Ranges} instance, that
      overlap with the intervals \code{object}, an \code{IntervalTree}
      or, for convenience, a \code{Ranges} coercible to a
      \code{IntervalTree}. If \code{query} is omitted, \code{object} is
      queried against itself. Intervals with a separation of \code{maxgap}
      or less are considered to be overlapping. \code{maxgap} should be
      a scalar, non-negative, non-NA number. When \code{multiple} (a
      scalar non-NA logical) is \code{TRUE}, the results are returned as a
      \code{\linkS4class{RangesMatching}} object.

      If \code{multiple} is \code{FALSE}, at most one overlapping
      interval in \code{object} is returned for each interval in
      \code{query}. The matchings are returned as an integer vector of
      length \code{length(query)}, with \code{NA} indicating intervals
      that did not overlap any intervals in \code{object}. This is
      analogous to the return value of the \code{\link{match}} function.
    }
    \item{}{
      \code{x \%in\% table}:
      Shortcut for finding the ranges in \code{x} that overlap any of
      the ranges in \code{table}. Both \code{x} and \code{table} should
      be \code{Ranges} instances. The result is a \code{logical} vector
      of the same length as \code{x}.
    }
  }
  
}

\section{Coercion}{
  \describe{
    \item{}{\code{as(from, "IRanges")}: Imports the ranges in
      \code{from}, an \code{IntervalTree}, to an
      \code{\linkS4class{IRanges}}.}
    \item{}{\code{as(from, "IntervalTree")}: Constructs an
      \code{IntervalTree} representing \code{from}, a \code{Ranges}
      instance that is coercible to \code{IRanges}.
    }
  }
}

\section{Accessors}{
  \describe{
    \item{}{\code{length(x)}: Gets the number of ranges stored in the
      tree. This is a fast operation that does not bring the ranges into
      R.
    }
  }
}

\section{Notes on Time Complexity}{
  The cost of constructing an instance of the interval tree is a
  \code{O(n*lg(n))}, which makes it about as fast as other types of
  overlap query algorithms based on sorting. The good news is that the
  tree need only be built once per subject; this is useful in situations
  of frequent querying. Also, in this implementation the data is stored
  outside of R, avoiding needless copying. Of course, external storage
  is not always convenient, so it is possible to coerce the tree to an
  instance of \code{\linkS4class{IRanges}} (see the Coercion section).

  For the query operation, the running time is based on the query size
  \code{m} and the average number of hits per query \code{k}. The output
  size is then \code{max(mk,m)}, but we abbreviate this as
  \code{mk}. Note that when the \code{multiple} parameter is set to
  \code{FALSE}, \code{k} is fixed to 1 and drops out of this
  analysis. We also assume here that the query is sorted by start
  position, which is an assertion of the \code{overlap} method.
  
  An upper bound for finding overlaps is
  \code{O(min(mk*lg(n),n+mk))}. The fastest interval tree algorithm
  known is bounded by \code{O(min(m*lg(n),n)+mk)} but is a lot more
  complicated and involves two auxillary trees. The lower bound is
  \code{Omega(lg(n)+mk)}, which is almost the same as for returning
  the answer, \code{Omega(mk)}. The average is of course somewhere in
  between.

  This analysis informs the choice of which set of ranges to process
  into a tree, i.e. assigning one to be the subject and the other to be
  the query. Note that if \code{m > n}, then the running time is
  \code{O(m)}, and the total operation of complexity \code{O(n*lg(n) +
  m)} is better than if \code{m} and \code{n} were exchanged. Thus, for
  once-off operations, it is often most efficient to choose the smaller
  set to become the tree (but \code{k} also affects this). This is
  reinforced by the realization that if \code{mk} is about the same in
  either direction, the running time depends only on \code{n}, which
  should be minimized. Even in cases where a tree has already been
  constructed for one of the sets, it can be more efficient to build a
  new tree when the existing tree of size \code{n} is much larger than
  the query set of size \code{m}, roughly when \code{n > m*lg(n)}.
}

\references{Interval tree algorithm from: Cormen, Thomas H.; Leiserson,
  Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to
  Algorithms, second edition, MIT Press and McGraw-Hill. ISBN
  0-262-53196-8}

\author{Michael Lawrence}
\seealso{
  \code{\linkS4class{XRanges}}, the parent of this class,
  \code{\linkS4class{RangesMatching}}, the result of an overlap query.
}
\examples{
  query <- IRanges(c(1, 4, 9), c(5, 7, 10))
  subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
  tree <- IntervalTree(subject)

  ## at most one hit per query
  overlap(tree, query, multiple = FALSE) # c(2, NA, 3)

  ## allow multiple hits
  overlap(tree, query)

  ## overlap as long as distance <= 1
  overlap(tree, query, maxgap = 1)

  ## shortcut
  overlap(subject, query)

  ## query and subject are easily interchangeable
  query <- IRanges(c(1, 4, 9), c(5, 7, 10))
  subject <- IRanges(c(2, 2), c(5, 4))
  tree <- IntervalTree(subject)
  t(overlap(tree, query))
  # the same as:
  overlap(query, subject)

  ## one Ranges with itself
  overlap(query)
}

\keyword{classes}
\keyword{methods}