---
title: "Introduction to HiCParser"
author: 
  - name: Elise Maigné
    affiliation:
    - INRAE, MIAT
    email: elise.maigne@inrae.fr
  - name: Matthias Zytnicki
    affiliation:
    - INRAE, MIAT
    email: matthias.zytnicki@inrae.fr
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('HiCParser')`"
vignette: >
  %\VignetteIndexEntry{Introduction to HiCParser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL,
    warning = FALSE
)
```

# Basics

## Required knowledge

`r BiocStyle::Biocpkg("HiCParser")` builds on other packages, in particular 
those that implement the infrastructure needed to handle Hi-C data with 
several replicates and conditions. It provides parsers for several standard 
Hi-C data formats and imports them into R as an 
`r BiocStyle::Biocpkg("InteractionSet")` object.

## Citing `HiCParser`

We hope that `r BiocStyle::Biocpkg("HiCParser")` will be useful for your 
research. 
Please use the following information to cite the package and the overall 
approach. Thank you!

```{r "citation"}
## Citation info
citation("HiCParser")
```

# Start using `HiCParser`

```{r "start", message=FALSE}
library("HiCParser")
```

`HiCParser` can import Hi-C data sets in several formats:

- Cooler `.cool` or `.mcool` files.
- Juicer `.hic` files.
- HiC-Pro `.matrix` and `.bed` files.
- Tabular (`.tsv`, `.csv`, ...) files.

## Cooler files

### `.cool` files

To load `.cool` files generated by [Cooler][cooler-documentation]
[@cooler]:

```{r coolFormat}
# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-1.replicate-3.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool",
    "path/to/condition-2.replicate-3.cool"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.cool",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# Instantiation of data set
hic.experiment <- parseCool(
    paths,
    conditions = conditions,
    replicates = replicates
)
```
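
When file names encode the condition and replicate, as the placeholder paths
above do, both vectors can be derived from the names instead of being typed by
hand. The sketch below uses base R only and assumes a hypothetical naming
scheme `condition-<c>.replicate-<r>.cool`; the object names are illustrative
and the pattern would need adapting to your own files.

```r
# Hypothetical file names following "condition-<c>.replicate-<r>.cool"
examplePaths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool"
)

# Work on the file name only, without the directory part
fileNames <- basename(examplePaths)

# Capture the condition and replicate labels from each file name
pattern <- "^condition-(.+)\\.replicate-(.+)\\.cool$"
stopifnot(all(grepl(pattern, fileNames)))

exampleConditions <- sub(pattern, "\\1", fileNames)
exampleReplicates <- sub(pattern, "\\2", fileNames)
```

The resulting character vectors could then be passed as `conditions` and
`replicates`, since names are accepted in place of numbers.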


### `.mcool` files

To load `.mcool` files generated by [Cooler][cooler-documentation]
[@cooler]:

```{r mcoolFormat}
# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.mcool",
    "path/to/condition-1.replicate-2.mcool",
    "path/to/condition-1.replicate-3.mcool",
    "path/to/condition-2.replicate-1.mcool",
    "path/to/condition-2.replicate-2.mcool",
    "path/to/condition-2.replicate-3.mcool"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.mcool",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# mcool files can store several resolutions.
# Specify the one to load.
binSize <- 5000000

# Instantiation of data set
# The same function "parseCool" is used for cool and mcool files
hic.experiment <- parseCool(
    paths,
    conditions = conditions,
    replicates = replicates,
    binSize = binSize # Specified for .mcool files.
)
```

## Juicer files

To load `.hic` files generated by [Juicer][juicer-documentation] [@juicer]:

```{r hicFormat}
# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.hic",
    "path/to/condition-1.replicate-2.hic",
    "path/to/condition-1.replicate-3.hic",
    "path/to/condition-2.replicate-1.hic",
    "path/to/condition-2.replicate-2.hic",
    "path/to/condition-2.replicate-3.hic"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.hic",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# hic files can store several resolutions.
# Specify the one to load.
binSize <- 5000000

# Instantiation of data set
hic.experiment <- parseHiC(
    paths,
    conditions = conditions,
    replicates = replicates,
    binSize = binSize
)
```

Currently, `HiCParser` supports the `.hic` format up to version 9.

## HiC-Pro files

To load `.matrix` and `.bed` files generated by [HiC-Pro][hicpro-documentation]
[@hicpro]:

```{r hicproFormat}
# Path to each matrix file
matrixPaths <- c(
    "path/to/condition-1.replicate-1.matrix",
    "path/to/condition-1.replicate-2.matrix",
    "path/to/condition-1.replicate-3.matrix",
    "path/to/condition-2.replicate-1.matrix",
    "path/to/condition-2.replicate-2.matrix",
    "path/to/condition-2.replicate-3.matrix"
)

# For the sake of the example, we will use the same file, several times
matrixPaths <- rep(
    system.file("extdata",
        "hicsample_21.matrix",
        package = "HiCParser"
    ),
    6
)

# Path to each bed file
bedPaths <- c(
    "path/to/condition-1.replicate-1.bed",
    "path/to/condition-1.replicate-2.bed",
    "path/to/condition-1.replicate-3.bed",
    "path/to/condition-2.replicate-1.bed",
    "path/to/condition-2.replicate-2.bed",
    "path/to/condition-2.replicate-3.bed"
)

# Alternatively, if the same bed file is used, we can provide it only once
bedPaths <- system.file("extdata",
    "hicsample_21.bed",
    package = "HiCParser"
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# Instantiation of data set
hic.experiment <- parseHiCPro(
    matrixPaths = matrixPaths,
    bedPaths = bedPaths,
    conditions = conditions,
    replicates = replicates
)
```
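
Both files are needed because a HiC-Pro `.matrix` file stores only bin indices
and counts, while the `.bed` file maps each bin index to genomic coordinates.
The sketch below illustrates that relationship with base R on made-up values.
It is not how `parseHiCPro` is implemented, and all object and column names
are hypothetical.

```r
# Made-up content of a .bed file: chromosome, start, end, bin index
bed <- data.frame(
    chromosome = c("21", "21", "21"),
    start = c(0, 5000000, 10000000),
    end = c(5000000, 10000000, 15000000),
    index = 1:3
)

# Made-up content of a .matrix file: first bin, second bin, count
counts <- data.frame(
    index1 = c(1, 1, 2),
    index2 = c(1, 3, 3),
    interaction = c(120, 8, 35)
)

# Look up the coordinates of both bins of each interaction
first <- bed[match(counts$index1, bed$index), ]
second <- bed[match(counts$index2, bed$index), ]

# One row per interaction, now located on the genome
# (only intra-chromosomal interactions are assumed here)
located <- data.frame(
    chromosome = first$chromosome,
    start1 = first$start,
    start2 = second$start,
    interaction = counts$interaction
)
```

This is also why a single `.bed` file can serve several `.matrix` files: when
all replicates share the same binning, the index-to-coordinate mapping is the
same for each of them.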

## Tabular files

A tabular file is a tab-separated multi-replicate sparse matrix with a header:

```
chromosome    position 1    position 2    C1.R1    C1.R2    C1.R3    ...
Y             1500000       7500000       145      184      72       ...
```

The number of interactions between `position 1` and `position 2` of
`chromosome` is reported in each `condition.replicate` column. There is no
limit to the number of conditions and replicates.

To load Hi-C data in this format:

```{r tabFormat}
hic.experiment <- parseTabular(
    system.file("extdata",
        "hicsample_21.tsv",
        package = "HiCParser"
    ),
    sep = "\t"
)
```
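
To make the layout described above concrete, the following base R sketch
writes a small tabular file from made-up counts and recovers the condition and
replicate labels from the `condition.replicate` column names. The values and
object names are hypothetical; only the header layout follows the description
above.

```r
# Made-up interaction counts for two conditions with two replicates each.
# Integers are used so that positions are never written in scientific notation.
interactions <- data.frame(
    chromosome = c("Y", "Y"),
    position1 = c(1500000L, 1500000L),
    position2 = c(7500000L, 9000000L),
    C1.R1 = c(145L, 12L),
    C1.R2 = c(184L, 9L),
    C2.R1 = c(72L, 30L),
    C2.R2 = c(65L, 27L)
)

# Use the header names shown above, which contain a space
colnames(interactions)[2:3] <- c("position 1", "position 2")

# Write a tab-separated file with a header, without quotes or row names
tabularPath <- tempfile(fileext = ".tsv")
write.table(
    interactions,
    file = tabularPath,
    sep = "\t",
    quote = FALSE,
    row.names = FALSE
)

# Split each "condition.replicate" column name into its two labels
sampleColumns <- colnames(interactions)[-(1:3)]
sampleLabels <- do.call(rbind, strsplit(sampleColumns, ".", fixed = TRUE))
colnames(sampleLabels) <- c("condition", "replicate")
```

A file written this way follows the layout described above, so its path could
then be given to `parseTabular` with `sep = "\t"`.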

# Output: InteractionSet format

The output is an `r BiocStyle::Biocpkg("InteractionSet")` object.
This object can store one or several samples.

Please read the documentation of the 
`r BiocStyle::Biocpkg("InteractionSet")` package to learn more about this 
format.

```{r}
library("HiCParser")
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
hic.experiment <- parseHiC(
    paths = rep(hicFilePath, 6),
    binSize = 5000000,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
hic.experiment
```

The conditions and replicates are reported in the `colData` slot:

```{r}
SummarizedExperiment::colData(hic.experiment)
```

They correspond to the columns of the `assays` matrix (containing 
interaction values):

```{r}
head(SummarizedExperiment::assay(hic.experiment))
```

The positions of interactions are in the `interactions` slot of the object:

```{r}
InteractionSet::interactions(hic.experiment)
```

## Additional utility functions

The `mergeInteractionSet` function merges `InteractionSet` objects 
from the same experiment (for different replicates or conditions).

It merges the data containing the bins of interactions and fills the assays 
matrix accordingly, returning an assays matrix with several columns.

The object returned by the function is an `InteractionSet`.

Here is a fictitious example:

```{r}
path <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
object1 <- parseCool(path, conditions = 1, replicates = 1)
# Creating an object with a different condition
object2 <- parseCool(path, conditions = 2, replicates = 1)
```

The merged object:

```{r}
objectMerged <- mergeInteractionSet(object1, object2)
SummarizedExperiment::colData(objectMerged)
head(SummarizedExperiment::assay(objectMerged))
```
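
Conceptually, such a merge resembles an outer join on the interaction
coordinates: each input contributes its own assay column, and the rows cover
the interactions present in any input. The base R sketch below is only an
analogy on made-up data frames, not the implementation of
`mergeInteractionSet`. In particular, how the package fills an interaction
that is absent from one input (shown here as `NA`) is an assumption.

```r
# Made-up interactions for two samples; the second lacks one interaction
sample1 <- data.frame(
    start1 = c(0, 0, 5000000),
    start2 = c(0, 5000000, 5000000),
    C1.R1 = c(120, 8, 35)
)
sample2 <- data.frame(
    start1 = c(0, 5000000),
    start2 = c(0, 5000000),
    C2.R1 = c(98, 41)
)

# Outer join on the coordinates: one row per interaction seen in any sample,
# one count column per sample
merged <- merge(sample1, sample2, by = c("start1", "start2"), all = TRUE)
```

In this analogy, `merged` has one row per distinct interaction and one count
column per sample, which mirrors the multi-column assays matrix shown above.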

# Reproducibility

This package was developed using `r BiocStyle::Biocpkg("biocthis")`.

`R` session information.

```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```


# Bibliography

Lun ATL, Perry M, Ing-Simmons E (2016). 
Infrastructure for genomic interactions: Bioconductor classes for Hi-C, 
ChIA-PET and related experiments. *F1000Research*, 5, 950.