---
title: "scHiCcompare Vignette"
author:
- name: My Nguyen
  affiliation:
  - &1 Department of Biostatistics, Virginia Commonwealth University,
   Richmond, VA
- name: Mikhail Dozmorov
  affiliation:
  - *1
date: '`r format(Sys.Date(), "%B %e, %Y")`'
package: scHiCcompare
output:
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Chromatin Differential Analysis of scHiC -scHiCcompare Vignette}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
options(width = 1000)
knitr::opts_chunk$set(echo = TRUE)
# Set CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com/"))

# BiocManager::install("BiocStyle")
library(BiocStyle)
```

# Introduction

`r BiocStyle::Biocpkg("scHiCcompare")` is designed for the imputation, 
joint normalization, and detection of differential chromatin interactions 
between two groups of chromosome-specific single-cell Hi-C datasets (scHi-C). 
The groups can be pre-defined based on biological conditions or created by 
clustering cells according to their chromatin interaction patterns. Clustering 
can be performed using methods like 
[Higashi](https://github.com/ma-compbio/Higashi), [scHiCcluster](https://github.com/zhoujt1994/scHiCluster) methods, etc. 

`r BiocStyle::Biocpkg("scHiCcompare")` works with processed Hi-C data, 
specifically chromosome-specific chromatin interaction matrices, and accepts 
five-column tab-separated text files in a sparse matrix format. 

The package provides two key functionalities:

- Imputation of single-cell Hi-C data by random forest model with pooling 
technique
- Differential analysis to identify differences in chromatin interactions 
between groups.

# Installation

```{r, message=FALSE, warning=FALSE, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("scHiCcompare")

# For the latest version install from GitHub
# devtools::install_github("dozmorovlab/scHiCcompare")
```

```{r,  message=FALSE, warning=FALSE, echo=FALSE}
# devtools::install_github("dozmorovlab/scHiCcompare", force = T)
```

```{r, message=FALSE, warning=FALSE}
library(scHiCcompare)
library(HiCcompare)
library(tidyr)
library(ggplot2)
library(gridExtra)
library(lattice)
library(data.table)
library(mclust)
```

# scHiCcompare function

## Overview

`scHiCcompare()` function conducts a differential analysis workflow, including
imputation and normalization, using chromosome-specific chromatin interaction
matrices split into two conditions (or two cell type groups). 

 - **Imputation** - First, scHiC data in each group can optionally undergo
 `imputation` to address data sparsity. As resolution increases, the percentage
 of '0' values or missing data also rises drastically. To address this,
 scHiCompare applies the random forest (`RF`) imputation method on each genomic
 distance (`no pooling`) or pooled bands with the option of a pooling technique.
 There are two pooling strategies:

    1. `Progressive pooling`, where each subsequent band combines interaction
    frequencies (IFs) within a linearly increasing range of distances.
    
    2. `Fibonacci pooling`, where each subsequent band combines IFs within
    increasing distance ranges, and this increase follows the Fibonacci
    sequence.
    
```{r, echo=FALSE, out.width="100%", fig.cap="An example of how different pooling styles assign genomic distance into each band", fig.align="center"}
knitr::include_graphics("Pooling_Example.png")
```


Due to extreme sparsity at higher genomic distances and randomness of long-range
interactions, scHiCcompare, by default, workflow focuses on a distance range of
1 to 10MB (`main.Distance`).  If any pool band falls outside this range and has
a percentage of missing values exceeding the specified threshold (default
missPerc.threshold is 95%), the missing interaction frequencies (IFs) values 
are imputed using the mean IF values from the available data within that pool 
band.

 - **Pseudo-bulk scHi-C** - Second, imputed single-cell Hi-C data in each group 
 are transformed into group-specific `pseudo-bulk` scHi-C matrices by summing 
 all group-specific single-cell Hi-C matrices.

 - **Normalization** - After obtaining two group-specific pseudo-bulk matrices, 
 `normalization` removes global and local biases between two pseudo-bulk 
 matrices. The `r BiocStyle::Biocpkg("scHiCcompare")` workflow applies a LOESS 
 regression model from `r BiocStyle::Biocpkg("HiCcompare")` to jointly 
 normalize two imputed pseudo-bulk matrices. Briefly,
    + The data is visualized using a mean difference (MD) plot, where \( M \),
    **M**ean difference (calculated as \( M = \log_2(IF_2 / IF_1) \), Y-axis) 
    is plotted against \( D \), the **D**istance between interacting regions 
    (X-axis). In this plot, a LOESS regression curve is fitted to adjust the 
    interaction frequencies of the two condition groups, centering the \( M \) 
    differences around a baseline of \( M = 0 \).

 - **Differential analysis** - `differential analysis` is performed on the 
 processed pseudo-bulk matrices to identify differential chromatin interactions 
 between the two cell types or conditions. This involves separating the
 normalized log fold changes of interaction frequencies (the M-values) into 
 difference and non-difference groups. This analysis is performed on a 
 per-distance basis.
    + The non-difference group is assumed to follow a normal distribution 
    centered around 0, identified using a Gaussian Mixture Model (GMM). The 
    difference group consists of log fold changes from other distributions that 
    deviate from the non-difference group's normal distribution. To improve 
    precision, a log fold change threshold `fprControl.logfc` is applied to 
    values in the difference group, excluding low log fold changes.
    + When the differences are not large enough to form distinct distributions, 
    the default differential analysis of `r BiocStyle::Biocpkg("HiCcompare")` 
    is used.

## Input 

To use scHiCcompare, you'll need to define two groups of cells to compare and 
save cell-specific scHi-C data (individual files in **.txt** format) in two 
folders.

Each cell-specific scHi-C **.txt** file should be formatted as modified sparse 
upper triangular matrices in R, which consist of five columns 
(chr1, start1, chr2, start2, IF). Since the full matrix of chromatin 
interactions is symmetric, only the upper triangular portion, including the 
diagonal and excluding any 0, is stored in a sparse matrix format. The required 
sparse matrix format of each single-cell Hi-C is: 

- "chr1" - Chromosome of the first region. 
- “start1” - a start coordinate (in bp) of the first region.
- "chr2" - Chromosome of the second region.
- “start2” - a start coordinate (in bp) of the second region.
- "IF" - the interaction frequency between 2 two regions (IFs).

The '.txt' files need to be saved in tab-separated columns and no row names, 
column names, or quotes around character strings with the example format below.


```{r, echo=FALSE}
data("ODC.bandnorm_chr20_1")
```

```{r, echo=FALSE}
names(ODC.bandnorm_chr20_1) <- c("chr1", "start1", "chr2", "start2", "IF")

head(ODC.bandnorm_chr20_1)
```

To run `scHiCcompare()`, you need two folders with condition-specific scHiC 
'.txt' files. The condition-specific groups of cells should be pre-defined 
based on criteria such as experimental conditions, clustering results, or 
biological characteristics. 

In section [Others](#others), we shows examples of steps to [download](#download-schic-data) and [import](#import-schic-data-in-R) scHi-C data into R. User can refer to it for more information.


### Prepare input folders

Here is an example workflow using scHiC human brain datasets (Lee et al., 2019)
with ODC and MG cell types at chromosome 20 with a 1MB resolution.

For the following example sections, we will load samples of 10 single-cell Hi-C
data (in '.txt') for each cell type group in two example folders (`ODCs_example`
and `MGs_axample`). The files follow the same format as those downloaded via
`download_schic()` of `r BiocStyle::Biocpkg("Bandnorm")`. You can extract the
folder path by the code below, which could be used as input for `scHiCcompare()`
function.

```{r}
## Load folder of ODC file path
ODCs_example_path <- system.file("extdata/ODCs_example",
  package = "scHiCcompare"
)

## Load folder of MG file path
MGs_example_path <- system.file("extdata/MGs_example",
  package = "scHiCcompare"
)
```

Since the data downloaded by `r BiocStyle::Biocpkg("Bandnorm")` has the required
input format (5 columns of [chr1, start1, chr2, start2, IF]), we don't need an
extra step for data modification. If, after importing your data into R, its
format does not follow the sparse upper triangular [input](#input) format
requirement, you need to modify the data.


## scHiCcompare function

```{r, eval=FALSE}
scHiCcompare(
  file.path.1, file.path.2,
  select.chromosome = "chr1",
  imputation = "RF",
  normalization = "LOESS",
  differential.detect = "MD.cluster",
  save.output.path = "results/",
  ...
)
```

**Core Parameter** :

The core workflow consists of the following steps:

- **Input Preparation**: Load two sets of scHi-C data, one for each condition.
  - `file.path.1, file.path.2` - Character strings specifying paths to folders
containing scHi-C data for the first and second cell type or condition groups.
  - `select.chromosome` -  Integer or character indicating the chromosome to be
analyzed (e.g., 'chr1' or 'chr10'.) 
- **Imputation (optional)** - `imputation`: Handle missing values in sparse single-cell matrices. `'RF'` enables Random Forest-based imputation.
- **Normalization (optional)** - `normalization`: Apply `'LOESS'` normalization to correct for systematic biases.
- **Differential Testing** - `differential.detect`: Identify significant changes in chromatin interactions. `"MD.cluster"` indicates scHiCcompare's differential detection test on MD plot.


**Optional Parameter** :

In addition to core functionalities, scHiCcompare provides multiple optional parameters to customize imputation, normalization, differential analysis, and output handling.

- Imputation: Users can define pooling strategies (`pool.style`), set the number of imputations (`n.imputation`), and control iteration limits (`maxit`). Additionally, users can include/exclude extreme interaction frequency (IF) values (`outlier.rm`) and set a missing data threshold (`missPerc.threshold`), which determines the maximum allowable percentage of missing data in pool bands outside the focal genomic distances.
- Normalization: The `A.min` parameter helps filter low interaction frequencies, ensuring robust outlier detection during differential analysis.
- Differential Testing: Parameters such as `fprControl.logfc` and `alpha` regulate the false positive rate for GMM difference clusters and define the significance level for outlier detection.
- Output : Options like `save.output.path` allow external storage of results (see 
[output example](#Externally-saved-output-files)), while visualization settings (`Plot`, `Plot.normalize`) generate MD plots for differential detection and normalization effects. 

For further detail, user can refer to `?scHiCcompare()`

### Example of real analysis

In the following example, we will work with scHi-C data from 10 single cells in
both ODC and MG cell types at a 1 MG resolution. We will focus on chromosome 20,
applying the full workflow of scHiCcompare, which includes imputation, 
pseudo-bulk normalization, and differential analysis. Our goal is to detect 
differences for loci with genomic distances ranging from 1 to 10,000,000 bp. 
The progressive pooling style will be selected to create pool bands for the 
random forest imputation. For the differential analysis step, we will set the 
log fold change - false positive control threshold to 0.8.


The input file path was included in the package and conducted in the
[Prepare input folders](#prepare-input-folders) section.


```{r, message=F, warning=FALSE, fig.cap='Chromatin differential detection between ODC and MG in chromosome 20 in example above'}
## Imputation with 'progressive' pooling
result <- scHiCcompare(
  file.path.1 = ODCs_example_path,
  file.path.2 = MGs_example_path,
  select.chromosome = "chr20",
  main.Distances = 1:10000000,
  imputation = "RF",
  normalization = "LOESS",
  differential.detect = "MD.cluster",
  pool.style = "progressive",
  fprControl.logfc = 0.8,
  Plot = TRUE
)
```

From the visualizations above, normalization effectively reduces the irregular
trend in the M values between the imputed pseudo-bulk matrices of the two cell 
types. At a 1MB resolution, the differential analysis reveals that most of the 
detected differences occur at closer genomic distances, particularly below 5MB.


## Output

#### Output objects from the R function

The `scHiCcompare()` function will return an object that contains plots, 
differential results, pseudo-bulk matrices, normalized results, and imputation 
tables. The full differential results are available in `$Differential_Analysis`.
Intermediate results can be accessed with `$Intermediate`, including the 
imputation result table (`$Intermediate$Imputation`), the pseudo-bulk matrix in 
sparse format (`$Intermediate$PseudoBulk`), and the normalization table (`$Intermediate$Bulk.Normalization`). These output table objects have the
following structure:

  - `$Intermediate$PseudoBulk` for each condition group (`$condition1` and
  `$condition2`) has a standard sparse upper triangular format with 3 columns 
  of [region1, region2, IF].
  
  - `$Intermediate$Imputation` for each condition group (`$condition1` and
  `$condition2`) has modified sparse upper triangular format:
  
    + Interacting bins coordination [region1, region2, cell (condition 1 or
    condition2), chr]
    + Imputed interaction frequency of each single-cell [imp.IF_{cell name 1},
    imp.IF_{cell name 2}, imp.IF_{cell name 3}, ...,etc]

  - `$Intermediate$Bulk.Normalization` has 15 columns
    + Interacting bins coordination [chr1, start1, end1, chr2, start2, end2,
    D (scaled genomic distance)]
    + Bulk IF values [bulk.IF1, bulk.IF2, M (their log fold change,
    $log(IF_2/IF_1)$)]
    + Normalized bulk IF values [adj.bulk.IF1, adj.bulk.IF2,
    adj.M (their log fold change, $log(adj.IF_2/adj.IF_1)$)]
    + LOESS correction factor [mc];
    + Average expression value of bulk IF [A].
    
  - `$Differential_Analysis` has same structure as 
  `$Intermediate$Bulk.Normalization` with addition of 2 differential detection
  results columns
    + Z score of interaction frequencies's log fold change [Z]
    + Differential result cluster [Difference.cluster]
    

#### Externally saved output files

You also can have the option to save the results into the chosen directory by
a parameter in `scHiCcompare()` [function](#schiccompare-function). This will
save the normalization result table, differential result table, and imputed cell
scHi-C data (each group is a sub-folder). The sample of the saved output folder
structure is:
 
|-- Bulk_normalization_table.txt

|-- Differential_analysis_table.txt

|-- Imputed_{group 1's name}/

  - | |-- imp_{cell name}.txt

|-- Imputed_{group 2's name}/

  - | |-- imp_{cell name}.txt

The normalization result `Bulk_normalization_table.txt` has the same format as
the output object from the `scHiCcompare()` function, 
`$Intermediate$Bulk.Normalization`, which is shown in the structure 
example below. 
  
The differential result table `Differential_analysis_table.txt` also has the 
same format as the output object `$Differential_Analysis` from the function.

The imputed cell's scHiC data is saved in a folder for each group, which has 
a modified sparse upper triangular format of five columns 
[chr1, start1, chr2, start2, IF].


### Example of output

Below is a continuous example from 
[Example of real anlysis](#example-of-real-anlysis) above, showing how you can
extract different result options from the `scHiCcompare()` function.

```{r}
### Summary the analysis
print(result)
```

```{r}
### Extract imputed differential result
diff_result <- result$Differential_Analysis

DT::datatable(head(diff_result), options = list(scrollX = TRUE), width = 700)
```

```{r}
### Extract imputed pseudo bulk matrices normalization
norm_result <- result$Intermediate$Bulk.Normalization

DT::datatable(head(norm_result), options = list(scrollX = TRUE), width = 700)
```

```{r}
### Extract imputed ODC cell type table
imp_ODC_table <- result$Intermediate$Imputation$condition1

DT::datatable(head(imp_ODC_table), options = list(scrollX = TRUE), width = 700)
```

```{r}
## Extract Pseudo-bulk matrix from imputed scHi-C data
## Pseudo bulk matrix in standard sparse format
psudobulk_result <- result$Intermediate$PseudoBulk$condition1

DT::datatable(head(psudobulk_result),
  options = list(scrollX = TRUE), width = 700
)
```

Furthermore, you also have some parameter options in the function to indicate
which plots to output and an option to save the results in a given directory.


# Helper functions

There are several other functions included in `scHiCcompare` package. 

## Heatmap HiC matrix plot
`plot_HiCmatrix_heatmap()` produces a heatmap visualization for HiC and scHiC
matrices. It requires, as input, a modified sparse matrix, the same format from
`scHiCcompare()` [Input](#input) with five columns of chr1, start1, chr2 start2,
IF. More information can be found in its help document and the example below.

```{r}
data("ODC.bandnorm_chr20_1")

plot_HiCmatrix_heatmap(
  scHiC.sparse = ODC.bandnorm_chr20_1,
  main = "Figure 3. Heatmap of a single cell matrix", zlim = c(0, 5)
)
```


## Imputation Diagnostic plot
`plot_imputed_distance_diagnostic()` generates a diagnostic visualization of
imputation across genomic distances for all single cells. It compares the 
distribution of all cells' interaction frequency at a given distance data before
and after imputation. It requires, as input, the scHiC table format of the 
original and imputed scHiC datasets. ScHiC table format includes columns of 
genomic loci coordinates and interaction frequencies (IF) of each cell (cell,
chromosome, start1, end1, IF1, IF2, IF3, etc). 

The output of `$Intermediate$Imputation` of `scHiCcompare()` function is 
directly compatible with this format. For more details, see the sections 
on [Output](#output))


```{r, message=FALSE, warning=FALSE}
# Extract imputed table result
imp_MG_table <- result$Intermediate$Imputation$condition2
imp_ODC_table <- result$Intermediate$Imputation$condition1
```

```{r, echo=FALSE}
DT::datatable(imp_ODC_table, options = list(scrollX = TRUE), width = 700)
```

We need to create the table input for original IFs values in the same format.
Below is a continuous example from 
[Example of real anlysis](#example-of-real-analysis) above, showing how you can
construct scHiC table for original IF values and compare them with the output 
of imputed IF values.

```{r, message=FALSE, warning=FALSE}
# Create scHiC table object for original ODC interaction frequencies (IF)
scHiC.table_ODC <- imp_ODC_table[c("region1", "region2", "cell", "chr")]

# List all files in the specified directory for original ODC data
file.names <- list.files(
  path = ODCs_example_path,
  full.names = TRUE, recursive = TRUE
)

# Loop through each file to read and merge data
for (i in 1:length(file.names)) {
  # Read the current file into a data frame
  data <- read.delim(file.names[[i]])

  names(data) <- c("chr", "region1", "chr2", "region2", paste0("IF_", i))

  data <- data[, names(data) %in%
    c("chr", "region1", "region2", paste0("IF_", i))]

  # Merge the newly read data with the existing scHiC.table_ODC
  scHiC.table_ODC <- merge(scHiC.table_ODC, data,
    by = c("region1", "region2", "chr"), all = TRUE
  )
}

# Create scHiC table object for original MG interaction frequencies (IF)
scHiC.table_MG <- imp_MG_table[c("region1", "region2", "cell", "chr")]

# List all files in the specified directory for original MG data
file.names <- list.files(
  path = MGs_example_path,
  full.names = TRUE, recursive = TRUE
)

# Loop through each file to read and merge data
for (i in 1:length(file.names)) {
  # Read the current file into a data frame
  data <- read.delim(file.names[[i]])

  names(data) <- c("chr", "region1", "chr2", "region2", paste0("IF_", i))

  data <- data[, names(data) %in%
    c("chr", "region1", "region2", paste0("IF_", i))]
  # Merge the newly read data with the existing scHiC.table_MG

  scHiC.table_MG <- merge(scHiC.table_MG, data,
    by = c("region1", "region2", "chr"), all = TRUE
  )
}
```

```{r, message=FALSE, warning=FALSE, , fig.cap='Distribution diagnostic plot of imputed MG cells in some genomic distance'}
# plot imputed Distance Diagnostic of MG
plot1 <- plot_imputed_distance_diagnostic(
  raw_sc_data = scHiC.table_MG,
  imp_sc_data = imp_MG_table, D = 1
)

plot2 <- plot_imputed_distance_diagnostic(
  raw_sc_data = scHiC.table_MG,
  imp_sc_data = imp_MG_table, D = 2
)

plot3 <- plot_imputed_distance_diagnostic(
  raw_sc_data = scHiC.table_MG,
  imp_sc_data = imp_MG_table, D = 3
)

plot4 <- plot_imputed_distance_diagnostic(
  raw_sc_data = scHiC.table_MG,
  imp_sc_data = imp_MG_table, D = 4
)

gridExtra::grid.arrange(plot1, plot2, plot3, plot4, ncol = 2, nrow = 2)
```

The diagnostic visualizations demonstrate that with a sample of only 10 single
cells per group (note: this small sample size is for demonstration purposes 
only), the imputed values for MG closely match the original distribution only 
at shorter genomic distances (e.g., D1, D2). Increasing the number of single 
cells per group enhances imputation accuracy across distances. We recommend 
using a minimum of 80 single cells per group for optimal imputation performance.


# Others

### Download scHiC data

 To find and download single-cell Hi-C (scHi-C) data, you can use publicly 
 available repositories and databases that host this type of data. Common 
 sources include the Gene Expression Omnibus (GEO), the 4D Nucleome Data Portal 
 (4DN), or data published in research articles, etc. Below are some examples of 
 how to access and download scHi-C data.
  
  - **[GEO](#https://www.ncbi.nlm.nih.gov/geo/) (Gene Expression Omnibus)**:
  
  You can search data on [GEO](#https://www.ncbi.nlm.nih.gov/geo/) using queries
  such as `single-cell Hi-C`, `scHi-C`, or a GEO (GSE) series numbers (e.g., [GSE80006](#https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80006)), etc. 
  Once you select a dataset, processed data  (.txt, .csv, .bed, .hic, etc.)  
  can be found under the "Supplementary files" section and downloaded via the 
  `FTP` links at the bottom of the page. Additionally, [GEO](#https://www.ncbi.nlm.nih.gov/geo/) offers various download formats using 
  different mechanisms. For more details about downloading data in different 
  formats, visit the GEO download guide:
  <https://www.ncbi.nlm.nih.gov/geo/info/download.html>.
  
 You can also download these data by R. The example below shows steps to 
 download the mouse scHi-C dataset (Flyamer et al.2017) on [GEO](#https://www.ncbi.nlm.nih.gov/geo/).
  
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("GEOquery")
```
  
```{r, eval=FALSE}
library(GEOquery)
# For example, we want to download (Flyamer et al.2017) data
geo_id <- "GSE80006"
# Download the information, notation, feature data, etc
gse <- getGEO(geo_id, GSEMatrix = TRUE)
# You can read more about this function
?getGEO()

# Download and extract the supplementary files (processed data)
getGEOSuppFiles(geo_id, baseDir = "path/to/save")
```
    
**Other sources**:
  
 Some research papers provide data from external sources, which are usually 
 mentioned in the paper. For example, human brain datasets (Lee et al., 2019) 
 are available through a public box directory located <https://salkinstitute.box.com/s/fp63a4j36m5k255dhje3zcj5kfuzkyj1>.
  
 Additionally, some tools collect scHi-C data from various studies, like  <https://noble.gs.washington.edu/proj/schic-topic-model/>. Below is an example 
 of an R function from `r BiocStyle::Biocpkg("Bandnorm")`, which also accesses 
 existing single-cell Hi-C data at a 1mbp resolution 

 To download human brain oligodendrocytes (ODC) and microglia (MG) cell type 
 (Lee et al., 2019), we used the `download_schic()` function of `r BiocStyle::Biocpkg("Bandnorm")` package to download the scHiC data of  ODC and
 MG cell types groups in 1MB resolution.

```{r, eval=FALSE}
install.packages(c("ggplot2", "dplyr", "data.table", "Rtsne", "umap"))

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("immunogenomics/harmony")

devtools::install_github("sshen82/BandNorm", build_vignettes = FALSE)

library(BandNorm)
```

```{r, eval=FALSE}
### Download scHiC data of ODC and MG
download_schic("Lee2019", cell_type = "ODC", cell_path = "ODCs_example")
download_schic("Lee2019", cell_type = "MG", cell_path = "MGs_example")
```


### Import scHiC data in R

After downloading scHi-C data, the next step is to import the data into R for 
analysis.

ScHi-C data is available in various formats from different sources. Below are
examples of how to extract chromosome-specific data for analysis in R.

  - **`.bedepe, .csv, or .txt` formats** : 
  
If your raw scHiC data has been processed to '.bedepe', '.csv', or '.txt' 
formats, it can be read using `read.delim()`, `read.table()`, etc. Once the data
is loaded, if you have full Hi-C contact matrices, you can convert them to a 
sparse upper triangular format using the `full2sparse()` function of the `r BiocStyle::Biocpkg("HiCcompare")` package, then reformatting the columns to 
achieve the sparse upper triangular [input](#input) format.

  - **`.hic` format**:  
  
To access and import `.hic` files into R, you can use tools such as `r BiocStyle::Biocpkg("strawr")` for reading and processing `.hic` files. You can 
read other similar example in the `r BiocStyle::Biocpkg("HiCcompare")` package [vignette](https://www.bioconductor.org/packages/release/bioc/vignettes/HiCcompare/inst/doc/HiCcompare-vignette.html)

An example of reading `.hic` with `r BiocStyle::Biocpkg("strawr")` is shown 
below. `straw()` reads the `.hic` file of each single-cell Hi-C and outputs a 
data.frame in a sparse upper triangular format. This step must be repeated for 
single-cell Hi-C of the same cell type (condition) group.
    
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("strawr")
```

```{r, eval=FALSE}
library(strawr)

# Example to read the contact matrix from a .hic file of a single-cell
filepath <- "path/to/your/schic.hic"

contact_matrix <- straw(
  norm = "NONE", # Normalization method (KR, VC, or NONE)
  filepath, # Path to .hic file
  chr1loc = "chr1", # Chromosome 1
  chr2loc = "chr1", # Chromosome 2 (intra-chromosomal interactions)
  unit = "BP", # Base pair (BP) resolution or fragment (FRAG) resolution
  binsize = 200000 # Bin size (e.g., 200kb)
)
```

  - **`.cool` format**:  
  
To access and import `.cool` files into R,  you can use `cooler2bedpe()` 
function of `r BiocStyle::Biocpkg("HiCcompare")` package or [cooler](#<http://cooler.readthedocs.io/en/latest/index.html>) to access the
data. You can read example of
[cooler](#<http://cooler.readthedocs.io/en/latest/index.html>) on  `r BiocStyle::Biocpkg("HiCcompare")` [vignette](https://www.bioconductor.org/packages/release/bioc/vignettes/HiCcompare/inst/doc/HiCcompare-vignette.html)

For example, the files can be read directly into R by `cooler2bedpe()` function,
which will return a list object in the format of BEDPE, containing two elements:
"cis" - Contains the intra-chromosomal contact matrices, one per chromosome;
"trans" - Contains the inter-chromosomal contact matrix. You can read about this
in more detail by `?cooler2bedpe()`.
  
```{r, eval=FALSE}
# Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install HiCcompare package from Bioconductor
BiocManager::install("HiCcompare")

# Load the HiCcompare package
library(HiCcompare)
```

```{r, eval=FALSE}
cool.file <- read_files("path/to/schic.cool")
```


# Session Info

```{r, echo=FALSE}
sessionInfo()
```