---
title: "LinTInd - tutorial"
author: "Luyue Wang"
date: "2021/12/22"
output: html_document
vignette: >
  %\VignetteIndexEntry{LinTInd - tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction
Single-cell RNA sequencing (scRNA-seq) has become a common approach for tracing the developmental processes of cells. However, recording lineage with exogenous barcodes is more direct than inferring it from expression profiles. As gene-editing technology matures, combining it with exogenous barcodes can generate richer dynamic information at single-cell resolution. In this application note, we introduce an R package, LinTInd, that reconstructs a tree from alleles generated by the CRISPR genome-editing tool over a moderate time period, based on the order in which editing occurs. For scRNA-seq data, LinTInd can also quantify the similarity between each pair of clusters in three ways.

## Installation

Via GitHub
```
devtools::install_github("mana-W/LinTInd")
```

Via Bioconductor
```
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("LinTInd")
```

```{r load package, message=FALSE}
library(LinTInd)
```

## Import data 
The input for LinTInd consists of three required files:

- sequence
- reference  
- position of cutsites

and an optional file:

- celltype


```{R,message=FALSE}
# Locate the example files shipped with the package
data<-system.file("extdata","CB_UMI",package = 'LinTInd')
fafile<-system.file("extdata","V3.fasta",package = 'LinTInd')
cutsite<-system.file("extdata","V3.cutSites",package = 'LinTInd')
celltype<-system.file("extdata","celltype.tsv",package = 'LinTInd')
# Read the sequences, the reference, the cut-site positions and the cell types
data<-read.table(data,sep="\t",header=TRUE)
ref<-ReadFasta(fafile)
cutsite<-read.table(cutsite,col.names = c("indx","start","end"))
celltype<-read.table(celltype,header=TRUE,stringsAsFactors=FALSE)
```

For the sequence file, only the column containing the read strings is required; the cell barcodes and UMIs are both optional.

```{R}
head(data,3)
ref
cutsite
head(celltype,3)
```

## Array identification and indel visualization

In the first step, we should use `FindIndel()` to align the reads and find indels; the function `IndelForm()` then generates an array-form string for each read.

```{R find indels and generate array-form strings, message=FALSE}
scarinfo<-FindIndel(data=data,scarfull=ref,scar=cutsite,indel.coverage="All",type="test",cln=1)
scarinfo<-IndelForm(scarinfo,cln=1)
```
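
To give a feel for what an array-form string encodes, here is a minimal sketch in base R. It is not the code used by `FindIndel()` or `IndelForm()`: the coordinates, the label format (`<width>D+<start>`) and all `toy_` names are assumptions made for illustration. Only the column names `indx`, `start` and `end` follow the `cutsite` object above.

```r
# Hypothetical cut-site table; coordinates are made up for illustration.
toy_cutsite <- data.frame(
  indx  = 1:3,
  start = c(10, 40, 70),
  end   = c(30, 60, 90)
)

# A hypothetical deletion covering reference positions 25 to 45.
toy_indel <- c(start = 25, end = 45)

# A target site counts as edited when its interval overlaps the indel interval.
toy_hit <- toy_cutsite$start <= toy_indel[["end"]] &
  toy_cutsite$end >= toy_indel[["start"]]

# Assumed label format: deletion width, "D+", then the start position.
toy_label <- sprintf("%dD+%d",
                     toy_indel[["end"]] - toy_indel[["start"]] + 1,
                     toy_indel[["start"]])

# One entry per target site, joined into a single string for the read.
toy_array <- ifelse(toy_hit, toy_label, "NONE")
toy_array_string <- paste(toy_array, collapse = "_")
toy_array_string
```

In this toy case the deletion spans the first two target sites, so both carry the same label while the third site stays unedited.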

Then, for single-cell sequencing, we should define a final array-form string for each cell using `IndelIdents()`. Three methods are provided:

- *"reads.num"* (default): find the array-form string supported by the most reads in a cell
- *"umi.num"*: find the array-form string supported by the most UMIs in a cell
- *"consensus"*: find the consensus sequence in each cell, and then generate array-form strings from the new reads

For bulk sequencing, in this step, we will generate a "cell barcode" for each read.

```{r IndelIdents, message=FALSE}
cellsinfo<-IndelIdents(scarinfo,method.use="umi.num",cln=1)
```
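
The `"umi.num"` rule can be pictured as a per-cell majority vote over UMIs. The sketch below uses base R only and is an illustration under assumptions, not the implementation inside `IndelIdents()`: the table `toy_reads`, its column names and its values are hypothetical.

```r
# Hypothetical per-read table: cell barcode, UMI and array-form string.
toy_reads <- data.frame(
  cell  = c("c1", "c1", "c1", "c1", "c2", "c2"),
  umi   = c("u1", "u1", "u2", "u3", "u1", "u2"),
  array = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Count each UMI only once per cell/array combination.
toy_umis <- unique(toy_reads)

# For every cell, keep the array-form string backed by the most UMIs.
# which.max() returns the first maximum, so ties go to the first label.
toy_consensus <- vapply(
  split(toy_umis$array, toy_umis$cell),
  function(arrays) names(which.max(table(arrays))),
  character(1)
)
toy_consensus
```

In cell `c1`, strings `A` and `B` are each supported by two reads, but `B` is supported by two distinct UMIs and `A` by only one, so counting UMIs rather than reads selects `B`.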

After defining the indels for each cell, we can use `IndelPlot()` to visualise them.

```{r IndelPlot}
IndelPlot(cellsinfo = cellsinfo)
```

## Indel extraction and similarity calculation

We can use the function `TagProcess()` to extract indels for cells/reads. The parameter *Cells* is optional.

```{r TagProcess}
tag<-TagProcess(cellsinfo$info,Cells=celltype)
```

If an annotation is provided for each cell, we can also use `TagDist()` to calculate the relationship between each pair of groups in three ways:

- *"Jaccard"* (default): calculate the weighted Jaccard similarity of indels between each pair of groups
- *"P"*: right-tailed test that compares the observed level of indel intersection with the level expected under random editing; the former is expected to be significantly higher than the latter
- *"spearman"*: Spearman correlation of indels between each pair of groups

The heatmap of this result will be saved as a PDF file.

```{r TagDist}
tag_dist<-TagDist(tag,method = "Jaccard")
tag_dist
```
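
As a rough illustration of the `"Jaccard"` and `"spearman"` options, the sketch below computes both measures on a small count matrix in base R. It assumes one common definition of weighted Jaccard similarity (sum of element-wise minima divided by sum of element-wise maxima), which may differ from what `TagDist()` does internally. The matrix `toy_counts` and the helper `weighted_jaccard()` are hypothetical.

```r
# Hypothetical indel counts (rows: indels, columns: cell groups).
# matrix() fills column by column, so g1 = (4, 0, 2), g2 = (2, 1, 0), g3 = (0, 3, 3).
toy_counts <- matrix(
  c(4, 0, 2,
    2, 1, 0,
    0, 3, 3),
  nrow = 3,
  dimnames = list(c("indel1", "indel2", "indel3"), c("g1", "g2", "g3"))
)

# Weighted Jaccard similarity of two non-negative count vectors.
weighted_jaccard <- function(x, y) sum(pmin(x, y)) / sum(pmax(x, y))

# Pairwise similarity between all groups.
toy_groups <- colnames(toy_counts)
toy_jaccard <- outer(
  toy_groups, toy_groups,
  Vectorize(function(a, b) weighted_jaccard(toy_counts[, a], toy_counts[, b]))
)
dimnames(toy_jaccard) <- list(toy_groups, toy_groups)

# Spearman correlation between the same columns, using stats::cor().
toy_spearman <- cor(toy_counts, method = "spearman")

toy_jaccard
toy_spearman
```

Both results are symmetric group-by-group matrices with ones on the diagonal, which is the shape a heatmap of group relationships needs.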

## Tree reconstruction

In the last part, we can use `BuildTree()` to generate an array-generation tree.
```{r BuildTree}
treeinfo<-BuildTree(tag)
```
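
The idea of ordering arrays by when their edits occurred can be sketched as follows: because edits accumulate, an array carrying indels `i1` and `i2` plausibly descends from an array carrying only `i1`. The base R code below links each array to its closest such ancestor. It is a simplified illustration under that assumption, not the algorithm implemented in `BuildTree()`. The list `toy_arrays` and the helper `find_parent()` are hypothetical.

```r
# Hypothetical arrays, each described by the set of indels it carries.
toy_arrays <- list(
  root = character(0),
  a1   = c("i1"),
  a2   = c("i1", "i2"),
  a3   = c("i1", "i3"),
  a4   = c("i4")
)

# Parent of an array: the array with the largest indel set that is a proper
# subset of its own (ties are resolved by list order).
find_parent <- function(name, arrays) {
  child <- arrays[[name]]
  is_ancestor <- vapply(
    arrays,
    function(other) length(other) < length(child) && all(other %in% child),
    logical(1)
  )
  candidates <- names(arrays)[is_ancestor]
  if (length(candidates) == 0) return(NA_character_)
  candidates[which.max(lengths(arrays[candidates]))]
}

# Edge list of the resulting tree; the unedited array has no parent (NA).
toy_edges <- data.frame(
  child  = names(toy_arrays),
  parent = vapply(names(toy_arrays), find_parent, character(1),
                  arrays = toy_arrays),
  stringsAsFactors = FALSE
)
toy_edges
```

Here `a2` and `a3` both attach to `a1`, while `a4` shares no indel with the others and attaches directly to the unedited `root`.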

Finally, we can use the function `PlotTree()` to visualise the tree created before.
```{r PlotTree}
plotinfo<-PlotTree(treeinfo = treeinfo,data.extract = "TRUE",annotation = "TRUE")
plotinfo$p
```

## Session Info

```{r sessionInfo, echo=TRUE}
sessionInfo()
```