---
title: "Visualization"
author: 
  - name: Kevin Stachelek
    affiliation:
    - University of Southern California   
    email: kevin.stachelek@gmail.com
  - name: Bhavana Bhat
    affiliation:
    - University of Southern California   
    email: bbhat@usc.edu
output: 
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('chevreulPlot')`"
vignette: >
  %\VignetteIndexEntry{Visualization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL, ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
    dpi = 100,
    out.width = "100%",
    message = FALSE,
    warning = FALSE,
    fig.width = 5,
    fig.height = 3
)

```

This article demonstrates the data visualization tools in Chevreul. We 
introduce the included plotting functions, their usage, and the resulting plots.

The first step is to load the chevreulPlot package and the other packages 
required for this vignette.

```{r packages}
library(chevreulPlot)
library(scater)
library(scran)
library(clustree)
library(patchwork)

data("small_example_dataset")
```


The plotting functions in chevreulPlot allow for visualization of single-cell 
data. Each plot can be customized for interactive or non-interactive display.

# Plot expression 

Expression of a feature (gene or transcript) can be plotted on a given 
embedding, resulting in a feature plot that can be static or interactive. 

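When plotting only one feature, the output resembles that of Seurat's 
`FeaturePlot()` (for SingleCellExperiment objects, the counterpart is scater's 
`plotReducedDim()` with `colour_by` set to the feature). 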
 
```{r}
plot_feature_on_embedding(small_example_dataset,
    embedding = "UMAP",
    features = "Gene_0001", return_plotly = FALSE
)
```


An interactive plot can be generated by specifying `return_plotly = TRUE`, 
which uses `ggplotly` and allows individual cells to be identified for further 
investigation.
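
The conversion from a static to an interactive plot can be sketched outside of 
chevreulPlot with ggplot2 and plotly. This is a minimal illustration on 
simulated data, not the package's implementation: the data frame `toy_cells` 
and its columns are hypothetical, and the plotly package is assumed to be 
installed.

```{r}
library(ggplot2)

# Toy embedding: hypothetical coordinates and expression values, not taken
# from small_example_dataset
set.seed(1)
toy_cells <- data.frame(
    cell_id = paste0("cell_", seq_len(50)),
    UMAP_1 = rnorm(50),
    UMAP_2 = rnorm(50),
    expression = rexp(50)
)

# Static scatter plot; the `text` aesthetic is only used for hover labels
static_plot <- ggplot(
    toy_cells,
    aes(x = UMAP_1, y = UMAP_2, colour = expression, text = cell_id)
) +
    geom_point()

# ggplotly() converts the ggplot into an interactive htmlwidget and shows
# the cell identifier when hovering over a point
interactive_plot <- plotly::ggplotly(static_plot, tooltip = "text")
interactive_plot
```

Hovering over a point in the resulting widget shows the cell identifier, which 
is the kind of lookup that `return_plotly = TRUE` is meant to support.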

## Plot read count or other QC measurements

The `plot_colData_histogram` function displays a histogram of a numeric colData 
variable, given by the argument `group_by`, with bars colored according to the 
categorical variable given by the argument `fill_by`. Here we plot the size 
factor of each cell, a per-cell scaling factor related to sequencing depth, 
colored by treatment.

```{r}
plot_colData_histogram(small_example_dataset,
    group_by = "sizeFactor",
    fill_by = "Treatment"
)
```
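
The roles of the two arguments can be illustrated with a plain ggplot2 
histogram. This is a sketch on simulated values under the assumption that the 
numeric variable maps to the x axis and the categorical variable to the fill 
color; `toy_coldata` is hypothetical and the code is not taken from 
chevreulPlot.

```{r}
library(ggplot2)

# Toy per-cell metadata; the values are simulated, not taken from
# small_example_dataset
set.seed(1)
toy_coldata <- data.frame(
    sizeFactor = c(
        rnorm(50, mean = 1, sd = 0.2),
        rnorm(50, mean = 1.5, sd = 0.2)
    ),
    Treatment = rep(c("treat1", "treat2"), each = 50)
)

# x takes the numeric variable, fill takes the categorical variable
hist_plot <- ggplot(toy_coldata, aes(x = sizeFactor, fill = Treatment)) +
    geom_histogram(bins = 20)
hist_plot
```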

## Plot metadata variable

Make a scatter plot of a metadata variable, where each point represents a cell 
positioned according to the chosen embedding (by default, "UMAP"). The `group` 
argument specifies the colData variable by which to group the cells (by 
default, "batch").

```{r}
plot_colData_on_embedding(small_example_dataset,
    group = "gene_snn_res.1",
    embedding = "UMAP"
)
```

This function builds the dimensional reduction plot with a helper in the style 
of Seurat's `DimPlot()` (for SingleCellExperiment objects, scater's 
`plotReducedDim()` fills this role). Setting the `return_plotly` parameter of 
`plot_colData_on_embedding` to `TRUE` converts the plot into an interactive 
plot using the `ggplotly` function from the plotly package.

## Plot cluster marker genes  

Marker genes of Louvain clusters or additional experimental metadata can be 
plotted using `plot_marker_features`. This allows visualization of n marker 
features grouped by the metadata of interest. Marker genes are identified using 
the Wilcoxon rank-sum test as implemented in `presto`. In the resulting dot 
plot, the size of each dot corresponds to the percentage of cells expressing 
the feature in each cluster, and the color represents the average expression 
level of the feature. 


```{r}
plot_marker_features(small_example_dataset,
    group_by = "gene_snn_res.1",
    marker_method = "wilcox"
)
```
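
The two quantities encoded in the dot plot can be computed by hand on a small 
matrix. The sketch below uses base R and made-up counts to show what dot size 
and dot color summarize; `toy_expr` and `toy_clusters` are hypothetical, and 
the package may compute these values differently.

```{r}
# Toy expression matrix (features x cells); values are made up for illustration
set.seed(42)
toy_expr <- matrix(
    rpois(3 * 8, lambda = 1),
    nrow = 3,
    dimnames = list(paste0("feature_", 1:3), paste0("cell_", 1:8))
)
toy_clusters <- factor(rep(c("1", "2"), each = 4))

# Percentage of cells in each cluster with non-zero expression (dot size)
pct_expressing <- sapply(levels(toy_clusters), function(cl) {
    rowMeans(toy_expr[, toy_clusters == cl, drop = FALSE] > 0) * 100
})

# Mean expression of each feature within each cluster (dot color)
avg_expression <- sapply(levels(toy_clusters), function(cl) {
    rowMeans(toy_expr[, toy_clusters == cl, drop = FALSE])
})

pct_expressing
avg_expression
```

Both results are feature-by-cluster matrices, one value per dot in the plot.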

## Plotting transcript composition

`plot_transcript_composition()` plots the proportion of reads of a given gene 
that map to each transcript. The gene of interest is specified by the argument 
`gene_symbol`. 
 
```{r, results=FALSE, echo=FALSE, eval = FALSE}
plot_transcript_composition(small_example_dataset, "NRL",
    group.by = "gene_snn_res.1", standardize = TRUE
)
```
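
The proportion described above can be sketched in base R. The example uses 
hypothetical transcript-level counts for a single gene (`toy_tx`) and a 
hypothetical grouping of cells (`toy_groups`); it illustrates the calculation 
only and is not the code used by `plot_transcript_composition()`.

```{r}
# Toy transcript-level counts for one hypothetical gene (transcripts x cells)
toy_tx <- matrix(
    c(
        10, 0, 5, 5,
        30, 20, 5, 0,
        60, 20, 10, 5
    ),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("tx_", 1:3), paste0("cell_", 1:4))
)
toy_groups <- c("A", "A", "B", "B")

# Sum reads per transcript within each group of cells
group_counts <- t(rowsum(t(toy_tx), group = toy_groups))

# Divide by the group total so that each column sums to 1
tx_proportion <- prop.table(group_counts, margin = 2)
tx_proportion
```

Each column of `tx_proportion` gives the share of the gene's reads assigned to 
each transcript within one group.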

```{r}
sessionInfo()
```