--- title: "Getting Started With SEraster" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started With SEraster} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=6, fig.height=4, dpi = 72, dev = "png" ) ``` # Introduction `SEraster` is a rasterization preprocessing framework that aggregates cellular information into spatial pixels to reduce resource requirements for spatial omics data analysis. `SEraster` reduces the number of spatial points in spatial omics datasets for downstream analysis through a process of rasterization where single cells' gene expression or cell-type labels are aggregated into equally sized pixels based on a user-defined `resolution`. Here, we refer to a particular `resolution` of rasterization by the side length of the pixel such that finer `resolution` indicates smaller pixel size and coarser `resolution` indicates larger pixel size. More details describing `SEraster` are available in [our paper](https://doi.org/10.1093/bioinformatics/btae412). This tutorial walks you through the basic functionalities of `SEraster` and two examples of downstream analysis that can be performed with the rasterized spatial omics data. Additional vignettes are available on [our documentation website](jef.works/SEraster/). For downstream analyses, we will be using [`nnSVG`](https://bioconductor.org/packages/nnSVG) for spatially variable gene (SVG) analysis and [`CooccurrenceAffinity`](https://cran.r-project.org/package=CooccurrenceAffinity) for cell-type co-enrichment analysis. References for nnSVG and CooccurrenceAffinity can be found below: - [Weber, L. et al. (2023), "nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes", *Nature Communications*](https://doi.org/10.1038/s41467-023-39748-z) - [Mainali, K. et al. (2021), "A better index for analysis of co-occurrence and similarity", *Science Advances*](https://doi.org/10.1126/sciadv.abj9204) - [Mainali,K. et al. (2022), "CooccurrenceAffinity: An R package for computing a novel metric of affinity in co-occurrence data that corrects for pervasive errors in traditional indices", *bioRxiv*](https://doi.org/10.1101/2022.11.01.514801) # Installation To install this package using Bioconductor, start R (version "4.4.0") and enter: ```{r, eval=FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("SEraster") ``` The latest development version can also be installed from [GitHub](https://github.com/JEFworks-Lab/SEraster) using `remotes`: ```{r, eval=FALSE} require(remotes) remotes::install_github('JEFworks-Lab/SEraster') ``` In addition, `SEraster` is also compatible with `SeuratObject` through `SeuratWrappers`. `SeuratWrappers` implementation can be installed using `remotes`: ```{r, eval=FALSE} require(remotes) remotes::install_github('satijalab/seurat-wrappers@SEraster') ``` Documentation and tutorial for the `SeuratWrappers` implementation can be found in the `SEraster` branch of the [`SeuratWrappers` GitHub repository](https://github.com/satijalab/seurat-wrappers/tree/SEraster). # Dataset In the examples below, we will use the MERFISH mouse preoptic area (mPOA) dataset available in the `SEraster` package. The dataset is taken from [Moffitt, J. et al. (2018), "Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region", *Science*](https://doi.org/10.1126/science.aau5324) and formatted as a `SpatialExperiment` object as shown [here](https://jef.works/SEraster/reference/merfish_mousePOA.html). Please refer to the following documentations to see how you can format your data into a `SpatialExperiment` object: - [SpatialExperiment](https://bioconductor.org/packages/SpatialExperiment) package - [Formatting a SpatialExperiment Object for SEraster](https://jef.works/SEraster/articles/formatting-SpatialExperiment-for-SEraster.html) # Tutorial ## Load libraries ```{r, message=FALSE} library(SEraster) library(SpatialExperiment) library(nnSVG) library(CooccurrenceAffinity) library(ggplot2) ``` ## Load example dataset ```{r} data("merfish_mousePOA") # check the dimension of the genes-by-cells matrix at single-cell resolution dim(merfish_mousePOA) # check the number of cell-types length(unique(colData(merfish_mousePOA)$celltype)) ``` This MERFISH mPOA dataset contains 6,509 cells and 16 cell-types. ```{r} # plot at single-cell resolution df <- data.frame(spatialCoords(merfish_mousePOA), celltype = colData(merfish_mousePOA)$celltype) ggplot(df, aes(x = x, y = y, col = celltype)) + coord_fixed() + geom_point(size = 1.5, stroke = 0) + guides(col = guide_legend(override.aes = list(size = 3))) + labs(x = "x (μm)", y = "y (μm)", col = "Cell-types") + theme_bw() + theme(panel.grid = element_blank()) ``` ## SEraster basic functionalities `SEraster` reduces the number of spatial points in spatial omics datasets for downstream analysis through a process of rasterization where single cells’ gene expression or cell-type labels are aggregated into equally sized square or hexagonal pixels (can be changed using the `square` argument) based on a user-defined resolution. Here, we demonstrate the basic functionalities of `SEraster`. ### Rasterize gene expression For continuous variables such as gene expression or other molecular information (e.g. protein expression if you are using spatial proteomics datasets), `SEraster` aggregates the observed raw counts or normalized expression values for each molecule within each pixel using means by default (can be changed using the `fun` argument). Let's try rasterizing the gene expression of the MERFISH mouse POA dataset we loaded. ```{r} rastGexp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = 50) # check the dimension of the genes-by-cells matrix after rasterizing gene expression dim(rastGexp) ``` As you can see, `SEraster` aggregated 6,509 single cells into 1,301 pixels. ```{r} # plot total rasterized gene expression SEraster::plotRaster(rastGexp, name = "Total rasterized gene expression") ``` ```{r} # plot a specific gene SEraster::plotRaster(rastGexp, feature_name = "Esr1", name = "Esr1") ``` ### Rasterize gene expression within cell-type Such rasterization can also be performed in a cell-type-specific manner by restricting to cells of a particular cell-type prior to rasterization. Here, we subset the dataset to Inhibitory cell-type and run `SEraster` on the subsetted dataset. ```{r} # rasterize cell-type specific gene expression by subsetting to cell-type of interest ct_interest <- "Inhibitory" spe_subset <- merfish_mousePOA[,merfish_mousePOA$celltype == ct_interest] # rasterize gene expression rastGexpSubset <- SEraster::rasterizeGeneExpression(spe_subset, assay_name="volnorm", resolution = 50) ``` ```{r} # plot SEraster::plotRaster(rastGexpSubset, name = paste0("Total rast gexp in ", ct_interest)) ``` ```{r} SEraster::plotRaster(rastGexpSubset, feature_name = "Esr1", name = paste0("Esr1 in ", ct_interest)) ``` ### Rasterize cell-type For categorical variables such as cell-type or cluster labels, `SEraster` aggregates the number of cells for each label within each pixel using sums by default (can be changed using the `fun` argument). Let's try rasterizing the cell-type labels of the MERFISH mouse POA dataset. ```{r} rastCt <- SEraster::rasterizeCellType(merfish_mousePOA, col_name = "celltype", resolution = 50) # check the dimension of the cell-types-by-cells matrix after rasterizing cell-type labels dim(rastGexp) ``` ```{r} # plot total cell counts SEraster::plotRaster(rastCt, name = "cell counts", option = "inferno") ``` ```{r} # plot specific cell-type SEraster::plotRaster(rastCt, feature_name = "Inhibitory", name = "Inhibitory neuron counts", option = "inferno") ``` ### Setting rasterization resolution Rasterization resolution can be controlled by the `resolution` argument of the `rasterizeGeneExpression` and `rasterizeCellType` functions. Here, we refer to a particular resolution of rasterization by the side length for square pixels and the distance between opposite edges for hexagonal pixels such that finer resolution indicates smaller pixel size and vice versa. Let's see how the rasterized MERFISH mouse POA dataset look with various resolutions using square pixels. ```{r} resolutions <- c(50, 100, 200) for (resolution in resolutions) { # rasterize at defined resolution temp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = resolution) # plot a specific gene plt <- SEraster::plotRaster(temp, feature_name = "Esr1", name = "Esr1", plotTitle = paste0("resolution: ", resolution)) show(plt) } ``` Now, let's see the same resolutions using hexagonal pixels. ```{r} for (resolution in resolutions) { # rasterize at defined resolution temp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = resolution, square = FALSE) # plot a specific gene plt <- SEraster::plotRaster(temp, feature_name = "Esr1", name = "Esr1", plotTitle = paste0("resolution: ", resolution)) show(plt) } ``` ### Creating and rasterizing permutations Since rasterized values may be sensitive to edge effects such as the specific boundaries of grids upon rasterization, `SEraster` enables permutation by rotating the dataset at various angles before rasterization. For example, let's create 3 permutations of the MERFISH mouse POA dataset, which would output a `list` of 3 `SpatialExperiment` objects with x,y coordinates rotated at 0, 120, and 240 degrees around the midrange point. In addition to a single `SpatialExperiment` object, `rasterizeGeneExpression` and `rasterizeCellType` functions can both take a `list` of `SpatialExperiment` objects. This essentially allows users to streamline the preprocessing of permutations with `SEraster`; followed by a downstream analysis of choice. For instance, in our manuscript, we have shown that permutations can be used to improve the performance of SVG analysis. ```{r} # permutate spe_list <- permutateByRotation(merfish_mousePOA, n_perm = 3) # rasterize permutated datasets at once out_list <- rasterizeGeneExpression(spe_list, assay_name = "volnorm", resolution = 50) for (i in seq_along(out_list)) { # extract rotated angle angle <- gsub("rotated_", "", paste0("rotated ", names(out_list)[[i]], " degrees")) # plot a specific gene plt <- SEraster::plotRaster(out_list[[i]], feature_name = "Esr1", name = "Esr1", plotTitle = angle) show(plt) } ``` As you can see from the plots above, when `SEraster` rasterizes a `list` of `SpatialExperiment` objects, all `SpatialExperiment` objects in the inputted `list` are rasterized with the same pixel coordinate framework (same bounding box, resolution, centroid coordinates). This feature may not be particularly useful for permutations; however, it can potentially be applied to compare two or more datasets, such as structurally aligned tissues as well as healthy vs. disease tissues. ## Examples of downstream analyses after SEraster preprocessing ### Spatial variable gene (SVG) analysis Here, we use a previously developed tool called `nnSVG`. Please refer to [nnSVG](https://bioconductor.org/packages/nnSVG) for more details about the package. We can directly input rasterized gene expression `SpatialExperiment` object from `SEraster` into `nnSVG`. ```{r} # run nnSVG set.seed(0) rastGexp <- nnSVG(rastGexp, assay_name = "pixelval") ``` ```{r} # number of significant SVGs based on the selected adjusted p-value threshold table(rowData(rastGexp)$padj <= 0.05) ``` ```{r} # plot rasterized gene expression of top-ranked SVG top_svg <- which(rowData(rastGexp)$rank == 1) top_svg_name <- rownames(rowData(rastGexp))[top_svg] SEraster::plotRaster(rastGexp, feature_name = top_svg_name, name = top_svg_name) ``` We can also perform cell-type specific SVG analysis by subsetting the dataset prior to applying `SEraster`. ```{r} # subset data ct_interest <- "Excitatory" spe_sub <- merfish_mousePOA[,merfish_mousePOA$celltype == ct_interest] # run SEraster rastGexp_sub <- SEraster::rasterizeGeneExpression(spe_sub, assay_name="volnorm", resolution = 50) # run nnSVG set.seed(0) rastGexp_sub <- nnSVG(rastGexp_sub, assay_name = "pixelval") ``` ```{r} # number of significant SVGs table(rowData(rastGexp_sub)$padj <= 0.05) ``` ```{r} # plot rasterized gene expression of top-ranked SVG top_svg <- which(rowData(rastGexp_sub)$rank == 1) top_svg_name <- rownames(rowData(rastGexp_sub))[top_svg] SEraster::plotRaster(rastGexp_sub, feature_name = top_svg_name, name = top_svg_name) ``` ### Cell-type co-enrichment analysis Rasterized cell-type labels can be used to analyze pair-wise cell-type co-enrichment To do so, we binarize the rasterized cell-type labels using a relative enrichment metric and a previously developed tool called `CooccurrenceAffinity`. Please refer to our paper for more details about the methodology and [CooccurrenceAffinity](https://CRAN.R-project.org/package=CooccurrenceAffinity) for more details about the package. ```{r} # extract cell-type labels ct_labels <- as.factor(colData(merfish_mousePOA)$celltype) # compute relative enrichment (RE) metric mat <- assay(rastCt, "pixelval") mat_re <- do.call(rbind, lapply(rownames(rastCt), function(ct_label) { mat[ct_label,] / (sum(mat[ct_label,]) / sum(mat) * colSums(mat)) })) rownames(mat_re) <- rownames(mat) # binarize mat_bin <- ifelse(mat_re >= 1, 1, 0) # add RE and binarized layers to SpatialExperiment object assays(rastCt) <- list(pixelval = assay(rastCt, "pixelval"), re = mat_re, bin = mat_bin) ``` ```{r} ct_interest <- "Ependymal" # plot pixel value for a cell-type of interest plotRaster(rastCt, assay_name = "pixelval", feature_name = ct_interest, name = "cell-type counts", option = "inferno") ``` ```{r} # plot RE value for a cell-type of interest plotRaster(rastCt, assay_name = "re", feature_name = ct_interest, name = "RE", option = "inferno") ``` ```{r} # plot binarized value for a cell-type of interest plotRaster(rastCt, assay_name = "bin", feature_name = ct_interest, factor_levels = c(0,1), name = "binarized", option = "inferno") ``` ```{r} # run CooccurrenceAffinity ct_coocc <- CooccurrenceAffinity::affinity(data = mat_bin, row.or.col = "row", squarematrix = c("all")) # plot maximum likelihood estimates of affinity metric (alpha MLE) CooccurrenceAffinity::plotgg(data = ct_coocc, variable = "alpha_mle", legendlimit = "datarange") ``` # Session Information ```{r} sessionInfo() ```