---
title: |
  | Multi-sample multi-group
  | spatial transcriptomics data
author: "Peiying Cai"
date: "`r doc_date()`"
vignette: >
  %\VignetteIndexEntry{Multi-sample multi-group spatial transcriptomics data}
  %\VignetteEngine{knitr::rmarkdown}
output: BiocStyle::html_document
bibliography: refs.bib
---

# Overview

The `muSpaData` package provides datasets used in the examples and vignettes of the `DESpace` package. It gives access to a publicly available Stereo-seq spatial transcriptomics dataset with a complex experimental design. The dataset contains multiple samples (e.g., serial sections) measured under several experimental conditions (e.g., time points), and is stored as `SpatialExperiment` (SPE) Bioconductor objects.

# Available datasets

The table below lists the available datasets with their unique identifier (ID), description, source, and reference. Details for each dataset can be viewed directly in R with `?ID` (e.g., `?Wei22_full`).

ID | Description | Availability | Reference
---|-------------|--------------|----------
`Wei22_full` | Single-cell Stereo-seq spatial transcriptomics data of axolotl brain tissue, with multiple sections collected across several regeneration stages (16 samples in total) | Spatial Transcript Omics DataBase (STOmics DB) [STDS0000056](https://db.cngb.org/stomics/datasets/STDS0000056/data) | [@ARTISTA; @Banksy]
`Wei22_example` | A subset of `Wei22_full` with fewer genes and regeneration stages (6 samples in total) | Spatial Transcript Omics DataBase (STOmics DB) [STDS0000056](https://db.cngb.org/stomics/datasets/STDS0000056/data) | [@ARTISTA; @Banksy]

After downloading the raw data from the original source, we merge samples across the different time points, perform quality control to filter out low-quality genes and cells, and apply Banksy [@Banksy] for multi-sample clustering and smoothing. The final SPE objects are distributed through Bioconductor's ExperimentHub for easy access and reproducibility.
# Installation

`muSpaData` is an R package available from the [Bioconductor](http://bioconductor.org/) repository. The GitHub repository can be found [here](https://github.com/peicai/muSpaData).

```{r install, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("muSpaData")

## Check that you have a valid Bioconductor installation
BiocManager::valid()
```

Then load the packages:

```{r, message = FALSE}
suppressMessages({
    library(muSpaData)
    library(ExperimentHub)
    library(ggplot2)
})
```

# Data loading

All datasets in `muSpaData` can be loaded either through functions named after the objects or via the `ExperimentHub` interface. Each SPE object stores the filtered counts in the `assay` slot. The Banksy cluster assignments are stored in the `Banksy` and `Banksy_smooth` columns of the `colData` slot.

## Via functions

```{r, message = FALSE}
# Load the small example SPE object
(spe <- Wei22_example())
```

```{r, eval=FALSE}
# To download the full dataset (about 5.2 GB in RAM), first check the
# available memory (requires the benchmarkme package):
if (benchmarkme::get_ram() > 5e9) {
    spe_full <- Wei22_full()
}
```

## Using `query` via `ExperimentHub`

First, initialize a Hub instance with `ExperimentHub()`, which loads all records into the variable `eh`. Then use `query()` to find the `muSpaData` records and their accession IDs (e.g., EH123). Finally, load a record into R with `eh[[id]]`.

```{r message = FALSE}
# Connect to ExperimentHub and create a Hub instance
eh <- ExperimentHub()
(q <- query(eh, "muSpaData"))

# Load the first resource in the list
q[[1]]

# Load by accession ID
eh[["EH9613"]]
```

## Using `list/loadResources`

To browse only the `muSpaData` records rather than all of ExperimentHub, list the available records with `listResources()`. To load a specific dataset or subset, use `loadResources()`.
```{r message = FALSE}
listResources(eh, "muSpaData")

# Load data using a character vector of metadata search terms
loadResources(eh, "muSpaData", c("example"))
```

# Explore the data

Manual annotations are not available for the original dataset. We therefore used Banksy [@Banksy] to define spatial domains by jointly modeling multiple samples. The Banksy spatial cluster assignments are stored in `colData()`.

```{r view-ARTISTA-Banksy, fig.width=5, fig.height=4}
# Plot the smoothed Banksy spatial clusters, one panel per sample
CD <- colData(spe) |> as.data.frame()
ggplot(CD, aes(x = sdimx, y = sdimy, color = factor(Banksy_smooth))) +
    geom_point(size = 0.25) +
    theme_void() +
    theme(legend.position = "bottom") +
    facet_wrap(~ sample_id, scales = "free") +
    labs(color = "", title = "Banksy spatial clusters")
```

# Session info

```{r}
sessionInfo()
```

# References