You can install the latest version of spacexr from Bioconductor:
Robust Cell Type Decomposition (RCTD) is a statistical method for decomposing cell type mixtures in spatial transcriptomics data. In this vignette, we will use a simulated dataset to demonstrate how you can run RCTD on spatial transcriptomics data and visualize your results.
We will use the term pixel, synonymous with spot or bead, to refer to any observation that measures gene expression at a fixed location. We do not use the term cell because, in spatial transcriptomics data, single pixels may contain mixtures of multiple cells. The purpose of RCTD is to estimate the proportions of different cell types in each pixel of a dataset.
Before running RCTD, we must first load in:
The reference data should be stored in a SummarizedExperiment
object with:
Required components:
assay
: A matrix (or dgCMatrix
) of gene expression counts, with genes as rows and cells as columns. Row names should be unique gene names, and column names should be unique cell barcodes. Counts should be untransformed integers.
colData
: A factor column containing cell type annotations for each cell in the reference. The column name is assumed to be “cell_type”, but another name may be specified using the cell_type_col
parameter in createRctd
.
Optional components:
colData
: A numeric column named nUMI
, containing total UMI counts for each cell. If not provided, nUMI
will be calculated as the column sums of the counts matrix.In practice, your reference could be an annotated dataset from:
Here, we load counts and cell type annotations from the rctdSim
object, but in a real-world setting, the reference information may be distributed across several files or R objects.
# Load simulated data.
data("rctdSim")
# Create SummarizedExperiment.
reference_se <- SummarizedExperiment(
assays = list(counts = rctdSim$reference_counts),
colData = rctdSim$reference_cell_types
)
# Examine reference. (optional)
print(dim(assay(reference_se))) # Gene expression matrix dimensions
#> [1] 750 75
table(colData(reference_se)$cell_type) # Number of occurrences of each cell type
#>
#> ct1 ct2 ct3
#> 25 25 25
In this vignette, we have a reference dataset of 750 genes for 75 cells. We find that there are three cell types (ct1
, ct2
, and ct3
), each appearing 25 times in the reference.
Next, we will load the spatial transcriptomics data into a SpatialExperiment
object with:
Required components:
assay
: A matrix (or dgCMatrix
) of gene expression counts, with genes as rows and pixels as columns. Row names should be unique gene names, and column names should be unique pixel barcodes. Counts should be untransformed integers.
spatialCoords
: A matrix containing x and y coordinates for each pixel.
Optional components:
colData
: A numeric column named nUMI
, containing total UMI counts for each pixel. If not provided, nUMI
will be calculated as the column sums of the counts matrix.Here, we load coordinates and counts from the rctdSim
object. Again, in practice, this step will depend on how you store your spatial transcriptomics data.
# Create SpatialExperiment.
spatial_spe <- SpatialExperiment(
assay = rctdSim$spatial_rna_counts,
spatialCoords = rctdSim$spatial_rna_coords
)
# Examine pixels. (optional)
print(dim(assay(spatial_spe))) # Gene expression matrix dimensions
#> [1] 750 12
ggplot(as.data.frame(spatialCoords(spatial_spe)), aes(x, y)) +
geom_point(size = 5) +
theme(axis.text = element_blank(), axis.ticks = element_blank()
)
We find that we have a dataset of 750 genes for 12 pixels, plotted above.
The ground truth for our cell type mixtures is given in the SpatialExperiment
object rctdSim$proportions_spe
. (The ground truth has been stored in the same format as the RCTD output, which will be discussed later.) Note that this ground truth will not be known in practice—it is what we will need to estimate with RCTD.
We find that there are two distinct cell type mixtures in our dataset:
ct1
and 10% ct2
.ct1
and 40% ct2
and ct3
each.We can visualize these mixtures with the plotAllWeights
function, which plots each pixel as a pie chart of cell types. We will return to this function later to visualize our RCTD results.
The last step before running RCTD is to create an RctdConfig
object from our SpatialExperiment
and SummarizedExperiment
objects using the createRctd
function. Some important configuration options include:
max_cores
: Number of cores used for parallel processing. We recommend setting this to at least 4 to improve efficiency. If set to 1, parallel processing is not used.gene_cutoff
, fc_cutoff
: Used for filtering genes in platform effect normalization. gene_cutoff
filters for average expression and fc_cutoff
filters for log-fold-change across cell types.gene_cutoff_reg
, fc_cutoff_reg
: Similar to above, but for the RCTD step.UMI_min
, UMI_max
: Minimum and maximum read depth for pixels in the SpatialExperiment
dataset.MAX_MULTI_TYPES
: Maximum number of cell types per pixel (multi mode only).We are now ready to run RCTD using the runRctd
function.
RCTD has three modes, each suited for different spatial technologies:
"doublet"
: Assigns 1-2 cell types per pixel. Best for high-resolution technologies like Slide-seq and MERFISH."multi"
: Like doublet mode but can assign more cell types (up to MAX_MULTI_TYPES
). Best for lower-resolution technologies like 100-micron Visium."full"
: No restrictions on number of cell types.The rctd_mode
argument is used to specify the mode in which the RCTD algorithm runs. For the purpose of demonstration, we’ll decompose the data into doublets in this vignette.
RCTD returns a SpatialExperiment
object containing:
Three assays (one in full mode):
weights
: Cell type proportions restricted according to the specified mode (not available in full mode)weights_unconfident
: Cell type proportions restricted according to the specified mode, including unconfident predictions (not available in full mode)weights_full
: Unrestricted cell type proportionsAssays have cell types as rows and pixels as columns, with values representing the proportion (0 to 1) of each cell type in each pixel. Assay columns sum to 1 (except for rejected pixels, which sum to 0).
NOTE: Weights are transposed compared to the output of the original dmcable/spacexr
package on GitHub.
Spatial coordinates for each pixel stored in spatialCoords
.
Additional colData
including:
spot_class
: Classification as singlet
, doublet_certain
, doublet_uncertain
, or reject
first_type
, second_type
: Predicted cell typesmin_score
, singlet_score
for advanced userscell_type_list
: List of cell types per pixelconf_list
: List of whether cell type predictions are confidentmin_score
for advanced usersWe can view the doublet weights for our data:
# Cell type mixture 1
assay(results_spe, "weights")[, 1:6]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#> mx11 mx12 mx13 mx14 mx15 mx16
#> ct1 1 1 1 1 1 1
#> ct2 . . . . . .
#> ct3 . . . . . .
# Cell type mixture 2
assay(results_spe, "weights")[, 7:12]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#> mx21 mx22 mx23 mx24 mx25 mx26
#> ct1 . . . . . .
#> ct2 0.4935949 0.475943 0.4685428 0.4577326 0.5103718 0.4795314
#> ct3 0.5064051 0.524057 0.5314572 0.5422674 0.4896282 0.5204686
You can access the full, unrestricted weights using assay(results_spe, "weights_full")
. We will visualize these weights later. You may use assay(results_spe, "weights_unconfident")
to access the less confident predictions (e.g., the second_type
weight for singlets and uncertain doublets).
Below, we can view the results of RCTD deconvolution:
classification_df <- data.frame(
pixel = colnames(assay(results_spe)),
spot_class = colData(results_spe)$spot_class,
first_type = colData(results_spe)$first_type,
second_type = colData(results_spe)$second_type
)
classification_df
#> pixel spot_class first_type second_type
#> 1 mx11 singlet ct1 ct2
#> 2 mx12 singlet ct1 ct2
#> 3 mx13 singlet ct1 ct2
#> 4 mx14 singlet ct1 ct2
#> 5 mx15 singlet ct1 ct2
#> 6 mx16 singlet ct1 ct2
#> 7 mx21 doublet_certain ct2 ct3
#> 8 mx22 doublet_certain ct3 ct2
#> 9 mx23 doublet_certain ct3 ct2
#> 10 mx24 doublet_certain ct3 ct2
#> 11 mx25 doublet_certain ct2 ct3
#> 12 mx26 doublet_certain ct3 ct2
In particular, this data frame provides us with the following information:
spot_class
: A factor variable with the following levels of classification:
singlet
(1 cell type in the pixel)doublet_certain
(2 cell types in the pixel)doublet_uncertain
(2 cell types in the pixel, but only confident of 1)reject
(no prediction given for pixel)reject
pixels should not be used, and doublet_uncertain
pixels should only be used for applications that do not require knowledge of all cell types in a pixel.first_type
: the first cell type predicted on the bead (for all spot_class conditions except reject
).second_type
: the second cell type predicted on the bead for doublet spot_class
conditions (not a confident prediction for doublet_uncertain
).We can see that RCTD generally succeeded in identifying the components of the cell mixtures, although pixels with 90% ct1
and 10% ct2
were classified as singlets. (As an exercise, try changing the DOUBLET_THRESHOLD
parameter in createRctd
so that these mixtures will be identified as doublets.)
We can visualize the results of our RCTD run with the plotAllWeights
function. Simply pass in the RCTD results (and some aesthetic arguments), and the cell type proportions will be plotted as pie charts for each pixel.
Note that all mixtures with 90% ct1
(on the left) were classified as singlets, so only ct1
is plotted. And for the other mixtures (on the right), only cell types ct2
and ct3
are plotted (without any ct1
) due to the restrictions of doublet mode. We can visualize the full, unrestricted proportions by passing in the name of the relevant assay ("weights_full"
) to the plotAllWeights
function:
This is very close to the ground truth!
Finally, we can visualize how the proportion of a specific cell type varies across space with the plotCellTypeWeight
function. We show an example with the dummy cell type ct1
.
plotCellTypeWeight(
results_spe, "ct1", "weights_full",
size = 5, stroke = 0.5, title = "\nDensity of Cell Type 1\n"
)
You are now ready to apply RCTD to your own spatial transcriptomics datasets!
sessionInfo()
#> R Under development (unstable) (2025-02-19 r87757)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] spacexr_0.99.0 SpatialExperiment_1.17.0
#> [3] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
#> [5] Biobase_2.67.0 GenomicRanges_1.59.1
#> [7] GenomeInfoDb_1.43.4 IRanges_2.41.3
#> [9] S4Vectors_0.45.4 BiocGenerics_0.53.6
#> [11] generics_0.1.3 MatrixGenerics_1.19.1
#> [13] matrixStats_1.5.0 ggplot2_3.5.1
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 rjson_0.2.23 xfun_0.51
#> [4] bslib_0.9.0 lattice_0.22-6 quadprog_1.5-8
#> [7] vctrs_0.6.5 tools_4.5.0 yulab.utils_0.2.0
#> [10] curl_6.2.1 RSQLite_2.3.9 tibble_3.2.1
#> [13] blob_1.2.4 pkgconfig_2.0.3 Matrix_1.7-2
#> [16] dbplyr_2.5.0 lifecycle_1.0.4 GenomeInfoDbData_1.2.13
#> [19] compiler_4.5.0 farver_2.1.2 munsell_0.5.1
#> [22] ggforce_0.4.2 scatterpie_0.2.4 ggfun_0.1.8
#> [25] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
#> [28] pillar_1.10.1 crayon_1.5.3 jquerylib_0.1.4
#> [31] tidyr_1.3.1 MASS_7.3-65 DelayedArray_0.33.6
#> [34] cachem_1.1.0 magick_2.8.5 abind_1.4-8
#> [37] tidyselect_1.2.1 digest_0.6.37 purrr_1.0.4
#> [40] dplyr_1.1.4 labeling_0.4.3 polyclip_1.10-7
#> [43] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
#> [46] cli_3.6.4 SparseArray_1.7.6 magrittr_2.0.3
#> [49] S4Arrays_1.7.3 withr_3.0.2 filelock_1.0.3
#> [52] scales_1.3.0 UCSC.utils_1.3.1 bit64_4.6.0-1
#> [55] rmarkdown_2.29 XVector_0.47.2 httr_1.4.7
#> [58] bit_4.6.0 memoise_2.0.1 evaluate_1.0.3
#> [61] knitr_1.49 BiocFileCache_2.15.1 rlang_1.1.5
#> [64] Rcpp_1.0.14 DBI_1.2.3 glue_1.8.0
#> [67] tweenr_2.0.3 jsonlite_1.9.1 R6_2.6.1
#> [70] fs_1.6.5