In this vignette we use sosta
to reconstruct pancreatic islets of different diabetic stages from IMC data (Damond et al. 2019). Based on the reconstruction we calculate structure metrics. Finally, we show how to do staticstical comparison of the metrics accounting for the correlation structure of the dataset.
The sosta package can be installed from Bioconductor as follows:
if (!requireNamespace("BiocManager")) {
install.packages("BiocManager")
}
BiocManager::install("sosta")
For this vignette, we will need several additional packages:
library("dplyr")
library("ExperimentHub")
library("ggplot2")
library("lme4")
library("lmerTest")
library("sf")
library("SpatialExperiment")
library("sosta")
library("tidyr")
library("ggfortify")
theme_set(theme_bw())
In this vignette, we will use an imaging mass cytometry (IMC) dataset of pancreatic islets from human donors at different stages of type 1 diabetes (T1D) and healthy controls (Damond et al. 2019).
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 13557383 724.1 24215044 1293.3 16145384 862.3
#> Vcells 76064721 580.4 270860059 2066.5 314386559 2398.6
First, we plot the data for illustration. As we have multiple images per patient, we will subset to a few slides. As can be seen, the dimensions of the field of view are differing.
df <- cbind(
colData(spe[, spe$image_name %in% c("E04", "E03", "G01", "J02")]),
spatialCoords(spe[, spe$image_name %in% c("E04", "E03", "G01", "J02")])
)
df |>
as.data.frame() |>
ggplot(aes(x = cell_x, y = cell_y, color = cell_category)) +
geom_point(size = 0.25) +
facet_wrap(~image_name, ncol = 2) +
coord_equal() +
scale_color_brewer(palette = "Dark2")
Our goal is to reconstruct / segment and quantify the pancreatic islets.
In this example, we will reconstruct the islets based on the point pattern density of the islet cells. We will start with estimating the parameters that we use for reconstruction afterwards. For one image this can be illustrated as follows.
shapeIntensityImage(
spe,
marks = "cell_category",
imageCol = "image_name",
imageId = "G01",
markSelect = "islet"
)
We see the density (pixel-level) image on the left and a histogram of the intensity values on the right. The estimated threshold is roughly the mean between the two modes of the (truncated) pixel intensity distribution.
Note that the above calculation was done for one image. The function estimateReconstructionParametersSPE
returns two plots with the estimated parameters for each image. The parameter bndw
is the bandwidth parameter that is used for estimating the intensity profile of the point pattern. The parameter thres
is the estimated parameter for the density threshold for reconstruction. We subset 25 random images to speed up computation.
n <- estimateReconstructionParametersSPE(
spe,
marks = "cell_category",
imageCol = "image_name",
markSelect = "islet",
nImages = 25,
plotHist = TRUE
)
We can inspect the relationship of the estimated bandwidth and threshold.
n |>
ggplot(aes(x = bndw, y = thres)) +
geom_point()
We note that the estimated bandwidth seems to vary a bit more than the estimated threshold. We will use the mean of the two estimated vectors as our parameters.
(thresSPE <- mean(n$thres))
#> [1] 0.003578595
(bndwSPE <- mean(n$bndw))
#> [1] 13.4399
The function reconstructShapeDensitySPE
reconstructs the islets for all images in the spe
object. We use the estimated parameters from above. For computational reasons, we will subset to 20 images per patient for the rest of the vignette.
# Sample 15 images per patient
sel <- colData(spe) |>
as.data.frame() |>
group_by(patient_id) |>
select(image_name) |>
sample_n(size = 20, replace = FALSE) |>
pull(image_name)
#> Adding missing grouping variables: `patient_id`
# Select sampled images
speSel <- spe[, spe$image_name %in% sel]
speSel$image_name |>
unique() |>
length()
#> [1] 205
# Run on all images
allIslets <- reconstructShapeDensitySPE(
speSel,
marks = "cell_category",
imageCol = "image_name",
markSelect = "islet",
bndw = bndwSPE,
thres = thresSPE,
nCores = 1
)
The result is a simple feature collection. It contains the polygons (<GEOMETRY>
column), a structure identifier (structID
) and the image identifier (image_name
). Let’s add some patient metadata to the object.
colsKeep <- c(
"patient_stage", "tissue_slide", "tissue_region",
"patient_id", "patient_disease_duration",
"patient_age", "patient_gender", "sample_id"
)
patientData <- colData(speSel) |>
as_tibble() |>
group_by(image_name) |>
select(all_of(colsKeep)) |>
unique()
#> Adding missing grouping variables: `image_name`
allIslets <- allIslets |>
dplyr::left_join(patientData, by = "image_name")
Using standard operations on data frames we can inspect the number of structures found per patient.
allIslets |>
st_drop_geometry() |> # we are only interested in metadata
group_by(patient_id) |>
summarise(n = n()) |>
ungroup()
#> # A tibble: 12 × 2
#> patient_id n
#> <int> <int>
#> 1 6089 30
#> 2 6126 50
#> 3 6134 49
#> 4 6180 60
#> 5 6228 47
#> 6 6264 76
#> 7 6278 58
#> 8 6362 47
#> 9 6380 32
#> 10 6386 58
#> 11 6414 45
#> 12 6418 43
Now that we have islet structures for all images, we can now use the function totalShapeMetrics
to calculate a set of metrics related to the shape of the islets.
isletMetrics <- totalShapeMetrics(allIslets)
The result is a matrix. We will it to our simple feature collection.
# specify factor levels
lv <- c("Non-diabetic", "Onset", "Long-duration")
allIslets <- allIslets |>
cbind(t(isletMetrics)) |>
mutate(patient_stage = factor(patient_stage, levels = lv))
We use PCA to get an overview of the different features. Each dot represents one structure.
autoplot(
prcomp(t(isletMetrics), scale. = TRUE),
x = 1,
y = 2,
data = allIslets,
color = "patient_stage",
size = 2,
# shape = 'type',
loadings = TRUE,
loadings.colour = "steelblue3",
loadings.label = TRUE,
loadings.label.size = 3,
loadings.label.repel = TRUE,
loadings.label.colour = "black"
) +
scale_color_brewer(palette = "Set2") +
theme_bw() +
coord_fixed()
We can use boxplots to investigate differences of shape metrics between patient stages. We will subset to a few metrics that are not colinear in the PCA plot. Note that the boxplots don’t reveal patient specific effects.
allIslets |>
sf::st_drop_geometry() |>
select(patient_stage, rownames(isletMetrics)) |>
pivot_longer(-patient_stage) |>
filter(name %in% c("Area", "Compactness", "Curl")) |>
ggplot(aes(x = patient_stage, y = value, fill = patient_stage)) +
geom_boxplot() +
facet_wrap(~name, scales = "free") +
scale_fill_brewer(palette = "Set2") +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
guides(fill = "none")
As the distribution of the area seems to be skewed we use a log transformation. Let’s have a look at the log-transformed area of the islets, faceted by patient.
allIslets |>
sf::st_drop_geometry() |>
select(patient_stage, patient_id, rownames(isletMetrics)) |>
pivot_longer(-c(patient_stage, patient_id)) |>
filter(name %in% c("Area")) |>
ggplot(aes(
x = as.factor(patient_id),
y = log(value),
fill = patient_stage
)) +
geom_boxplot() +
facet_wrap(~patient_stage, scales = "free") +
scale_fill_brewer(palette = "Set2") +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
guides(fill = "none")
We can see quite some variability within the patients and the different stages. As the individual structure level metrics are not independent we have to account for dependence between measurements. This dependence can lie on the level of the patient and the slide as we have repeated measurements for each level.
To account for this, we will use mixed linear models with random effects for the patient and the individual slides (image name). We will use the lme4 package for fitting linear mixed effects models (Bates et al. 2015) and lmerTest for p-value calculation (Kuznetsova, Brockhoff, and Christensen 2017).
We can model differences between the patient stages as follows.
mod <- lmer(
log(Area) ~ patient_stage +
(1 | patient_id) + (1 | image_name),
data = allIslets
)
summary(mod)
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: log(Area) ~ patient_stage + (1 | patient_id) + (1 | image_name)
#> Data: allIslets
#>
#> REML criterion at convergence: 2345.2
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.7625 -0.4564 0.1427 0.6511 2.1559
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> image_name (Intercept) 0.093952 0.30652
#> patient_id (Intercept) 0.003495 0.05912
#> Residual 2.904282 1.70420
#> Number of obs: 595, groups: image_name, 205; patient_id, 12
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 8.41670 0.12838 6.96108 65.562 5.62e-11 ***
#> patient_stageOnset 0.01989 0.19009 8.26100 0.105 0.9192
#> patient_stageLong-duration -0.61419 0.18252 6.58001 -3.365 0.0132 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr) ptnt_O
#> ptnt_stgOns -0.675
#> ptnt_stgLn- -0.703 0.475
As we can see in the fixed effects section in summary(mod)
there is a significant difference in log-transformed islet area of long-duration patients with respect to non-diabetic patients, accounting for correlation on the patient and image level. Note that this calculation was done on a random subset of the patient slides only.
sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] imcdatasets_1.15.0 cytomapper_1.19.0
#> [3] EBImage_4.49.0 ggfortify_0.4.17
#> [5] tidyr_1.3.1 sosta_0.99.6
#> [7] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.2
#> [9] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [11] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
#> [13] IRanges_2.41.3 S4Vectors_0.45.4
#> [15] MatrixGenerics_1.19.1 matrixStats_1.5.0
#> [17] sf_1.0-20 lmerTest_3.1-3
#> [19] lme4_1.1-37 Matrix_1.7-3
#> [21] ggplot2_3.5.1 ExperimentHub_2.15.0
#> [23] AnnotationHub_3.15.0 BiocFileCache_2.15.1
#> [25] dbplyr_2.5.0 BiocGenerics_0.53.6
#> [27] generics_0.1.3 dplyr_1.1.4
#> [29] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.6.0 later_1.4.1 bitops_1.0-9
#> [4] filelock_1.0.3 tibble_3.2.1 svgPanZoom_0.3.4
#> [7] polyclip_1.10-7 lifecycle_1.0.4 Rdpack_2.6.3
#> [10] lattice_0.22-7 MASS_7.3-65 magrittr_2.0.3
#> [13] sass_0.4.9 rmarkdown_2.29 jquerylib_0.1.4
#> [16] yaml_2.3.10 httpuv_1.6.15 sp_2.2-0
#> [19] spatstat.sparse_3.1-0 DBI_1.2.3 minqa_1.2.8
#> [22] RColorBrewer_1.1-3 abind_1.4-8 purrr_1.0.4
#> [25] RCurl_1.98-1.17 rappdirs_0.3.3 GenomeInfoDbData_1.2.14
#> [28] ggrepel_0.9.6 spatstat.utils_3.1-3 terra_1.8-42
#> [31] units_0.8-7 goftest_1.2-3 spatstat.random_3.3-3
#> [34] svglite_2.1.3 codetools_0.2-20 DelayedArray_0.33.6
#> [37] tidyselect_1.2.1 raster_3.6-32 UCSC.utils_1.3.1
#> [40] farver_2.1.2 viridis_0.6.5 spatstat.explore_3.4-2
#> [43] jsonlite_2.0.0 e1071_1.7-16 systemfonts_1.2.2
#> [46] smoothr_1.0.1 tools_4.6.0 Rcpp_1.0.14
#> [49] glue_1.8.0 gridExtra_2.3 SparseArray_1.7.7
#> [52] xfun_0.52 HDF5Array_1.35.16 shinydashboard_0.7.2
#> [55] withr_3.0.2 numDeriv_2016.8-1.1 BiocManager_1.30.25
#> [58] fastmap_1.2.0 boot_1.3-31 rhdf5filters_1.19.2
#> [61] digest_0.6.37 R6_2.6.1 mime_0.13
#> [64] colorspace_2.1-1 tensor_1.5 jpeg_0.1-11
#> [67] spatstat.data_3.1-6 RSQLite_2.3.9 h5mread_0.99.4
#> [70] class_7.3-23 httr_1.4.7 htmlwidgets_1.6.4
#> [73] S4Arrays_1.7.3 pkgconfig_2.0.3 gtable_0.3.6
#> [76] blob_1.2.4 XVector_0.47.2 htmltools_0.5.8.1
#> [79] bookdown_0.42 fftwtools_0.9-11 scales_1.3.0
#> [82] png_0.1-8 spatstat.univar_3.1-2 reformulas_0.4.0
#> [85] knitr_1.50 rjson_0.2.23 nlme_3.1-168
#> [88] curl_6.2.2 nloptr_2.2.1 proxy_0.4-27
#> [91] cachem_1.1.0 rhdf5_2.51.2 stringr_1.5.1
#> [94] BiocVersion_3.21.1 KernSmooth_2.23-26 parallel_4.6.0
#> [97] vipor_0.4.7 AnnotationDbi_1.69.0 pillar_1.10.2
#> [100] grid_4.6.0 vctrs_0.6.5 promises_1.3.2
#> [103] xtable_1.8-4 beeswarm_0.4.0 evaluate_1.0.3
#> [106] tinytex_0.56 magick_2.8.6 cli_3.6.4
#> [109] locfit_1.5-9.12 compiler_4.6.0 rlang_1.1.5
#> [112] crayon_1.5.3 labeling_0.4.3 classInt_0.4-11
#> [115] ggbeeswarm_0.7.2 stringi_1.8.7 viridisLite_0.4.2
#> [118] deldir_2.0-4 BiocParallel_1.41.5 nnls_1.6
#> [121] munsell_0.5.1 Biostrings_2.75.4 tiff_0.1-12
#> [124] spatstat.geom_3.3-6 patchwork_1.3.0 bit64_4.6.0-1
#> [127] Rhdf5lib_1.29.2 KEGGREST_1.47.0 shiny_1.10.0
#> [130] rbibutils_2.3 memoise_2.0.1 bslib_0.9.0
#> [133] bit_4.6.0
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software 67 (1). https://doi.org/10.18637/jss.v067.i01.
Damond, Nicolas, Stefanie Engler, Vito R. T. Zanotelli, Denis Schapiro, Clive H. Wasserfall, Irina Kusmartseva, Harry S. Nick, et al. 2019. “A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry.” Cell Metabolism 29 (3): 755–768.e5. https://doi.org/10.1016/j.cmet.2018.11.014.
Kuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen. 2017. “lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software 82 (December): 1–26. https://doi.org/10.18637/jss.v082.i13.