---
title: "XeniumIO: Import 10X Genomics Xenium Analyzer Data"
author: "Marcel Ramos"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{VisiumIO Quick Start Guide}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    number_sections: no
    toc: yes
    toc_depth: 4
package: XeniumIO
---

```{r,include=FALSE}
knitr::opts_chunk$set(cache = TRUE)
```

# Introduction

The `XeniumIO` package provides functions to import 10X Genomics Xenium Analyzer
data into R. The package is designed to work with the output of the Xenium
Analyzer, which is a software tool that processes Visium spatial gene expression
data. The package provides functions to import the output of the Xenium Analyzer
into R, and to create a `TENxXenium` object that can be used with other
Bioconductor packages.

# Supported Formats

## TENxIO

The 10X suite of packages support multiple file formats. The following table
lists the supported file formats and the corresponding classes that are imported
into R.

| **Extension**       | **Class**     | **Imported as**      |
|---------------------|---------------|----------------------|
| .h5                 | TENxH5        | SingleCellExperiment w/ TENxMatrix |
| .mtx / .mtx.gz      | TENxMTX       | SummarizedExperiment w/ dgCMatrix |
| .tar.gz             | TENxFileList  | SingleCellExperiment w/ dgCMatrix |
| peak_annotation.tsv | TENxPeaks     | GRanges              |
| fragments.tsv.gz    | TENxFragments | RaggedExperiment     |
| .tsv / .tsv.gz      | TENxTSV       | tibble               |

## VisiumIO

| **Extension**       | **Class**     | **Imported as**      |
|---------------------|---------------|----------------------|
| spatial.tar.gz      | TENxSpatialList | DataFrame list *   |
| .parquet            | TENxSpatialParquet | tibble *        |

## XeniumIO

| **Extension**       | **Class**     | **Imported as**      |
|---------------------|---------------|----------------------|
| .zarr.zip           | TENxZarr      | (TBD)                |

# GitHub Installation

```{r,eval=FALSE}
BiocManager::install("Bioconductor/XeniumIO")
```

# Loading package

```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(XeniumIO)
```

# XeniumIO

The `TENxXenium` class has a `metadata` slot for the `experiment.xenium` file.
The `resources` slot is a `TENxFileList` or `TENxH5` object containing the cell
feature matrix. The `coordNames` slot is a vector specifying the names of the
columns in the spatial data containing the spatial coordinates. The `sampleId`
slot is a scalar specifying the sample identifier.

```{r, eval=FALSE}
TENxXenium(
    resources = "path/to/matrix/folder/or/file",
    xeniumOut = "path/to/xeniumOut/folder",
    sample_id = "sample01",
    format = c("mtx", "h5"),
    boundaries_format = c("parquet", "csv.gz"),
    spatialCoordsNames = c("x_centroid", "y_centroid"),
    ...
)
```

The `format` argument specifies the format of the `resources` object, either
"mtx" or "h5". The `boundaries_format` allows the user to choose whether to
read in the data using the `parquet` or `csv.gz` format.

# Example Folder Structure

Note that the `xeniumOut` unzipped folder must contain the following files:

```
    *outs
    ├── cell_feature_matrix.h5
    ├── cell_feature_matrix.tar.gz
    |   ├── barcodes.tsv*
    |   ├── features.tsv*
    |   └── matrix.mtx*
    ├── cell_feature_matrix.zarr.zip
    ├── experiment.xenium
    ├── cells.csv.gz
    ├── cells.parquet
    ├── cells.zarr.zip
    [...]
```

Note that currently the `zarr` format is not supported as the infrastructure is
currently under development.

## Xenium class

The `resources` slot should either be the `TENxFileList` from the `mtx` format or
a `TENxH5` instance from an `h5` file. The boundaries can either be a
`TENxSpatialParquet` instance or a `TENxSpatialCSV`. These classes are
automatically instantiated by the constructor function.

```{r}
showClass("TENxXenium")
```

## `import` method

The `import` method for a `TENxXenium` instance returns a `SpatialExperiment`
class object. Dispatch is only done on the `con` argument. See `?BiocIO::import`
for details on the generic. The `import` function call is meant to be a simple
call without much input. For more details in the package, see `?TENxXenium`.

```{r}
getMethod("import", c(con = "TENxXenium"))
```


# Importing an Example Xenium Dataset

The following code snippet demonstrates how to import a Xenium Analyzer output
into R. The `TENxXenium` object is created by specifying the path to the
`xeniumOut` folder. The `TENxXenium` object is then imported into R using the
`import` method for the `TENxXenium` class.

First, we cache the ~12 MB file to avoid downloading it multiple times (via 
`r Biocpkg("BiocFileCache")`).

```{r}
zipfile <- paste0(
    "https://mghp.osn.xsede.org/bir190004-bucket01/BiocXenDemo/",
    "Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny_outs.zip"
)
destfile <- XeniumIO:::.cache_url_file(zipfile)
```

We then create an output folder for the contents of the zipped file. We use the
same name as the zip file but without the extension (with
`tools::file_path_sans_ext`).

```{r}
outfold <- file.path(
    tempdir(), tools::file_path_sans_ext(basename(zipfile))
)
if (!dir.exists(outfold))
    dir.create(outfold, recursive = TRUE)
```

We now unzip the file and use the `outfold` as the `exdir` argument to `unzip`.
The `outfold` variable and folder will be used as the `xeniumOut` argument in
the `TENxXenium` constructor. Note that we use the `ref = "Gene Expression"`
argument in the `import` method to pass down to the internal `splitAltExps`
function call. This will set the `mainExpName` in the `SpatialExperiment`
object.

```{r}
unzip(
    zipfile = destfile, exdir = outfold, overwrite = FALSE
)
TENxXenium(xeniumOut = outfold) |>
    import(ref = "Gene Expression")
```

Note that you may also use the `swapAltExp` function to set a `mainExpName` in
the `SpatialExperiment` but this is not recommended. The operation returns a
`SingleCellExperiment` which has to be coerced back into a `SpatialExperiment`.
The coercion also loses some metadata information particularly the
`spatialCoords`.

```{r}
TENxXenium(xeniumOut = outfold) |>
    import() |>
    swapAltExp(name = "Gene Expression") |>
    as("SpatialExperiment")
```


The dataset was obtained from the 10X Genomics website under the
[X0A v3.0 section](https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/resources/xenium-example-data#test-data-v3-0)
and is a subset of the Xenium Prime 5K Mouse Pan Tissue & Pathways Panel.
The link to the data can be seen as the `url` input above and shown below for
completeness.

<https://cf.10xgenomics.com/samples/xenium/3.0.0/Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny/Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny_outs.zip>

# Session Info

```{r}
sessionInfo()
```