TileDBArray 1.17.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.49522863 -1.92972564 -1.01255162 . -0.02680271 0.87804803
## [2,] 1.64315423 -0.33276852 0.80125870 . 2.10078924 -1.62010330
## [3,] 0.03280767 0.84771355 0.16792539 . 1.57256096 -0.56428351
## [4,] -0.08093197 -2.41061120 0.01907238 . 1.34835488 1.76170650
## [5,] -0.71764457 -1.33009187 0.72063298 . 0.35104613 0.10222736
## ... . . . . . .
## [96,] -1.078065373 0.745097852 -0.007153808 . 1.17293567 0.84268141
## [97,] -0.019988814 0.499899076 1.561894339 . 0.56777599 -0.16209346
## [98,] -0.601884316 -1.281553112 -2.034203119 . -0.22069551 0.23064454
## [99,] -2.332355226 0.133955075 0.775844807 . 0.89013038 -0.06182221
## [100,] 0.256823916 0.052258591 -0.115419923 . -0.28693341 0.66102959
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.49522863 -1.92972564 -1.01255162 . -0.02680271 0.87804803
## [2,] 1.64315423 -0.33276852 0.80125870 . 2.10078924 -1.62010330
## [3,] 0.03280767 0.84771355 0.16792539 . 1.57256096 -0.56428351
## [4,] -0.08093197 -2.41061120 0.01907238 . 1.34835488 1.76170650
## [5,] -0.71764457 -1.33009187 0.72063298 . 0.35104613 0.10222736
## ... . . . . . .
## [96,] -1.078065373 0.745097852 -0.007153808 . 1.17293567 0.84268141
## [97,] -0.019988814 0.499899076 1.561894339 . 0.56777599 -0.16209346
## [98,] -0.601884316 -1.281553112 -2.034203119 . -0.22069551 0.23064454
## [99,] -2.332355226 0.133955075 0.775844807 . 0.89013038 -0.06182221
## [100,] 0.256823916 0.052258591 -0.115419923 . -0.28693341 0.66102959
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.00 0.00
## [2,] 0 0 0 . 0.00 0.00
## [3,] 0 0 0 . 0.13 0.00
## [4,] 0 0 0 . 0.00 0.00
## [5,] 0 0 0 . 0.00 0.00
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
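As a quick check that sparsity survives the round trip, the sketch below writes a sparse matrix and then queries the result. It assumes that is_sparse() (available once DelayedArray is attached) and the block-processed sum() behave for a TileDBMatrix as they do for other DelayedArray objects; the variable name sp is purely illustrative.

```r
library(TileDBArray)

Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sp <- writeTileDBArray(Y)

# Ask whether the array is treated as sparse by the DelayedArray machinery.
is_sparse(sp)

# A block-processed summary should agree with the in-memory sparse matrix.
all.equal(sum(sp), sum(Y))
```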
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . TRUE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
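The example above shows a logical matrix; a minimal sketch for the integer case follows. The matrices Z and Z > 10L are made up for illustration, and type() is the DelayedArray accessor for the element type. Treat this as a sanity check, not a statement about how the values are stored on disk.

```r
library(TileDBArray)

# A small integer matrix (illustrative data, not from the text above).
Z <- matrix(1:20, nrow=4)
int_arr <- writeTileDBArray(Z)

# Report the element type of the TileDB-backed array.
type(int_arr)

# The same check for a dense logical matrix.
lgl_arr <- writeTileDBArray(Z > 10L)
type(lgl_arr)
```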
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.49522863 -1.92972564 -1.01255162 . -0.02680271 0.87804803
## GENE_2 1.64315423 -0.33276852 0.80125870 . 2.10078924 -1.62010330
## GENE_3 0.03280767 0.84771355 0.16792539 . 1.57256096 -0.56428351
## GENE_4 -0.08093197 -2.41061120 0.01907238 . 1.34835488 1.76170650
## GENE_5 -0.71764457 -1.33009187 0.72063298 . 0.35104613 0.10222736
## ... . . . . . .
## GENE_96 -1.078065373 0.745097852 -0.007153808 . 1.17293567 0.84268141
## GENE_97 -0.019988814 0.499899076 1.561894339 . 0.56777599 -0.16209346
## GENE_98 -0.601884316 -1.281553112 -2.034203119 . -0.22069551 0.23064454
## GENE_99 -2.332355226 0.133955075 0.775844807 . 0.89013038 -0.06182221
## GENE_100 0.256823916 0.052258591 -0.115419923 . -0.28693341 0.66102959
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.49522863 1.64315423 0.03280767 -0.08093197 -0.71764457 -0.37595237
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.49522863 -1.92972564 -1.01255162 -0.97850309 -1.03723647
## GENE_2 1.64315423 -0.33276852 0.80125870 0.92418158 -0.09196941
## GENE_3 0.03280767 0.84771355 0.16792539 -0.22424218 1.35002031
## GENE_4 -0.08093197 -2.41061120 0.01907238 2.17484563 -0.23067129
## GENE_5 -0.71764457 -1.33009187 0.72063298 0.05066610 -0.01141183
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.99045726 -3.85945128 -2.02510325 . -0.05360542 1.75609605
## GENE_2 3.28630847 -0.66553703 1.60251740 . 4.20157847 -3.24020659
## GENE_3 0.06561534 1.69542711 0.33585077 . 3.14512191 -1.12856701
## GENE_4 -0.16186395 -4.82122241 0.03814476 . 2.69670976 3.52341300
## GENE_5 -1.43528914 -2.66018373 1.44126597 . 0.70209226 0.20445472
## ... . . . . . .
## GENE_96 -2.15613075 1.49019570 -0.01430762 . 2.3458713 1.6853628
## GENE_97 -0.03997763 0.99979815 3.12378868 . 1.1355520 -0.3241869
## GENE_98 -1.20376863 -2.56310622 -4.06840624 . -0.4413910 0.4612891
## GENE_99 -4.66471045 0.26791015 1.55168961 . 1.7802608 -0.1236444
## GENE_100 0.51364783 0.10451718 -0.23083985 . -0.5738668 1.3220592
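To make the delayed behaviour concrete, the sketch below applies an operation, realizes it in memory with as.matrix(), and then confirms that the TileDB-backed array still holds the original values. The names doubled and realized are illustrative, and the example assumes a fresh session in which no global TileDB path has been set.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Arithmetic only records the operation; the backend is not modified.
doubled <- out * 2
class(doubled)

# as.matrix() realizes the delayed operation as an ordinary in-memory matrix.
realized <- as.matrix(doubled)
all.equal(realized, X * 2)

# The original TileDB-backed array is unchanged.
all.equal(as.matrix(out), X)
```

For arrays too large to hold in memory, realizing the whole result with as.matrix() is not advisable; block-processed functions such as colSums() are the safer route.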
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -11.514950 -17.006339 -2.767107 -2.710456 -20.369979 4.945180 -12.692401
## SAMP_8 SAMP_9 SAMP_10
## -10.457666 1.569292 15.438382
out %*% runif(ncol(out))
## [,1]
## GENE_1 -1.53175166
## GENE_2 0.60078922
## GENE_3 0.66652384
## GENE_4 0.08680119
## GENE_5 0.81197635
## GENE_6 -0.37790261
## GENE_7 0.62644117
## GENE_8 1.49560672
## GENE_9 -1.66954304
## GENE_10 1.16939034
## GENE_11 0.48794408
## GENE_12 0.51118583
## GENE_13 1.77842803
## GENE_14 -1.17693076
## GENE_15 2.39228076
## GENE_16 -2.08322236
## GENE_17 0.95797356
## GENE_18 -1.51034081
## GENE_19 0.06539065
## GENE_20 0.46136117
## GENE_21 -0.92496372
## GENE_22 1.08694782
## GENE_23 -1.61223891
## GENE_24 -2.43150666
## GENE_25 -0.53194172
## GENE_26 -0.52224811
## GENE_27 1.21348609
## GENE_28 -2.12962835
## GENE_29 -0.48637711
## GENE_30 -3.65535499
## GENE_31 -2.53235563
## GENE_32 0.48401672
## GENE_33 -2.14489459
## GENE_34 -0.11854387
## GENE_35 1.39480776
## GENE_36 -1.03019165
## GENE_37 3.09419518
## GENE_38 -3.10263384
## GENE_39 -0.95270231
## GENE_40 -0.50732378
## GENE_41 -0.17088223
## GENE_42 0.12071105
## GENE_43 2.34022223
## GENE_44 1.06442172
## GENE_45 2.33211496
## GENE_46 -1.81189370
## GENE_47 -3.90899271
## GENE_48 1.41019778
## GENE_49 -1.02833710
## GENE_50 -0.28586915
## GENE_51 -0.04356341
## GENE_52 2.02819840
## GENE_53 -0.51859507
## GENE_54 -1.08165530
## GENE_55 0.45317164
## GENE_56 0.96598304
## GENE_57 -2.02898125
## GENE_58 -0.60001883
## GENE_59 -1.00756268
## GENE_60 -1.31339060
## GENE_61 1.48610465
## GENE_62 -2.39684225
## GENE_63 -0.58177503
## GENE_64 -1.51817208
## GENE_65 2.18729417
## GENE_66 -0.65530307
## GENE_67 0.77501280
## GENE_68 -1.47348445
## GENE_69 -0.15279716
## GENE_70 1.64026543
## GENE_71 -5.63813991
## GENE_72 -2.55494196
## GENE_73 0.12344331
## GENE_74 0.31355867
## GENE_75 1.04865940
## GENE_76 1.01743318
## GENE_77 -0.56503652
## GENE_78 0.27927815
## GENE_79 -0.32347585
## GENE_80 1.70822963
## GENE_81 1.89636200
## GENE_82 -1.55550134
## GENE_83 -1.27431055
## GENE_84 1.32399805
## GENE_85 -0.79038424
## GENE_86 1.98282343
## GENE_87 1.03819681
## GENE_88 1.21687702
## GENE_89 -0.49782044
## GENE_90 -0.54654576
## GENE_91 -0.80545681
## GENE_92 1.07134323
## GENE_93 -0.04304612
## GENE_94 -1.15445371
## GENE_95 -2.78863175
## GENE_96 2.52036489
## GENE_97 2.21691142
## GENE_98 -2.08646057
## GENE_99 -2.46358332
## GENE_100 -1.04319274
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below lets us control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.4659428 -0.7204590 -0.1532504 . 0.13406201 0.83150496
## [2,] -0.1861930 -1.7174010 0.1741625 . 1.13190808 1.16028402
## [3,] 1.2417720 0.4683539 -0.3362638 . 0.98172925 2.06908589
## [4,] -0.1846336 0.8535515 1.2452118 . -0.54835052 -0.09715099
## [5,] -1.0860467 -0.8984136 0.3871414 . 0.57637441 0.61378740
## ... . . . . . .
## [96,] -0.62373254 -0.20509240 -0.65769230 . 0.62204341 1.59703081
## [97,] 0.58047036 -1.47624684 -0.03268674 . 0.14305366 1.98072098
## [98,] -0.09050980 1.36453083 -0.76331046 . -0.01182675 -1.05744634
## [99,] -0.94125948 0.05371799 0.57758934 . 0.53329270 -0.76569984
## [100,] 1.35219524 1.41507564 1.18177904 . -0.56399485 -0.74465225
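One reason to choose the path explicitly is so that the array can be reopened later. The sketch below assumes the TileDBArray() constructor accepts a path together with an attr argument naming the attribute to read; the variable names written and reopened are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the on-disk array from its path, selecting the attribute by name.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)

# The reopened array should hold the same values as the original matrix.
all.equal(as.matrix(reopened), X)
```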
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.4659428 -0.7204590 -0.1532504 . 0.13406201 0.83150496
## [2,] -0.1861930 -1.7174010 0.1741625 . 1.13190808 1.16028402
## [3,] 1.2417720 0.4683539 -0.3362638 . 0.98172925 2.06908589
## [4,] -0.1846336 0.8535515 1.2452118 . -0.54835052 -0.09715099
## [5,] -1.0860467 -0.8984136 0.3871414 . 0.57637441 0.61378740
## ... . . . . . .
## [96,] -0.62373254 -0.20509240 -0.65769230 . 0.62204341 1.59703081
## [97,] 0.58047036 -1.47624684 -0.03268674 . 0.14305366 1.98072098
## [98,] -0.09050980 1.36453083 -0.76331046 . -0.01182675 -1.05744634
## [99,] -0.94125948 0.05371799 0.57758934 . 0.53329270 -0.76569984
## [100,] 1.35219524 1.41507564 1.18177904 . -0.56399485 -0.74465225
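Because the global path persists across calls, it is usually worth unsetting it once the coercion is done, so that a later coercion does not target the same location. The sketch below assumes that calling setTileDBPath() with NULL clears the global value and that getTileDBPath() reports the value currently in effect; both are assumptions about the getter/setter pair and are not stated in the text above.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Direct the next coercion to a known location.
path2 <- tempfile()
setTileDBPath(path2)
getTileDBPath()
first <- as(X, "TileDBArray")

# Assumption: passing NULL unsets the global, restoring the default behaviour.
setTileDBPath(NULL)

# This coercion should now be written somewhere other than path2.
second <- as(X, "TileDBArray")
dim(second)
```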
sessionInfo()
## R Under development (unstable) (2025-03-01 r87860 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
## LAPACK version 3.12.0
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.20 TileDBArray_1.17.0 DelayedArray_0.33.6
## [4] SparseArray_1.7.7 S4Arrays_1.7.3 IRanges_2.41.3
## [7] abind_1.4-8 S4Vectors_0.45.4 MatrixGenerics_1.19.1
## [10] matrixStats_1.5.0 BiocGenerics_0.53.6 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_1.9.1 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.47.2 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.42 bslib_0.9.0
## [19] rlang_1.1.5 cachem_1.1.0 xfun_0.51
## [22] sass_0.4.9 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.11 zoo_1.8-13 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1