RFLOMICS 0.99.7
RFLOMICS is a R package that provides a complete workflow for the analysis
of multi-omics data within complex experimental designs framework.
The package was originally designed to be used through a user interface
(see RFLOMICS
vignette), but has been developed in an object oriented
programming with a limited number of classes and methods.
This allow to run data analysis as command line in a workflow with the “|>”
operator, each step being a method of the extended
MultiAssayExperiment
or SummarizedExperiment
classes.
To ensure efficient data management the
MultiAssayExperiment
and SummarizedExperiment
classes were extended to fix an organization for the metadata slot.
The statistical design and analysis parameters of the multi-omics data integration
are stored in the metadata
slot of the RflomicsMAE
class, while the parameters
and results of the single-omics are stored in the metadata
slot of RflomicsSE
objects of the ExperimentList
.
This organization can be illustrated by presenting the workflow of the RFLOMICS package.
The workflow is structured in three main parts:
Fig 1: Organization of the metadata slot of the RflomicsMAE object
The second part allows a complete single-omics analysis for each dataset. The raw and processed omic datasets are stored in the ExperimentList as RflomicsSE. For each analysis step, settings and results are stored (in green in Fig 2)
The DataProcessing list in the metadata slot contains the parameters and results of pre-processing including filtering, normalization and transformation
The DiffExpAnal list in the metadata slot contains the parameters and results of the differential expression analysis.
The CoExpAnal list in the metadata slot stores the parameters and results of the coexpression analysis on differentially expressed entities.
DiffExpEnrichAnal and CoExpEnrichAnal lists in the metadata slot store the results and parameters of the functional enrichment analysis for the differential expression and the coexpression analysis, respectively.
Fig 2: Organization of the metadata slot of the RflomicsSE objects
Fig 3: storage of multi-omics integration settings and results in the metadata slot of the RflomicsMAE object
As you will see, this analysis workflow is implemented as methods of the two
classes (RflomicsMAE
and RflomicsSE
), providing a generic interface whatever
the type of omics data being analyzed (RNAseq, proteomics, metabolomics, etc.).
These methods return the same class as the input object
(RflomicsMAE
and RflomicsSE
) with the metadata slot enriched with parameters
and results. Getter and Setter methods have been defined to access the metadata
objects.
In this example we will instantiate a RflomicsMAE object from an existing MAE
object and a design data.frame. We will use the ecoseed dataset from the RFLOMICS
package (see data(ecoseed)
).
ecoseed
is a multi-omics dataset composed of three data matrices:
transcriptomics (raw RNAseq read count data matrix), metabolomics and proteomics
(relative abundance matrix as XIC) obtained from a study examining the effect of
seed production temperature on the germination potential of Arabidopsis thaliana.
The experimental design includes three factors:
- Repeat: batch factor: rep1, rep2, rep3
- temperature: biological factor with three conditions: Low, Medium, Elevated
- imbibition: biological factor with three conditions: DS (Dry Deed), EI (Early Imbibition), LI (Late Imbibition)
Crucial information is needed to complete the design and create the RflomicsMAE object. Experimental factors must be defined with their type (batch or biological) and a reference level. These information are used to automatically generate the statistical design.
PCA on raw data is performed during the instantiation of the RflomicsMAE object.
data(ecoseed.mae)
factorInfo <- data.frame(
"factorName" = c("Repeat", "temperature", "imbibition"),
"factorType" = c("batch", "Bio", "Bio"),
"factorRef" = c("rep1", "Low", "DS")
)
# create rflomicsMAE object with ecoseed data
MAE <- createRflomicsMAE(
projectName = "Tests",
omicsData = ecoseed.mae,
omicsTypes = c("RNAseq", "proteomics", "metabolomics"),
factorInfo = factorInfo
)
names(metadata(MAE))
#> [1] "omicList" "projectName" "design"
#> [4] "IntegrationAnalysis" "date" "sessionInfo"
#> [7] "rflomicsVersion"
head(getDesignMat(MAE))
#> Repeat groups temperature imbibition samples
#> Low_DS_1 rep1 Low_DS Low DS Low_DS_1
#> Low_DS_2 rep2 Low_DS Low DS Low_DS_2
#> Low_DS_3 rep3 Low_DS Low DS Low_DS_3
#> Medium_DS_1 rep1 Medium_DS Medium DS Medium_DS_1
#> Medium_DS_2 rep2 Medium_DS Medium DS Medium_DS_2
#> Medium_DS_3 rep3 Medium_DS Medium DS Medium_DS_3
Omics data is heterogeneous and each omics requires specific methods and
parameters to take into account of the nature of data. RFLOMICS has a set
of generic methods than can be applied to the RflomicsMAE
or RflomicsSE
objects and which adapt the methods and parameters to the type of omics data:
RNAseq, proteomics, metabolomics.
The main methods are:
Methods | Description |
---|---|
createRflomicsMAE |
instantiate a RflomicsMAE object from a MAE object and a design data.frame |
runDataProcessing |
Process data: filtering: features or samples, normalization and/or transformation, PCA |
runDiffAnalysis |
perform a Differential features analysis on omics data |
runCoExpression |
perform a CoExpression analysis from the union/intersection of differentially expressed features list. |
runAnnotationEnrichment |
perform an annotation enrichment analysis (GO, KEGG, custom) on the list of differentially or co-expressed features |
runOmicsIntegration |
perform an integration analysis (MOFA or DIABLO) |
generateReport |
generate a report from the RflomicsMAE object and optionally export the report and results |
Table 1: Main methods of the RflomicsMAE and RflomicsSE classes for running a multi-omics analysis workflow.
When applied to a RflomicsMAE
object, the name of the RflomicsSE
object must
be specified with the SE.name
parameter.
The next step is to define the model and associated contrasts by translating the
biological question into statistical terms. The generateModelFormulae
function
is used to find all possible models given the experimental design settings.
Rflomics accepts up to three biological factors and one batch effect. Only second-order interaction terms between biological factors will appear in the model formulae. Batch factors will never appear in the interaction terms. For the rest of the example, we choose the second formula. The model without the interaction term.
form <- generateModelFormulae(MAE)
form
#> $`~Repeat + temperature + imbibition + temperature:imbibition`
#> ~Repeat + temperature + imbibition + temperature:imbibition
#> <environment: 0x5a14db2f23d0>
#>
#> $`~Repeat + temperature + imbibition`
#> ~Repeat + temperature + imbibition
#> <environment: 0x5a14db2f23d0>
MAE <- setModelFormula(MAE, form[2])
getModelFormula(MAE)
#> [1] "~Repeat + temperature + imbibition"
Given the formula chosen, all the hypotheses (contrasts) can be obtained using
the generateExpressionContrast
function.
Three types of contrast are expressed: pairwise comparison, average expression
and interaction expression.
In this example, we will select the first three contrasts from the averaged contrasts.
possibleContrasts <- generateExpressionContrast(MAE)
possibleContrasts$averaged$contrastName
#> [1] "(temperatureMedium - temperatureLow) in mean"
#> [2] "(temperatureElevated - temperatureLow) in mean"
#> [3] "(temperatureElevated - temperatureMedium) in mean"
#> [4] "(imbibitionEI - imbibitionDS) in mean"
#> [5] "(imbibitionLI - imbibitionDS) in mean"
#> [6] "(imbibitionLI - imbibitionEI) in mean"
MAE <- setSelectedContrasts(MAE, contrastList = possibleContrasts$averaged[1:3])
getSelectedContrasts(MAE)[, -c(1, 3)]
#> contrastName type tag
#> <char> <char> <char>
#> 1: (temperatureMedium - temperatureLow) in mean mean H1
#> 2: (temperatureElevated - temperatureLow) in mean mean H2
#> 3: (temperatureElevated - temperatureMedium) in mean mean H3
This statistical framework will be applied on all loaded datasets.
Data processing, including normalization and transformation parameters and
results, can be carried out in a single step using the runDataProcessing
method. Or in several steps using the methods called by runDataProcessing
,
which are:filterLowAbundance
(RNAseq
), runNormalization
and runTransformData
.
In this example, the three omics data sets are processed in a single step.
MAE <- runDataProcessing(
object = MAE,
SE.name = "RNAtest",
samples = colnames(MAE[["RNAtest"]])[-1],
filterStrategy = "NbReplicates",
cpmCutoff = 1,
normMethod = "TMM"
) |>
runDataProcessing(
SE.name = "metatest", normMethod = "median",
transformMethod = "log2"
) |>
runDataProcessing(
SE.name = "protetest", normMethod = "none",
transformMethod = "log2"
) |>
runOmicsPCA(SE.name = "metatest")
# Access to the normalization settings for metabolomics data
getNormSettings(object = MAE[["metatest"]])
#> $method
#> [1] "median"
#>
#> $suppInfo
#> NULL
# Obtain the list of filtered features for the RNAseq data
getFilteredFeatures(object = MAE[["RNAtest"]])[1:10]
#> [1] "AT1G07680" "AT1G11920" "AT1G23210" "AT1G31670" "AT1G33540" "AT1G34510"
#> [7] "AT1G44050" "AT1G47600" "AT1G50060" "AT1G52050"
It is important to note that none of the data is directly transformed in the
assay, unless the user specifically requests it by setting the modifyAssay
argument to TRUE. This way, you can always change your mind and test several
normalisation or transformation processes.
You can access to processing results and plot them.
plotOmicsPCA(MAE[["metatest"]], raw = TRUE)
plotOmicsPCA(MAE[["metatest"]], raw = FALSE)
plotDataDistribution(MAE[["metatest"]], raw = TRUE)
plotDataDistribution(MAE[["metatest"]], raw = FALSE)
Differential analysis on each contrasts can be performed using the runDiffAnalysis
method. This method is specialized for each omics type and uses the limma
package
for proteomics and metabolomics data and the edgeR
package for RNAseq data.
The method returns the same object with the metadata slot enriched with the
parameters and results of the analysis.
MAE <- runDiffAnalysis(MAE,
SE.name = "RNAtest", p.adj.method = "BH",
method = "edgeRglmfit", p.adj.cutoff = 0.05, logFC.cutoff = 0
) |>
runDiffAnalysis(
SE.name = "protetest", p.adj.method = "BH",
method = "limmalmFit", p.adj.cutoff = 0.05, logFC.cutoff = 0
) |>
runDiffAnalysis(
SE.name = "metatest", p.adj.method = "BH",
method = "limmalmFit", p.adj.cutoff = 0.05, logFC.cutoff = 0
)
# access to the settings
getDiffSettings(MAE, SE.name = "RNAtest")
#> $method
#> [1] "edgeRglmfit"
#>
#> $p.adj.method
#> [1] "BH"
#>
#> $p.adj.cutoff
#> [1] 0.05
#>
#> $abs.logFC.cutoff
#> [1] 0
#>
#> $contrastCoef
#> Intercept Repeatrep2
#> (temperatureMedium - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureMedium) in mean 0 0
#> Repeatrep3 temperatureMedium
#> (temperatureMedium - temperatureLow) in mean 0 1
#> (temperatureElevated - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureMedium) in mean 0 -1
#> temperatureElevated
#> (temperatureMedium - temperatureLow) in mean 0
#> (temperatureElevated - temperatureLow) in mean 1
#> (temperatureElevated - temperatureMedium) in mean 1
#> imbibitionEI imbibitionLI
#> (temperatureMedium - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureMedium) in mean 0 0
# Summary of the differential analysis
getDiffStat(MAE[["RNAtest"]])
#> All Up Down
#> (temperatureMedium - temperatureLow) in mean 4364 2099 2265
#> (temperatureElevated - temperatureLow) in mean 9247 4637 4610
#> (temperatureElevated - temperatureMedium) in mean 3655 1887 1768
plotDiffAnalysis(MAE,
SE.name = "RNAtest",
contrastName = "(temperatureMedium - temperatureLow) in mean",
typeofplots = "volcano"
)
#> $Volcano.plot
plotBoxplotDE(MAE[["RNAtest"]],
feature = "AT4G04810"
)
Co-expression analysis on the list of differentially expressed features can be
performed using the runCoExpression
method. This method uses methods from
the coseq
package with a set of optimal parameters tailored to each omics.
In this example, we will perform a co-expression analysis on the proteomics
data.
The results can be plot using the getCoExpAnalysesSummary
method.
A co-expression profile can be plotted for a given cluster using the
plotCoExpressionProfile
function.
MAE <- runCoExpression(MAE,
SE.name = "protetest",
K = 2:5, replicates = 5,
merge = "union"
)
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
getCoExpAnalysesSummary(MAE)
plotCoExpression(MAE[["protetest"]])$ICL
plotCoExpressionProfile(MAE[["protetest"]], cluster = 2)
Annotation enrichment analysis can be performed using the runAnnotationEnrichment
method.
This method uses the clusterProfiler
package. In this example, the org.At.tair.db
database for Arabidopsis thaliana will be used to perform the GO enrichment analysis
of the DE protein list.
The results can be viewed using the getEnrichRes
method, a summary of the
number of enriched terms by the sumORA
function and plotted, by example,
using the plotEnrichComp
method.
MAE <- runAnnotationEnrichment(MAE,
SE.name = "protetest",
OrgDb = "org.At.tair.db",
keyType = "TAIR",
pvalueCutoff = 0.05,
from = "DiffExp",
database = "GO",
domain = "CC"
)
results <- getEnrichRes(MAE[["protetest"]],
from = "DiffExp", database = "GO"
)
sumORA(MAE[["protetest"]], from = "DiffExp", database = "GO")
#> Contrast
#> (temperatureMedium - temperatureLow) in mean (temperatureMedium - temperatureLow) in mean
#> (temperatureElevated - temperatureLow) in mean (temperatureElevated - temperatureLow) in mean
#> (temperatureElevated - temperatureMedium) in mean (temperatureElevated - temperatureMedium) in mean
#> CC
#> (temperatureMedium - temperatureLow) in mean 0
#> (temperatureElevated - temperatureLow) in mean 1
#> (temperatureElevated - temperatureMedium) in mean 1
plotEnrichComp(MAE[["protetest"]],
from = "DiffExp", database = "GO",
matrixType = "FC"
)
The multi-omics step in RFLOMICS takes two steps: the preparation of the object,
based on the user-selected method, and the actual run of the selected method.
This is done using prepareForIntegration
and runOmicsIntegration
.
In this example, we will use only the proteomics and metabolomics dataset
with the mixOmics package (block.plsda function).
prepareForIntegration
does not return a RflomicsMAE object, it returns the
corresponding type to be an input for either MOFA2 or mixOmics functions.
listIntegration <- prepareForIntegration(MAE,
omicsNames = c("protetest", "metatest"),
variableLists = rownames(MAE),
method = "mixOmics"
)
MAE <- runOmicsIntegration(
object = MAE,
preparedObject = listIntegration,
method = "mixOmics",
selectedResponse = "temperature",
ncomp = 3
)
Visualizing results is made through the original package functions.
mixOmics::plotIndiv(getMixOmics(MAE, response = "temperature"))
The generateReport
function is used to generate a report from a RflomicsMAE object.
The report is generated in HTML format.
The MAE object can be saved (export = TRUE), exporting the object in a .RData, the report in html format, the graphs in png format and the data in txt (tabulated) format.
It’s use is straightforward, using the generateReport(object = MAE, fileName = "ecoseed_report.html", export = FALSE)
command.
It can take a long time, depending on the amount of analyses performed and the data size.
sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] RFLOMICS_0.99.7 coseq_1.31.0
#> [3] knitr_1.50 htmltools_0.5.8.1
#> [5] ggplot2_3.5.1 dplyr_1.1.4
#> [7] shinyBS_0.61.1 MultiAssayExperiment_1.33.9
#> [9] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [11] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
#> [13] IRanges_2.41.3 S4Vectors_0.45.4
#> [15] BiocGenerics_0.53.6 generics_0.1.3
#> [17] MatrixGenerics_1.19.1 matrixStats_1.5.0
#> [19] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.6.5 enrichplot_1.27.4 httr_1.4.7
#> [4] RColorBrewer_1.1-3 doParallel_1.0.17 tools_4.6.0
#> [7] backports_1.5.0 R6_2.6.1 DT_0.33
#> [10] HDF5Array_1.35.16 lazyeval_0.2.2 uwot_0.2.3
#> [13] rhdf5filters_1.19.2 GetoptLong_1.0.5 withr_3.0.2
#> [16] gridExtra_2.3 bayesm_3.1-6 cli_3.6.4
#> [19] Cairo_1.6-2 flashClust_1.01-2 sandwich_3.1-1
#> [22] labeling_0.4.3 sass_0.4.9 mvtnorm_1.3-3
#> [25] robustbase_0.99-4-1 proxy_0.4-27 yulab.utils_0.2.0
#> [28] gson_0.1.0 DOSE_4.1.0 R.utils_2.13.0
#> [31] plotrix_3.8-4 limma_3.63.11 org.At.tair.db_3.21.0
#> [34] RSQLite_2.3.9 gridGraphics_0.5-1 shape_1.4.6.1
#> [37] vroom_1.6.5 car_3.1-3 HTSCluster_2.0.11
#> [40] GO.db_3.21.0 leaps_3.2 Matrix_1.7-3
#> [43] abind_1.4-8 R.methodsS3_1.8.2 lifecycle_1.0.4
#> [46] scatterplot3d_0.3-44 multcomp_1.4-28 yaml_2.3.10
#> [49] edgeR_4.5.10 carData_3.0-5 rhdf5_2.51.2
#> [52] qvalue_2.39.0 SparseArray_1.7.7 Rtsne_0.17
#> [55] grid_4.6.0 blob_1.2.4 promises_1.3.2
#> [58] shinydashboard_0.7.2 crayon_1.5.3 dir.expiry_1.15.0
#> [61] ggtangle_0.0.6 lattice_0.22-6 cowplot_1.1.3
#> [64] KEGGREST_1.47.0 HTSFilter_1.47.0 magick_2.8.6
#> [67] capushe_1.1.2 pillar_1.10.1 ComplexHeatmap_2.23.1
#> [70] fgsea_1.33.4 rjson_0.2.23 estimability_1.5.1
#> [73] corpcor_1.6.10 mixOmics_6.31.4 codetools_0.2-20
#> [76] fastmatch_1.1-6 glue_1.8.0 ggfun_0.1.8
#> [79] data.table_1.17.0 vctrs_0.6.5 png_0.1-8
#> [82] treeio_1.31.0 EnhancedVolcano_1.25.0 gtable_0.3.6
#> [85] cachem_1.1.0 xfun_0.51 S4Arrays_1.7.3
#> [88] mime_0.13 coda_0.19-4.1 survival_3.8-3
#> [91] pheatmap_1.0.12 iterators_1.0.14 tinytex_0.56
#> [94] statmod_1.5.0 TH.data_1.1-3 nlme_3.1-168
#> [97] ggtree_3.15.0 bit64_4.6.0-1 filelock_1.0.3
#> [100] UpSetR_1.4.0 tensorA_0.36.2.1 bslib_0.9.0
#> [103] colorspace_2.1-1 DBI_1.2.3 DESeq2_1.47.5
#> [106] tidyselect_1.2.1 emmeans_1.11.0 bit_4.6.0
#> [109] compiler_4.6.0 Rmixmod_2.1.10 compositions_2.0-8
#> [112] h5mread_0.99.4 basilisk.utils_1.19.1 DelayedArray_0.33.6
#> [115] bookdown_0.42 scales_1.3.0 DEoptimR_1.1-3-1
#> [118] multcompView_0.1-10 stringr_1.5.1 digest_0.6.37
#> [121] MOFA2_1.17.0 rmarkdown_2.29 basilisk_1.19.3
#> [124] XVector_0.47.2 pkgconfig_2.0.3 FactoMineR_2.11
#> [127] fastmap_1.2.0 rlang_1.1.5 GlobalOptions_0.1.2
#> [130] htmlwidgets_1.6.4 UCSC.utils_1.3.1 shiny_1.10.0
#> [133] farver_2.1.2 jquerylib_0.1.4 zoo_1.8-13
#> [136] jsonlite_2.0.0 BiocParallel_1.41.5 GOSemSim_2.33.0
#> [139] R.oo_1.27.0 magrittr_2.0.3 Formula_1.2-5
#> [142] GenomeInfoDbData_1.2.14 ggplotify_0.1.2 patchwork_1.3.0
#> [145] Rhdf5lib_1.29.2 munsell_0.5.1 Rcpp_1.0.14
#> [148] ape_5.8-1 reticulate_1.42.0 stringi_1.8.7
#> [151] MASS_7.3-65 plyr_1.8.9 parallel_4.6.0
#> [154] ggrepel_0.9.6 forcats_1.0.0 Biostrings_2.75.4
#> [157] splines_4.6.0 circlize_0.4.16 locfit_1.5-9.12
#> [160] igraph_2.1.4 ggpubr_0.6.0 ggsignif_0.6.4
#> [163] reshape2_1.4.4 evaluate_1.0.3 BiocManager_1.30.25
#> [166] tzdb_0.5.0 foreach_1.5.2 httpuv_1.6.15
#> [169] tidyr_1.3.1 purrr_1.0.4 clue_0.3-66
#> [172] BiocBaseUtils_1.9.0 broom_1.0.8 xtable_1.8-4
#> [175] e1071_1.7-16 RSpectra_0.16-2 tidytree_0.4.6
#> [178] rstatix_0.7.2 later_1.4.1 class_7.3-23
#> [181] rARPACK_0.11-0 tibble_3.2.1 clusterProfiler_4.15.1
#> [184] aplot_0.2.5 memoise_2.0.1 AnnotationDbi_1.69.0
#> [187] ellipse_0.5.0 cluster_2.1.8.1 corrplot_0.95
#> [190] shinyWidgets_0.9.0