Contents

1 Introduction

RFLOMICS is a R package that provides a complete workflow for the analysis of multi-omics data within complex experimental designs framework. The package was originally designed to be used through a user interface (see RFLOMICS vignette), but has been developed in an object oriented programming with a limited number of classes and methods. This allow to run data analysis as command line in a workflow with the “|>” operator, each step being a method of the extended MultiAssayExperiment or SummarizedExperiment classes.

2 The RflomicsMAE and RflomicsSE classes

To ensure efficient data management the MultiAssayExperiment and SummarizedExperiment classes were extended to fix an organization for the metadata slot. The statistical design and analysis parameters of the multi-omics data integration are stored in the metadata slot of the RflomicsMAE class, while the parameters and results of the single-omics are stored in the metadata slot of RflomicsSE objects of the ExperimentList.

This organization can be illustrated by presenting the workflow of the RFLOMICS package.

The workflow is structured in three main parts:

Fig 1: Organization of the metadata slot of the RflomicsMAE object

Fig 2: Organization of the metadata slot of the RflomicsSE objects

Fig 3: storage of multi-omics integration settings and results in the metadata slot of the RflomicsMAE object

As you will see, this analysis workflow is implemented as methods of the two classes (RflomicsMAE and RflomicsSE), providing a generic interface whatever the type of omics data being analyzed (RNAseq, proteomics, metabolomics, etc.). These methods return the same class as the input object (RflomicsMAE and RflomicsSE) with the metadata slot enriched with parameters and results. Getter and Setter methods have been defined to access the metadata objects.

3 Methods for RflomicsMAE and RflomicsSE classes

3.1 Instantiating a RflomicsMAE object from a MAE object

In this example we will instantiate a RflomicsMAE object from an existing MAE object and a design data.frame. We will use the ecoseed dataset from the RFLOMICS package (see data(ecoseed)).

ecoseed is a multi-omics dataset composed of three data matrices: transcriptomics (raw RNAseq read count data matrix), metabolomics and proteomics (relative abundance matrix as XIC) obtained from a study examining the effect of seed production temperature on the germination potential of Arabidopsis thaliana.

The experimental design includes three factors:

- Repeat: batch factor: rep1, rep2, rep3

- temperature: biological factor with three conditions: Low, Medium, Elevated

- imbibition: biological factor with three conditions: DS (Dry Deed), EI (Early Imbibition), LI (Late Imbibition)

Crucial information is needed to complete the design and create the RflomicsMAE object. Experimental factors must be defined with their type (batch or biological) and a reference level. These information are used to automatically generate the statistical design.

PCA on raw data is performed during the instantiation of the RflomicsMAE object.

data(ecoseed.mae)

factorInfo <- data.frame(
  "factorName"   = c("Repeat", "temperature", "imbibition"),
  "factorType"   = c("batch", "Bio", "Bio"),
  "factorRef"    = c("rep1", "Low", "DS")
)

# create rflomicsMAE object with ecoseed data
MAE <- createRflomicsMAE(
  projectName = "Tests",
  omicsData   =  ecoseed.mae,
  omicsTypes  = c("RNAseq", "proteomics", "metabolomics"),
  factorInfo  = factorInfo
)

names(metadata(MAE))
#> [1] "omicList"            "projectName"         "design"             
#> [4] "IntegrationAnalysis" "date"                "sessionInfo"        
#> [7] "rflomicsVersion"

head(getDesignMat(MAE))
#>             Repeat    groups temperature imbibition     samples
#> Low_DS_1      rep1    Low_DS         Low         DS    Low_DS_1
#> Low_DS_2      rep2    Low_DS         Low         DS    Low_DS_2
#> Low_DS_3      rep3    Low_DS         Low         DS    Low_DS_3
#> Medium_DS_1   rep1 Medium_DS      Medium         DS Medium_DS_1
#> Medium_DS_2   rep2 Medium_DS      Medium         DS Medium_DS_2
#> Medium_DS_3   rep3 Medium_DS      Medium         DS Medium_DS_3

3.2 Overview of the main methods

Omics data is heterogeneous and each omics requires specific methods and parameters to take into account of the nature of data. RFLOMICS has a set of generic methods than can be applied to the RflomicsMAE or RflomicsSE objects and which adapt the methods and parameters to the type of omics data: RNAseq, proteomics, metabolomics.

The main methods are:

Methods Description
createRflomicsMAE instantiate a RflomicsMAE object from a MAE object and a design data.frame
runDataProcessing Process data: filtering: features or samples, normalization and/or transformation, PCA
runDiffAnalysis perform a Differential features analysis on omics data
runCoExpression perform a CoExpression analysis from the union/intersection of differentially expressed features list.
runAnnotationEnrichment perform an annotation enrichment analysis (GO, KEGG, custom) on the list of differentially or co-expressed features
runOmicsIntegration perform an integration analysis (MOFA or DIABLO)
generateReport generate a report from the RflomicsMAE object and optionally export the report and results

Table 1: Main methods of the RflomicsMAE and RflomicsSE classes for running a multi-omics analysis workflow.

When applied to a RflomicsMAE object, the name of the RflomicsSE object must be specified with the SE.name parameter.

3.3 Executing the workflow :

3.3.1 Define the statistical settings: model and contrasts

The next step is to define the model and associated contrasts by translating the biological question into statistical terms. The generateModelFormulae function is used to find all possible models given the experimental design settings.

Rflomics accepts up to three biological factors and one batch effect. Only second-order interaction terms between biological factors will appear in the model formulae. Batch factors will never appear in the interaction terms. For the rest of the example, we choose the second formula. The model without the interaction term.

  • set the model formulae
form <- generateModelFormulae(MAE)
form
#> $`~Repeat + temperature + imbibition + temperature:imbibition`
#> ~Repeat + temperature + imbibition + temperature:imbibition
#> <environment: 0x5a14db2f23d0>
#> 
#> $`~Repeat + temperature + imbibition`
#> ~Repeat + temperature + imbibition
#> <environment: 0x5a14db2f23d0>


MAE <- setModelFormula(MAE, form[2])


getModelFormula(MAE)
#> [1] "~Repeat + temperature + imbibition"

Given the formula chosen, all the hypotheses (contrasts) can be obtained using the generateExpressionContrast function. Three types of contrast are expressed: pairwise comparison, average expression and interaction expression.

In this example, we will select the first three contrasts from the averaged contrasts.

  • set contrasts:
possibleContrasts <- generateExpressionContrast(MAE)
possibleContrasts$averaged$contrastName
#> [1] "(temperatureMedium - temperatureLow) in mean"     
#> [2] "(temperatureElevated - temperatureLow) in mean"   
#> [3] "(temperatureElevated - temperatureMedium) in mean"
#> [4] "(imbibitionEI - imbibitionDS) in mean"            
#> [5] "(imbibitionLI - imbibitionDS) in mean"            
#> [6] "(imbibitionLI - imbibitionEI) in mean"

MAE <- setSelectedContrasts(MAE, contrastList = possibleContrasts$averaged[1:3])

getSelectedContrasts(MAE)[, -c(1, 3)]
#>                                         contrastName   type    tag
#>                                               <char> <char> <char>
#> 1:      (temperatureMedium - temperatureLow) in mean   mean     H1
#> 2:    (temperatureElevated - temperatureLow) in mean   mean     H2
#> 3: (temperatureElevated - temperatureMedium) in mean   mean     H3

This statistical framework will be applied on all loaded datasets.

3.3.2 Single-omics analysis

3.3.2.1 Data processing and quality control

Data processing, including normalization and transformation parameters and results, can be carried out in a single step using the runDataProcessing method. Or in several steps using the methods called by runDataProcessing, which are:filterLowAbundance (RNAseq), runNormalization and runTransformData.

In this example, the three omics data sets are processed in a single step.

MAE <- runDataProcessing(
  object = MAE,
  SE.name = "RNAtest",
  samples = colnames(MAE[["RNAtest"]])[-1],
  filterStrategy = "NbReplicates",
  cpmCutoff = 1,
  normMethod = "TMM"
) |>
  runDataProcessing(
    SE.name = "metatest", normMethod = "median",
    transformMethod = "log2"
  ) |>
  runDataProcessing(
    SE.name = "protetest", normMethod = "none",
    transformMethod = "log2"
  ) |>
  runOmicsPCA(SE.name = "metatest")

# Access to the normalization settings for metabolomics data

getNormSettings(object = MAE[["metatest"]])
#> $method
#> [1] "median"
#> 
#> $suppInfo
#> NULL

# Obtain the list of filtered features for the RNAseq data

getFilteredFeatures(object = MAE[["RNAtest"]])[1:10]
#>  [1] "AT1G07680" "AT1G11920" "AT1G23210" "AT1G31670" "AT1G33540" "AT1G34510"
#>  [7] "AT1G44050" "AT1G47600" "AT1G50060" "AT1G52050"

It is important to note that none of the data is directly transformed in the assay, unless the user specifically requests it by setting the modifyAssay argument to TRUE. This way, you can always change your mind and test several normalisation or transformation processes.

You can access to processing results and plot them.

plotOmicsPCA(MAE[["metatest"]], raw = TRUE)
plotOmicsPCA(MAE[["metatest"]], raw = FALSE)

plotDataDistribution(MAE[["metatest"]], raw = TRUE)
plotDataDistribution(MAE[["metatest"]], raw = FALSE)

3.3.2.2 Differential analysis: settings and results

Differential analysis on each contrasts can be performed using the runDiffAnalysis method. This method is specialized for each omics type and uses the limma package for proteomics and metabolomics data and the edgeR package for RNAseq data. The method returns the same object with the metadata slot enriched with the parameters and results of the analysis.

MAE <- runDiffAnalysis(MAE,
  SE.name = "RNAtest", p.adj.method = "BH",
  method = "edgeRglmfit", p.adj.cutoff = 0.05, logFC.cutoff = 0
) |>
  runDiffAnalysis(
    SE.name = "protetest", p.adj.method = "BH",
    method = "limmalmFit", p.adj.cutoff = 0.05, logFC.cutoff = 0
  ) |>
  runDiffAnalysis(
    SE.name = "metatest", p.adj.method = "BH",
    method = "limmalmFit", p.adj.cutoff = 0.05, logFC.cutoff = 0
  )

# access to the settings

getDiffSettings(MAE, SE.name = "RNAtest")
#> $method
#> [1] "edgeRglmfit"
#> 
#> $p.adj.method
#> [1] "BH"
#> 
#> $p.adj.cutoff
#> [1] 0.05
#> 
#> $abs.logFC.cutoff
#> [1] 0
#> 
#> $contrastCoef
#>                                                   Intercept Repeatrep2
#> (temperatureMedium - temperatureLow) in mean              0          0
#> (temperatureElevated - temperatureLow) in mean            0          0
#> (temperatureElevated - temperatureMedium) in mean         0          0
#>                                                   Repeatrep3 temperatureMedium
#> (temperatureMedium - temperatureLow) in mean               0                 1
#> (temperatureElevated - temperatureLow) in mean             0                 0
#> (temperatureElevated - temperatureMedium) in mean          0                -1
#>                                                   temperatureElevated
#> (temperatureMedium - temperatureLow) in mean                        0
#> (temperatureElevated - temperatureLow) in mean                      1
#> (temperatureElevated - temperatureMedium) in mean                   1
#>                                                   imbibitionEI imbibitionLI
#> (temperatureMedium - temperatureLow) in mean                 0            0
#> (temperatureElevated - temperatureLow) in mean               0            0
#> (temperatureElevated - temperatureMedium) in mean            0            0

# Summary of the differential analysis

getDiffStat(MAE[["RNAtest"]])
#>                                                    All   Up Down
#> (temperatureMedium - temperatureLow) in mean      4364 2099 2265
#> (temperatureElevated - temperatureLow) in mean    9247 4637 4610
#> (temperatureElevated - temperatureMedium) in mean 3655 1887 1768
plotDiffAnalysis(MAE,
  SE.name = "RNAtest",
  contrastName = "(temperatureMedium - temperatureLow) in mean",
  typeofplots = "volcano"
)
#> $Volcano.plot

plotBoxplotDE(MAE[["RNAtest"]],
  feature = "AT4G04810"
)

3.3.2.3 Co-expression analysis: settings and results

Co-expression analysis on the list of differentially expressed features can be performed using the runCoExpression method. This method uses methods from the coseq package with a set of optimal parameters tailored to each omics. In this example, we will perform a co-expression analysis on the proteomics data.

The results can be plot using the getCoExpAnalysesSummary method.

A co-expression profile can be plotted for a given cluster using the plotCoExpressionProfile function.

MAE <- runCoExpression(MAE,
  SE.name = "protetest",
  K = 2:5, replicates = 5,
  merge = "union"
)
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5 
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5 
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5 
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5 
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5 
#> Use seed argument in coseq for reproducible results.
#> ****************************************

getCoExpAnalysesSummary(MAE)


plotCoExpression(MAE[["protetest"]])$ICL


plotCoExpressionProfile(MAE[["protetest"]], cluster = 2)

3.3.2.4 Annotation enrichment analysis: settings and results

Annotation enrichment analysis can be performed using the runAnnotationEnrichment method. This method uses the clusterProfiler package. In this example, the org.At.tair.db database for Arabidopsis thaliana will be used to perform the GO enrichment analysis of the DE protein list.

The results can be viewed using the getEnrichRes method, a summary of the number of enriched terms by the sumORA function and plotted, by example, using the plotEnrichComp method.

MAE <- runAnnotationEnrichment(MAE,
  SE.name = "protetest",
  OrgDb = "org.At.tair.db",
  keyType = "TAIR",
  pvalueCutoff = 0.05,
  from = "DiffExp",
  database = "GO",
  domain = "CC"
)

results <- getEnrichRes(MAE[["protetest"]],
  from = "DiffExp", database = "GO"
)

sumORA(MAE[["protetest"]], from = "DiffExp", database = "GO")
#>                                                                                            Contrast
#> (temperatureMedium - temperatureLow) in mean           (temperatureMedium - temperatureLow) in mean
#> (temperatureElevated - temperatureLow) in mean       (temperatureElevated - temperatureLow) in mean
#> (temperatureElevated - temperatureMedium) in mean (temperatureElevated - temperatureMedium) in mean
#>                                                   CC
#> (temperatureMedium - temperatureLow) in mean       0
#> (temperatureElevated - temperatureLow) in mean     1
#> (temperatureElevated - temperatureMedium) in mean  1

plotEnrichComp(MAE[["protetest"]],
  from = "DiffExp", database = "GO",
  matrixType = "FC"
)

3.3.3 Multi-omics integration analysis

The multi-omics step in RFLOMICS takes two steps: the preparation of the object, based on the user-selected method, and the actual run of the selected method. This is done using prepareForIntegration and runOmicsIntegration. In this example, we will use only the proteomics and metabolomics dataset with the mixOmics package (block.plsda function).

prepareForIntegration does not return a RflomicsMAE object, it returns the corresponding type to be an input for either MOFA2 or mixOmics functions.

listIntegration <- prepareForIntegration(MAE,
  omicsNames = c("protetest", "metatest"),
  variableLists = rownames(MAE),
  method = "mixOmics"
)

MAE <- runOmicsIntegration(
  object = MAE,
  preparedObject = listIntegration,
  method = "mixOmics",
  selectedResponse = "temperature",
  ncomp = 3
)

Visualizing results is made through the original package functions.

mixOmics::plotIndiv(getMixOmics(MAE, response = "temperature"))

4 Generating a report from the RflomicsMAE object

The generateReport function is used to generate a report from a RflomicsMAE object. The report is generated in HTML format.

The MAE object can be saved (export = TRUE), exporting the object in a .RData, the report in html format, the graphs in png format and the data in txt (tabulated) format.

It’s use is straightforward, using the generateReport(object = MAE, fileName = "ecoseed_report.html", export = FALSE) command. It can take a long time, depending on the amount of analyses performed and the data size.

5 Session information

sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] RFLOMICS_0.99.7             coseq_1.31.0               
#>  [3] knitr_1.50                  htmltools_0.5.8.1          
#>  [5] ggplot2_3.5.1               dplyr_1.1.4                
#>  [7] shinyBS_0.61.1              MultiAssayExperiment_1.33.9
#>  [9] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#> [11] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
#> [13] IRanges_2.41.3              S4Vectors_0.45.4           
#> [15] BiocGenerics_0.53.6         generics_0.1.3             
#> [17] MatrixGenerics_1.19.1       matrixStats_1.5.0          
#> [19] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] fs_1.6.5                enrichplot_1.27.4       httr_1.4.7             
#>   [4] RColorBrewer_1.1-3      doParallel_1.0.17       tools_4.6.0            
#>   [7] backports_1.5.0         R6_2.6.1                DT_0.33                
#>  [10] HDF5Array_1.35.16       lazyeval_0.2.2          uwot_0.2.3             
#>  [13] rhdf5filters_1.19.2     GetoptLong_1.0.5        withr_3.0.2            
#>  [16] gridExtra_2.3           bayesm_3.1-6            cli_3.6.4              
#>  [19] Cairo_1.6-2             flashClust_1.01-2       sandwich_3.1-1         
#>  [22] labeling_0.4.3          sass_0.4.9              mvtnorm_1.3-3          
#>  [25] robustbase_0.99-4-1     proxy_0.4-27            yulab.utils_0.2.0      
#>  [28] gson_0.1.0              DOSE_4.1.0              R.utils_2.13.0         
#>  [31] plotrix_3.8-4           limma_3.63.11           org.At.tair.db_3.21.0  
#>  [34] RSQLite_2.3.9           gridGraphics_0.5-1      shape_1.4.6.1          
#>  [37] vroom_1.6.5             car_3.1-3               HTSCluster_2.0.11      
#>  [40] GO.db_3.21.0            leaps_3.2               Matrix_1.7-3           
#>  [43] abind_1.4-8             R.methodsS3_1.8.2       lifecycle_1.0.4        
#>  [46] scatterplot3d_0.3-44    multcomp_1.4-28         yaml_2.3.10            
#>  [49] edgeR_4.5.10            carData_3.0-5           rhdf5_2.51.2           
#>  [52] qvalue_2.39.0           SparseArray_1.7.7       Rtsne_0.17             
#>  [55] grid_4.6.0              blob_1.2.4              promises_1.3.2         
#>  [58] shinydashboard_0.7.2    crayon_1.5.3            dir.expiry_1.15.0      
#>  [61] ggtangle_0.0.6          lattice_0.22-6          cowplot_1.1.3          
#>  [64] KEGGREST_1.47.0         HTSFilter_1.47.0        magick_2.8.6           
#>  [67] capushe_1.1.2           pillar_1.10.1           ComplexHeatmap_2.23.1  
#>  [70] fgsea_1.33.4            rjson_0.2.23            estimability_1.5.1     
#>  [73] corpcor_1.6.10          mixOmics_6.31.4         codetools_0.2-20       
#>  [76] fastmatch_1.1-6         glue_1.8.0              ggfun_0.1.8            
#>  [79] data.table_1.17.0       vctrs_0.6.5             png_0.1-8              
#>  [82] treeio_1.31.0           EnhancedVolcano_1.25.0  gtable_0.3.6           
#>  [85] cachem_1.1.0            xfun_0.51               S4Arrays_1.7.3         
#>  [88] mime_0.13               coda_0.19-4.1           survival_3.8-3         
#>  [91] pheatmap_1.0.12         iterators_1.0.14        tinytex_0.56           
#>  [94] statmod_1.5.0           TH.data_1.1-3           nlme_3.1-168           
#>  [97] ggtree_3.15.0           bit64_4.6.0-1           filelock_1.0.3         
#> [100] UpSetR_1.4.0            tensorA_0.36.2.1        bslib_0.9.0            
#> [103] colorspace_2.1-1        DBI_1.2.3               DESeq2_1.47.5          
#> [106] tidyselect_1.2.1        emmeans_1.11.0          bit_4.6.0              
#> [109] compiler_4.6.0          Rmixmod_2.1.10          compositions_2.0-8     
#> [112] h5mread_0.99.4          basilisk.utils_1.19.1   DelayedArray_0.33.6    
#> [115] bookdown_0.42           scales_1.3.0            DEoptimR_1.1-3-1       
#> [118] multcompView_0.1-10     stringr_1.5.1           digest_0.6.37          
#> [121] MOFA2_1.17.0            rmarkdown_2.29          basilisk_1.19.3        
#> [124] XVector_0.47.2          pkgconfig_2.0.3         FactoMineR_2.11        
#> [127] fastmap_1.2.0           rlang_1.1.5             GlobalOptions_0.1.2    
#> [130] htmlwidgets_1.6.4       UCSC.utils_1.3.1        shiny_1.10.0           
#> [133] farver_2.1.2            jquerylib_0.1.4         zoo_1.8-13             
#> [136] jsonlite_2.0.0          BiocParallel_1.41.5     GOSemSim_2.33.0        
#> [139] R.oo_1.27.0             magrittr_2.0.3          Formula_1.2-5          
#> [142] GenomeInfoDbData_1.2.14 ggplotify_0.1.2         patchwork_1.3.0        
#> [145] Rhdf5lib_1.29.2         munsell_0.5.1           Rcpp_1.0.14            
#> [148] ape_5.8-1               reticulate_1.42.0       stringi_1.8.7          
#> [151] MASS_7.3-65             plyr_1.8.9              parallel_4.6.0         
#> [154] ggrepel_0.9.6           forcats_1.0.0           Biostrings_2.75.4      
#> [157] splines_4.6.0           circlize_0.4.16         locfit_1.5-9.12        
#> [160] igraph_2.1.4            ggpubr_0.6.0            ggsignif_0.6.4         
#> [163] reshape2_1.4.4          evaluate_1.0.3          BiocManager_1.30.25    
#> [166] tzdb_0.5.0              foreach_1.5.2           httpuv_1.6.15          
#> [169] tidyr_1.3.1             purrr_1.0.4             clue_0.3-66            
#> [172] BiocBaseUtils_1.9.0     broom_1.0.8             xtable_1.8-4           
#> [175] e1071_1.7-16            RSpectra_0.16-2         tidytree_0.4.6         
#> [178] rstatix_0.7.2           later_1.4.1             class_7.3-23           
#> [181] rARPACK_0.11-0          tibble_3.2.1            clusterProfiler_4.15.1 
#> [184] aplot_0.2.5             memoise_2.0.1           AnnotationDbi_1.69.0   
#> [187] ellipse_0.5.0           cluster_2.1.8.1         corrplot_0.95          
#> [190] shinyWidgets_0.9.0