--- title: "Analyzing tRNA sequences and structures" author: "Felix G.M. Ernst" date: "`r Sys.Date()`" package: tRNA abstract: > Example of importing tRNAdb output as GRanges output: BiocStyle::html_document: toc: true toc_float: true df_print: paged vignette: > %\VignetteIndexEntry{tRNA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown(css.files = c('custom.css')) ``` # Introduction The `tRNA` package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The package expects a `GRanges` object with certain columns as input. The following columns are a requirement: `tRNA_length`, `tRNA_type`, `tRNA_anticodon`, `tRNA_seq`, `tRNA_str`, `tRNA_CCA.end`. # Loading tRNA information To work with the tRNA packages, tRNA information can be retrieved in a number of ways: 1. A `GRanges` object can be constructed by hand containing the required colums (see above) 2. a tRNAscan result file can be loaded using `import.tRNAscanAsGRanges()` from `tRNAscanImport` package 3. selected tRNA information can be retrieved using `import.tRNAdb()` from `tRNAdbImport` package For this vignette a predefined `GRanges` object is loaded. ```{r, echo=FALSE} suppressPackageStartupMessages( library(tRNA) ) data("gr", package = "tRNA", envir = environment()) ``` ```{r, eval=FALSE} library(tRNA) data("gr", package = "tRNA", envir = environment()) ``` # tRNA sequences and structures To retrieve the sequences for individual tRNA structure elements `gettRNAstructureGRanges` or `gettRNAstructureSeqs` can be used. ```{r} # just get the coordinates of the anticodonloop gettRNAstructureGRanges(gr, structure = "anticodonLoop") gettRNAstructureSeqs(gr, joinFeatures = TRUE, structure = "anticodonLoop") ``` In addition the sequences can be returned already joined to get a fully blank padded set of sequences. The boundaries of the individual structures is returned as metadata of the `RNAStringSet` object. ```{r} seqs <- gettRNAstructureSeqs(gr[1:10], joinCompletely = TRUE) seqs # getting the tRNA structure boundaries metadata(seqs)[["tRNA_structures"]] ``` Be aware, that `gettRNAstructureGRanges` and `gettRNAstructureSeqs` might not be working as expected, if the tRNA sequences in questions are armless or deviate a lot from the canonical tRNA model. It was thouroughly tested using human mitochondrial tRNA and other tRNAs missing certain features. However, for fringe cases this might not be the case. Please report these cases, if you find them, with an example. # Subsetting tRNA sequences The structure information can be queried for subsetting using the function `hasAccpeptorStem()` as an example. Please have a look at`?hasAccpeptorStem` for all available subsetting functions. ```{r echo=TRUE, results="hide"} gr[hasAcceptorStem(gr, unpaired = TRUE)] # mismatches and bulged are subsets of unpaired gr[hasAcceptorStem(gr, mismatches = TRUE)] gr[hasAcceptorStem(gr, bulged = TRUE)] # combination of different structure parameters gr[hasAcceptorStem(gr, mismatches = TRUE) & hasDloop(gr, length = 8)] ``` # Visualization To get an overview of tRNA feature and compare different datasets, `gettRNAFeaturePlots` can be used. It accepts a named `GRangesList` as input. Internally it will calculate a list of features values based on the functions mentioned above and the data contained in the mcols of the `GRanges` objects. The results can be retrieved without generating plots using `gettRNASummary`. ```{r} # load tRNA data for E. coli and H. sapiens data("gr_eco", package = "tRNA", envir = environment()) data("gr_human", package = "tRNA", envir = environment()) # get summary plots grl <- GRangesList(Sce = gr, Hsa = gr_human, Eco = gr_eco) plots <- gettRNAFeaturePlots(grl) ``` ```{r plot1, fig.cap = "tRNA length."} plots$length ``` ```{r plot2, fig.cap = "tRNAscan-SE scores."} plots$tRNAscan_score ``` ```{r plot3, fig.cap = "tRNA GC content."} plots$gc ``` ```{r plot4, fig.cap = "tRNAs with introns."} plots$tRNAscan_intron ``` ```{r plot5, fig.cap = "Length of the variable loop."} plots$variableLoop_length ``` To check whether features correlate with additional scores, optional arguments can be added to `gettRNAFeaturePlots` or used from the `score` column of the `GRanges` objects. For the first option provided a list of scores with the same dimensions as the `GRangesList` object as argument `scores` and for the latter set `plotScore = TRUE`. ```{r, eval=FALSE} # score column will be used plots <- gettRNAFeaturePlots(grl, plotScores = TRUE) ``` ```{r} plots <- gettRNAFeaturePlots(grl, scores = list(runif(length(grl[[1]]),0,100), runif(length(grl[[2]]),0,100), runif(length(grl[[3]]),0,100))) ``` ```{r plot6, fig.cap = "tRNA length and score correlation."} plots$length ``` ```{r plot7, fig.cap = "variable loop length and score correlation."} plots$variableLoop_length ``` The plots can be of course modified manually and changed to suit your needs. ```{r plot8, fig.cap = "Customized plot switching out the point and violin plot into a boxplot."} plots$length$layers <- plots$length$layers[c(-1,-2)] plots$length + ggplot2::geom_boxplot() ``` # Options The colours of the plots can be customized directly on creation with the following options. ```{r} options("tRNA_colour_palette") options("tRNA_colour_yes") options("tRNA_colour_no") ``` # Dot bracket annotation To retrieve detailed information on the base pairing `gettRNABasePairing()` or `getBasePairing()` can be used depending on the input type. The DotBracket annotation is expected with pairs of `<>{}[]()` (Please note the orientation. For `<>` the orientation is variable, since the tRNAscan files use the `><` annotation by default). ```{r} head(gettRNABasePairing(gr)[[1]]) head(getBasePairing(gr[1]$tRNA_str)[[1]]) ``` The loop ids for the structure elements can be retrieved with `gettRNALoopIDs()` or `getLoopIDs()`. Please not that pseudoloops are currently not correctly reflected in the output of these functions. ```{r} gettRNALoopIDs(gr)[[1]] getLoopIDs(gr[1]$tRNA_str) ``` # Session info ```{r} sessionInfo() ```