---
title: "Designing SD context with ModCon"
author: "Johannes Ptok"
date: "`r format(Sys.Date(), '%m/%d/%Y')`"
output:
  rmarkdown::html_document:
vignette: >
  %\VignetteIndexEntry{Designing SD context with ModCon}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo=FALSE, results="hide"}
knitr::opts_chunk$set(tidy = FALSE,
                      cache = FALSE,
                      dev = "png")
```	


# Introduction

Splicing describes the process where intronic sequences are excised
from a precursor RNA transcript and the remaining exonic sequences
are spliced together. Almost every human RNA transcript is spliced, 
making its proper execution crucial for human health. Splicing is
accomplished by a cellular mashinery called the spliceosome. The U1
snRNP subunit of the spliceosome recognizes the upstream end of an 
intron, called splice donor (SD) or 5' splice site, whereas the 
downstream end of an intron, the splice acceptor (SA), or 3' splice
site is recognized via the U2 snRNP. The generall process is mediated
by hundreds of proteins which either are part of the spliceosome, or 
which bind and guide the spliceosome to the RNA transcript. These 
RNA-binding so-called splice regulatory proteins (SRPs) basically 
consists of two major protein families, the SR proteins and the 
hnRNP, which influence splice site activation position-dependently.
SR protein binding leads to activation of upstream SAs and downstream
SDs, but a silencing of upstream SDs and downstream SAs. hnRNPs behave
the other way around. The HEXplorer score uses this position-dependent 
effect of SRPs and calculates the property of sequences to activate
or silence splice sites upstream or downsteam, via SR protein and 
hnRNP binding approximation. Combining the HEXplorer profile integral
of the upstream and the downstream proximal sequence surrounding of 
a splice site, the splice site HEXplorer weight (SSHW) estimates
whether the sequence surrounding potentially either enhances or
silences splice site usage. The *ModCon* package, consists of
functions which for the first time enable adjustment of the SSHW
of a splice donor site via codon selection, while keeping the
underlying amino acid sequence encoding intact. One application
of ModCon could be during the design of splicing reporter or
expression vectors.


# Implementation

ModCon can either be used to increase or decrease the splice site 
HEXplorer weight of a splice donor by codon selection. The respective
SSHW adjustment can be accomplished to 0-100%. For 100% SSHW maximization
or minimization, a sliding window approach is used, whereas a genetic
algorithm is used for SSHW adjustments to less than 100%. For SSHW
maximization, ModCon increases the HEXplorer profile upstream and decreases
the HEXplorer profil downstream of the respective splice donor. For
SSHW minimization, ModCon decreases the HEXplorer profil upstream 
and increases the HEXplorer profil downstream of the splice donor.
HEXplorer score calculations during the genetic algorithm can be done 
parallel using the **parallel** package. ModCon additionally enables the 
degradation of the intrinsic strength of splice sites within the sequence 
surrounding which is altered for SSHW adjustment. For that purpose, the
HBond score or the MaxEntScan score is applied to calculate the intrinsic
strength of splice donor or splice acceptor sites. With kind approval of
the developers, MaxEntScan score calculations are done using the published
perl script. Therefor, **ModCon** relies on prior installation of any version
of the programming language **Perl** (e.g. Strawberry Perl). 


# Applying ModCon

The main function of the ModCon package is the `ModCon` function, which
requires a coding sequence (CDS) and the position of the first nucleotide
of the SD within the CDS. Per default, the SSHW of the respective SD is 
increased as much as possible.

```{r, eval=TRUE}
library(ModCon)

#CDS holding the respective donor site
cds <- paste0('ATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCGCCATTCTATCCGCTG',
              'GAAGATGGAACCGCTGGAGAGCAACTGCATAAGGCTATGAAGAGATACGCC',
              'CTGGTTCCTGGAACAATTGCTTTTACAGATGCACATATCGAGGTGGACATC',
              'ACTTACGCTGAGTACTTCGAA')

## Executing ModCon to increase the splice site HEXplorer weigth of 
## the splice donor at position 103
cdsSSHWincreased <- ModCon(cds, 103, nCores=1)
cdsSSHWincreased

```

The resulting character string holds the alternative nucleotide sequence
with an increased SSHW for the index splice donor site at position 103. 
The new CDS encodes the same amino acid sequence as before.


To achive the minimal SSHW, the ModCon function parameter `optimizeContext`
has to be set to `FALSE`.

```{r, eval=TRUE}
## Executing ModCon to decrease the splice site HEXplorer weigth of 
## the splice donor at position 103
cdsSSHWdecreased <- ModCon(cds, 103, optimizeContext=FALSE, nCores=1)
cdsSSHWdecreased
```

The resulting character string holds the alternative nucleotide sequence
with an decreased SSHW for the index splice donor site at position 103. 
Again, the new CDS encodes the same amino acid sequence as before.


The extent of SSHW minimization and maximization can alternatively be 
limited to e.g. 60% of the maximum or minimum setting the `optiRate` 
to 60. The progress is omitted per generation (not shown in this vignette).

```{r, eval=TRUE}
## Executing ModCon to increase the splice site HEXplorer weigth of 
## the splice donor at position 103 to around 60% of the maximum
suppressMessages(cdsSSHWincreased <- ModCon(cds, 103, optiRate=60, nCores=1))
suppressMessages(cdsSSHWdecreased <- ModCon(cds, 103, optiRate=60, optimizeContext=FALSE, nCores=1))
cdsSSHWincreased
cdsSSHWdecreased
```

The resulting character strings hold the alternative nucleotide sequences 
with either an increased or decreased SSHW for the index splice donor site
at position 103. With setting the parameter `optiRate` to 60, the SSHW 
increase and SSHW decrease was only performed to reach the around 60% of 
the highest or lowest SSHW possible.
Again, the new coding sequences encode the same amino acid sequence as 
the original CDS.



Changing the `optiRate` parameter of the `ModCon` function from the
default value 100 triggers usage of the genetic algorithm, instead of
the sliding window approach. Most parameters of the genetic algorithm
and how many cores should be used for the calculations can be additionally
adjusted with the respective `ModCon` function parameter.

```{r, eval=TRUE}
## Executing ModCon to increase the splice site HEXplorer weigth of 
## the splice donor at position 103 to around 60% of the maximum
suppressMessages(cdsSSHWincreased <- ModCon(cds, 103, 
                                sdMaximalHBS=10, acMaximalMaxent=4, optiRate=60,
                                nGenerations=5, parentSize=200, startParentSize=800,
                                bestRate=50, semiLuckyRate=10, luckyRate=5,
                                mutationRate=1e-03, nCores=1))

cdsSSHWincreased
```

As with the sliding window approach, the resulting character string holds
the alternative nucleotide sequence with an increased SSHW for the 
index splice donor site at position 103. The new CDS encodes the 
same amino acid sequence as before.


The size of the sequence surroundings can be set using the parameters
`upChangeCodonsIn` and `downChangeCodonsIn`, which define the number of
codons to be adjusted around the splice site for SSHW adjustment (default=16).

```{r, eval=TRUE}
## Executing ModCon to decrease the splice site HEXplorer weigth of 
## the splice donor at position 103
cdsSSHWdecreased <- ModCon(cds, 103, downChangeCodonsIn=20, upChangeCodonsIn=21, nCores=1)
cdsSSHWdecreased
```

As with the sliding window approach, the resulting character string holds
the alternative nucleotide sequence with an decreased SSHW for the 
index splice donor site at position 103. The new CDS encodes the 
same amino acid sequence as before.


The **ModCon** package additionally holds functions to increase or decrease
the intrinsic strength (Hbond score) of a secific splice donor site while 
keeping the underlying encoded amino acid sequences the same. 

```{r, eval=TRUE}
## Decreaser HBond score of the splice donor at position 103
## by codon selection
cdsHBondDown <- decreaseGTsiteStrength(cds, 103)
cdsHBondUp <- increaseGTsiteStrength(cds, 103)
cdsHBondDown
```

`cdsHBondDown` states a coding sequence, encoding the same amino acid as the input CDS. However, a splice donor sequence at the stated index position within the CDS will be aimed to be decrased in its intrinsic strength.
`cdsHBondUp` states a coding sequence, encoding the same amino acid as the input CDS. However, a splice donor sequence at the stated index position within the CDS will be aimed to be decrased in its intrinsic strength.


Integrated functions also include functions to decrease the intrinisc strength
of every splice donor or acceptor within a coding sequence, while considering 
whether the overall HEXplorer profile should be increased or decreased of the 
respective entered sequence.

```{r, eval=TRUE}
library(data.table)
## Initiaing the Codons matrix plus corresponding amino acids
ntSequence <- 'TTTTCGATCGGGATTAGCCTCCAGGTAAGTATCTATCGATCTATGCGATAG'
## Create Codon Matrix by splitting up the sequence by 3nt
fanFunc <- createCodonMatrix(ntSequence)
cdsSDlow <- degradeSDs(fanFunc, maxhbs=10, increaseHZEI=TRUE)
cdsSAlow <- degradeSAs(fanFunc, maxhbs=10, maxME=4, increaseHZEI=TRUE)
cdsSDlow
cdsSAlow
```

`cdsSDlow` states a coding sequence, encoding the same amino acid as the input CDS. However, every potential splice donor sequence within the CDS, which exceeds a Hbond score treshold (`maxhbs`) will be aimed to be degraded in its intrinsic strength as much as possible. If, additionally, potential splice acceptor sites should be degraded in their intrinsic strengt, the following function `degradeSAs` will reduce the number of potential relevant splice acceptor sites within the CDS.
`cdsSAlow` states a coding sequence, encoding the same amino acid as the input CDS. However, every potential splice donor sequence within the CDS, which exceeds a Hbond score treshold (`maxhbs`) and every potential splice acceptor site, which exceeds a certain MaxEntScan score treshold, will be aimed to be degraded in its intrinsic strength as much as possible.

A graphic user interface based on shiny can be opened with the function `startModConApp.Rd`.

# Session info

```{r sessionInfo}
sessionInfo()
```