<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>GEN242 – Tutorials</title>
    <link>/tutorials/</link>
    <description>Recent content in Tutorials on GEN242</description>
    <generator>Hugo -- gohugo.io</generator>
    
	  <atom:link href="/tutorials/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>Tutorials: RNA-Seq Workflow Template</title>
      <link>/tutorials/systempiper/rnaseq/systempipernaseq/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>/tutorials/systempiper/rnaseq/systempipernaseq/</guid>
      <description>
        
        
        &lt;!--
# Compile from command-line
Rscript -e &#34;rmarkdown::render(&#39;systemPipeRNAseq.Rmd&#39;, c(&#39;BiocStyle::html_document&#39;), clean=FALSE); knitr::knit(&#39;systemPipeRNAseq.Rmd&#39;, tangle=TRUE)&#34;; Rscript -e &#34;rmarkdown::render(&#39;systemPipeRNAseq.Rmd&#39;, c(&#39;BiocStyle::pdf_document&#39;))&#34;
--&gt;
&lt;style type=&#34;text/css&#34;&gt;
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
&lt;/style&gt;
&lt;script type=&#34;text/javascript&#34;&gt;
document.addEventListener(&#34;DOMContentLoaded&#34;, function() {
  document.querySelector(&#34;h1&#34;).className = &#34;title&#34;;
});
&lt;/script&gt;
&lt;script type=&#34;text/javascript&#34;&gt;
document.addEventListener(&#34;DOMContentLoaded&#34;, function() {
  var links = document.links;  
  for (var i = 0, linksLength = links.length; i &lt; linksLength; i++)
    if (links[i].hostname != window.location.hostname)
      links[i].target = &#39;_blank&#39;;
});
&lt;/script&gt;
&lt;h1 id=&#34;introduction&#34;&gt;Introduction&lt;/h1&gt;
&lt;h2 id=&#34;overview&#34;&gt;Overview&lt;/h2&gt;
&lt;p&gt;This workflow template is for analyzing RNA-Seq data. It is provided by
&lt;a href=&#34;https://bioconductor.org/packages/devel/data/experiment/html/systemPipeRdata.html&#34;&gt;systemPipeRdata&lt;/a&gt;,
a companion package to &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/html/systemPipeR.html&#34;&gt;systemPipeR&lt;/a&gt; (H Backman and Girke 2016).
Similar to other &lt;code&gt;systemPipeR&lt;/code&gt; workflow templates, a single command generates
the necessary working environment. This includes the expected directory
structure for executing &lt;code&gt;systemPipeR&lt;/code&gt; workflows and parameter files for running
command-line (CL) software utilized in specific analysis steps. For learning
and testing purposes, a small sample (toy) data set is also included (mainly
FASTQ and reference genome files). This enables users to seamlessly run the
numerous analysis steps of this workflow from start to finish without the
requirement of providing custom data. After testing the workflow, users have
the flexibility to employ the template as is with their own data or modify it
to suit their specific needs. For more comprehensive information on designing
and executing workflows, users should refer to the main vignettes of
&lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html&#34;&gt;systemPipeR&lt;/a&gt;
and
&lt;a href=&#34;https://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html&#34;&gt;systemPipeRdata&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;Rmd&lt;/code&gt; file (&lt;code&gt;systemPipeRNAseq.Rmd&lt;/code&gt;) associated with this vignette serves a dual purpose. It acts
both as a template for executing the workflow and as a template for generating
a reproducible scientific analysis report. Thus, users will want to customize the text
(and/or code) of this vignette to describe their experimental design and
analysis results. This typically involves deleting the instructions on how to work
with this workflow, and customizing the text describing the experimental design,
other metadata and analysis results.&lt;/p&gt;
&lt;h2 id=&#34;experimental-design&#34;&gt;Experimental design&lt;/h2&gt;
&lt;p&gt;Typically, users should describe here the sources and versions of the
reference genome sequence along with the corresponding annotations. The standard
directory structure of &lt;code&gt;systemPipeR&lt;/code&gt; (see &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#3_Directory_structure&#34;&gt;here&lt;/a&gt;),
expects the input data in a subdirectory named &lt;code&gt;data&lt;/code&gt;
and all results will be written to a separate &lt;code&gt;results&lt;/code&gt; directory. The Rmd source file
for executing the workflow and rendering its report (here &lt;code&gt;systemPipeRNAseq.Rmd&lt;/code&gt;) is
expected to be located in the parent directory.&lt;/p&gt;
&lt;p&gt;The test (toy) data set used by this template (&lt;a href=&#34;http://www.ncbi.nlm.nih.gov/sra/?term=SRP010938&#34;&gt;SRP010938&lt;/a&gt;)
contains 18 paired-end (PE) read sets from &lt;em&gt;Arabidopsis thaliana&lt;/em&gt;
(Howard et al. 2013). To minimize processing time during testing, each FASTQ
file has been reduced to 90,000-100,000 randomly sampled PE reads that
map to the first 100,000 nucleotides of each chromosome of the &lt;em&gt;A.
thaliana&lt;/em&gt; genome. The corresponding reference genome sequence (FASTA) and
its GFF annotation files have been reduced to the same genome regions. This way, the entire
test data set requires less than 200 MB of storage space. A PE read set has been
chosen here for flexibility, because it can be used for testing analysis routines
requiring either SE (single-end) or PE reads.&lt;/p&gt;
&lt;p&gt;To use their own RNA-Seq and reference genome data, users should delete the provided
test data, move or link their custom data to the designated &lt;code&gt;data&lt;/code&gt; directory,
and execute the workflow from the parent directory using their customized &lt;code&gt;Rmd&lt;/code&gt; file.
Alternatively, users can create an environment skeleton (named &lt;code&gt;new&lt;/code&gt; &lt;a href=&#34;https://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/new.html&#34;&gt;here&lt;/a&gt;) or
build one from scratch. To perform an RNA-Seq analysis with new FASTQ files
from the same reference genome, users only need to provide the FASTQ files and
an experimental design file, called the ‘targets’ file. The structure and utility
of targets files are described in &lt;code&gt;systemPipeR&#39;s&lt;/code&gt;
main vignette &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#4_The_targets_file&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;workflow-steps&#34;&gt;Workflow steps&lt;/h2&gt;
&lt;p&gt;The default analysis steps included in this RNA-Seq workflow template are listed below. Users
can modify the existing steps, add new ones or remove steps as needed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Default analysis steps&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read preprocessing
&lt;ul&gt;
&lt;li&gt;Quality filtering (trimming)&lt;/li&gt;
&lt;li&gt;FASTQ quality report&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Alignments: &lt;em&gt;&lt;code&gt;HISAT2&lt;/code&gt;&lt;/em&gt; (or any other RNA-Seq aligner)&lt;/li&gt;
&lt;li&gt;Alignment stats&lt;/li&gt;
&lt;li&gt;Read counting&lt;/li&gt;
&lt;li&gt;Sample-wise correlation analysis&lt;/li&gt;
&lt;li&gt;Analysis of differentially expressed genes (DEGs)&lt;/li&gt;
&lt;li&gt;GO term enrichment analysis&lt;/li&gt;
&lt;li&gt;Gene-wise clustering&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;load-workflow-environment&#34;&gt;Load workflow environment&lt;/h2&gt;
&lt;p&gt;The environment for this RNA-Seq workflow is auto-generated below with the
&lt;code&gt;genWorkenvir&lt;/code&gt; function (selected under &lt;code&gt;workflow=&amp;quot;rnaseq&amp;quot;&lt;/code&gt;). It is fully populated
with a small test data set, including FASTQ files, reference genome and annotation data. The name of the
resulting workflow directory can be specified under the &lt;code&gt;mydirname&lt;/code&gt; argument.
The default &lt;code&gt;NULL&lt;/code&gt; uses the name of the chosen workflow. An error is issued if
a directory of the same name and path exists already. After this, the user’s R
session needs to be directed into the resulting &lt;code&gt;rnaseq&lt;/code&gt; directory (here with
&lt;code&gt;setwd&lt;/code&gt;).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(systemPipeRdata)
genWorkenvir(workflow = &amp;quot;rnaseq&amp;quot;, mydirname = &amp;quot;rnaseq&amp;quot;)
setwd(&amp;quot;rnaseq&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;input-data-targets-file&#34;&gt;Input data: &lt;code&gt;targets&lt;/code&gt; file&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;targets&lt;/code&gt; file defines the input files (e.g. FASTQ or BAM) and sample
comparisons used in a data analysis workflow. It can also store any amount of
additional descriptive information for each sample. The following shows the first
four lines of the &lt;code&gt;targets&lt;/code&gt; file used in this workflow template.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;targetspath &amp;lt;- system.file(&amp;quot;extdata&amp;quot;, &amp;quot;targetsPE.txt&amp;quot;, package = &amp;quot;systemPipeR&amp;quot;)
targets &amp;lt;- read.delim(targetspath, comment.char = &amp;quot;#&amp;quot;)
targets[1:4, -c(5, 6)]
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                     FileName1                   FileName2
## 1 ./data/SRR446027_1.fastq.gz ./data/SRR446027_2.fastq.gz
## 2 ./data/SRR446028_1.fastq.gz ./data/SRR446028_2.fastq.gz
## 3 ./data/SRR446029_1.fastq.gz ./data/SRR446029_2.fastq.gz
## 4 ./data/SRR446030_1.fastq.gz ./data/SRR446030_2.fastq.gz
##   SampleName Factor        Date
## 1        M1A     M1 23-Mar-2012
## 2        M1B     M1 23-Mar-2012
## 3        A1A     A1 23-Mar-2012
## 4        A1B     A1 23-Mar-2012
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To work with custom data, users need to generate a &lt;em&gt;&lt;code&gt;targets&lt;/code&gt;&lt;/em&gt; file containing
the paths to their own FASTQ files. &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#4_The_targets_file&#34;&gt;Here&lt;/a&gt; is a detailed description of the structure and
utility of &lt;code&gt;targets&lt;/code&gt; files.&lt;/p&gt;
&lt;h1 id=&#34;quick-start&#34;&gt;Quick start&lt;/h1&gt;
&lt;p&gt;After a workflow environment has been created with the above &lt;code&gt;genWorkenvir&lt;/code&gt;
function call and the corresponding R session directed into the resulting directory (here &lt;code&gt;rnaseq&lt;/code&gt;),
the &lt;code&gt;SPRproject&lt;/code&gt; function is used to initialize a new workflow project instance. The latter
creates an empty &lt;code&gt;SAL&lt;/code&gt; workflow container (below &lt;code&gt;sal&lt;/code&gt;) and at the same time a
linked project log directory (default name &lt;code&gt;.SPRproject&lt;/code&gt;) that acts as a
flat-file database of a workflow. Additional details about this process and
the SAL workflow control class are provided in &lt;code&gt;systemPipeR&#39;s&lt;/code&gt; main vignette
&lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#11_Workflow_control_class&#34;&gt;here&lt;/a&gt;
and &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Next, the &lt;code&gt;importWF&lt;/code&gt; function imports all the workflow steps outlined in the
source Rmd file of this vignette (here &lt;code&gt;systemPipeRNAseq.Rmd&lt;/code&gt;) into the &lt;code&gt;SAL&lt;/code&gt; workflow container.
An overview of the workflow steps and their status information can be returned
at any stage of the loading or run process by typing &lt;code&gt;sal&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(systemPipeR)
sal &amp;lt;- SPRproject()
sal &amp;lt;- importWF(sal, file_path = &amp;quot;systemPipeRNAseq.Rmd&amp;quot;, verbose = FALSE)
sal
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After loading the workflow into &lt;code&gt;sal&lt;/code&gt;, it can be executed from start to finish
(or partially) with the &lt;code&gt;runWF&lt;/code&gt; command. Running the workflow will only be
possible if all dependent CL software is installed on a user’s system. Their
names and availability on a system can be listed with &lt;code&gt;listCmdTools(sal, check_path=TRUE)&lt;/code&gt;. For more information about the &lt;code&gt;runWF&lt;/code&gt; command, refer to the
help file and the corresponding section in the main vignette
&lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#61_Overview&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Running workflows in parallel mode on computer clusters is a straightforward
process in &lt;code&gt;systemPipeR&lt;/code&gt;. Users can simply append the resource parameters (such
as the number of CPUs) for a cluster run to the &lt;code&gt;sal&lt;/code&gt; object after importing
the workflow steps with &lt;code&gt;importWF&lt;/code&gt; using the &lt;code&gt;addResources&lt;/code&gt; function. More
information about parallelization can be found in the corresponding section at
the end of this vignette &lt;a href=&#34;#paralellization&#34;&gt;here&lt;/a&gt; and in the main vignette
&lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#63_Parallel_evaluation&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
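&lt;p&gt;A minimal sketch of such a cluster configuration is shown below. It assumes a Slurm scheduler
managed via &lt;code&gt;batchtools&lt;/code&gt;; the template file, the chosen step name and the resource values
(number of jobs, walltime, CPUs and memory) are placeholders that need to be adapted to the user&#39;s system.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;## Placeholder resource list for a Slurm cluster run (adjust to your system)
resources &amp;lt;- list(conffile = &amp;quot;.batchtools.conf.R&amp;quot;, template = &amp;quot;batchtools.slurm.tmpl&amp;quot;,
    Njobs = 18, walltime = 120, ntasks = 1, ncpus = 4, memory = 1024)
## Attach the resources to a compute-intensive step before running the workflow
sal &amp;lt;- addResources(sal, c(&amp;quot;hisat2_mapping&amp;quot;), resources = resources)
&lt;/code&gt;&lt;/pre&gt;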
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;sal &amp;lt;- runWF(sal)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Workflows can be visualized as topology graphs using the &lt;code&gt;plotWF&lt;/code&gt; function.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;plotWF(sal)
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;results/plotwf_rnaseq.png&#34; alt=&#34;Topology graph of RNA-Seq workflow.&#34; width=&#34;100%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
&lt;p&gt;&lt;span id=&#34;fig:rnaseq-toplogy&#34;&gt;&lt;/span&gt;Figure 1: Topology graph of RNA-Seq workflow.&lt;/p&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Scientific and technical reports can be generated with the &lt;code&gt;renderReport&lt;/code&gt; and
&lt;code&gt;renderLogs&lt;/code&gt; functions, respectively. Scientific reports can also be generated
with the &lt;code&gt;render&lt;/code&gt; function of the &lt;code&gt;rmarkdown&lt;/code&gt; package. The technical reports are
based on log information that &lt;code&gt;systemPipeR&lt;/code&gt; collects during workflow runs.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Scientific report
sal &amp;lt;- renderReport(sal)
rmarkdown::render(&amp;quot;systemPipeRNAseq.Rmd&amp;quot;, clean = TRUE, output_format = &amp;quot;BiocStyle::html_document&amp;quot;)

# Technical (log) report
sal &amp;lt;- renderLogs(sal)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;statusWF&lt;/code&gt; function returns a status summary for each step in a &lt;code&gt;SAL&lt;/code&gt; workflow instance.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;statusWF(sal)
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;workflow-steps-1&#34;&gt;Workflow steps&lt;/h1&gt;
&lt;p&gt;The data analysis steps of this workflow are defined by the following workflow code chunks.
They can be loaded into &lt;code&gt;SAL&lt;/code&gt; interactively, by executing the code of each step in the
R console, or all at once with the &lt;code&gt;importWF&lt;/code&gt; function introduced in the Quick start section.
R and CL workflow steps are declared in the code chunks of &lt;code&gt;Rmd&lt;/code&gt; files with the
&lt;code&gt;LineWise&lt;/code&gt; and &lt;code&gt;SYSargsList&lt;/code&gt; functions, respectively, and then added to the &lt;code&gt;SAL&lt;/code&gt; workflow
container with &lt;code&gt;appendStep&amp;lt;-&lt;/code&gt;. Their syntax and usage is described
&lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#52_Constructing_workflows&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;load-packages&#34;&gt;Load packages&lt;/h2&gt;
&lt;p&gt;The first step loads the &lt;code&gt;systemPipeR&lt;/code&gt; package.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;cat(crayon::blue$bold(&amp;quot;To use this workflow, the following R packages are required:\n&amp;quot;))
cat(c(&amp;quot;&#39;GenomicFeatures&amp;quot;, &amp;quot;BiocParallel&amp;quot;, &amp;quot;DESeq2&amp;quot;, &amp;quot;ape&amp;quot;, &amp;quot;edgeR&amp;quot;,
    &amp;quot;biomaRt&amp;quot;, &amp;quot;pheatmap&amp;quot;, &amp;quot;ggplot2&#39;\n&amp;quot;), sep = &amp;quot;&#39;, &#39;&amp;quot;)
### pre-end
appendStep(sal) &amp;lt;- LineWise(code = {
    library(systemPipeR)
}, step_name = &amp;quot;load_SPR&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;read-preprocessing&#34;&gt;Read preprocessing&lt;/h2&gt;
&lt;h3 id=&#34;with-preprocessreads&#34;&gt;With &lt;code&gt;preprocessReads&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;preprocessReads&lt;/code&gt; function allows applying predefined or custom read
preprocessing functions to all FASTQ files referenced in a SAL container, such
as quality filtering or adapter trimming routines. Internally, &lt;code&gt;preprocessReads&lt;/code&gt;
uses the &lt;code&gt;FastqStreamer&lt;/code&gt; function from the &lt;code&gt;ShortRead&lt;/code&gt; package to stream through
large FASTQ files in a memory-efficient manner. The following example uses
&lt;code&gt;preprocessReads&lt;/code&gt; to perform adapter trimming with the &lt;code&gt;trimLRPatterns&lt;/code&gt; function
from the &lt;code&gt;Biostrings&lt;/code&gt; package. In this instance, &lt;code&gt;preprocessReads&lt;/code&gt; is invoked
through a CL interface built on &lt;code&gt;docopt&lt;/code&gt; that is executed from R via CWL. The
parameters for running &lt;code&gt;preprocessReads&lt;/code&gt; are specified in the corresponding
&lt;code&gt;cwl/yml&lt;/code&gt; files. It is important to point out that creating and using CL
interfaces for defining R-based workflow steps is not essential in &lt;code&gt;systemPipeR&lt;/code&gt;
since &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5211_Step_1:_R_step&#34;&gt;&lt;code&gt;LineWise&lt;/code&gt;&lt;/a&gt;
offers similar capabilities while requiring less specialized
knowledge from users.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- SYSargsList(step_name = &amp;quot;preprocessing&amp;quot;, targets = &amp;quot;targetsPE.txt&amp;quot;,
    dir = TRUE, wf_file = &amp;quot;preprocessReads/preprocessReads-pe.cwl&amp;quot;,
    input_file = &amp;quot;preprocessReads/preprocessReads-pe.yml&amp;quot;, dir_path = system.file(&amp;quot;extdata/cwl&amp;quot;,
        package = &amp;quot;systemPipeR&amp;quot;), inputvars = c(FileName1 = &amp;quot;_FASTQ_PATH1_&amp;quot;,
        FileName2 = &amp;quot;_FASTQ_PATH2_&amp;quot;, SampleName = &amp;quot;_SampleName_&amp;quot;),
    dependency = c(&amp;quot;load_SPR&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The paths to the output files generated by the preprocessing step (here trimmed FASTQ files)
are recorded in a new &lt;code&gt;targets&lt;/code&gt; file that can be used for the next workflow step,
&lt;em&gt;e.g.&lt;/em&gt; running the NGS alignments with the trimmed FASTQ files.&lt;/p&gt;
&lt;p&gt;The following example demonstrates how to design a custom &lt;code&gt;preprocessReads&lt;/code&gt;
function, as well as how to replace parameters in the &lt;code&gt;sal&lt;/code&gt; object. To apply the
modifications to the workflow, the custom function needs to be saved to a file, here &lt;code&gt;param/customFCT.RData&lt;/code&gt;,
which will be loaded during the workflow run by the &lt;code&gt;preprocessReads.doc.R&lt;/code&gt; script.
Please note, this step is included here solely for demonstration purposes, and is thus not
part of the workflow run. This is achieved by dropping &lt;code&gt;spr=TRUE&lt;/code&gt; from the header line of the
code chunk.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    filterFct &amp;lt;- function(fq, cutoff = 20, Nexceptions = 0) {
        qcount &amp;lt;- rowSums(as(quality(fq), &amp;quot;matrix&amp;quot;) &amp;lt;= cutoff,
            na.rm = TRUE)
        # Retains reads where Phred scores are &amp;gt;= cutoff
        # with N exceptions
        fq[qcount &amp;lt;= Nexceptions]
    }
    save(list = ls(), file = &amp;quot;param/customFCT.RData&amp;quot;)
}, step_name = &amp;quot;custom_preprocessing_function&amp;quot;, dependency = &amp;quot;preprocessing&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After defining this step, it can be inspected and modified as follows.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;yamlinput(sal, &amp;quot;preprocessing&amp;quot;)$Fct
yamlinput(sal, &amp;quot;preprocessing&amp;quot;, &amp;quot;Fct&amp;quot;) &amp;lt;- &amp;quot;&#39;filterFct(fq, cutoff=20, Nexceptions=0)&#39;&amp;quot;
yamlinput(sal, &amp;quot;preprocessing&amp;quot;)$Fct  ## check the new function
cmdlist(sal, &amp;quot;preprocessing&amp;quot;, targets = 1)  ## check that the command line was updated successfully
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;with-trimmomatic&#34;&gt;With Trimmomatic&lt;/h3&gt;
&lt;p&gt;For demonstration purposes, this workflow uses the &lt;a href=&#34;http://www.usadellab.org/cms/?page=trimmomatic&#34;&gt;Trimmomatic&lt;/a&gt;
software as an example of an external CL read trimming tool (Bolger, Lohse, and Usadel 2014). Trimmomatic
offers a range of practical trimming utilities specifically designed for single- and paired-end Illumina reads.&lt;/p&gt;
&lt;p&gt;It is important to note that while the Trimmomatic trimming step is included in
this workflow, it’s not mandatory. Users can opt to use read trimming results
generated by the previous &lt;code&gt;preprocessReads&lt;/code&gt; step if preferred.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- SYSargsList(step_name = &amp;quot;trimming&amp;quot;, targets = &amp;quot;targetsPE.txt&amp;quot;,
    wf_file = &amp;quot;trimmomatic/trimmomatic-pe.cwl&amp;quot;, input_file = &amp;quot;trimmomatic/trimmomatic-pe.yml&amp;quot;,
    dir_path = system.file(&amp;quot;extdata/cwl&amp;quot;, package = &amp;quot;systemPipeR&amp;quot;),
    inputvars = c(FileName1 = &amp;quot;_FASTQ_PATH1_&amp;quot;, FileName2 = &amp;quot;_FASTQ_PATH2_&amp;quot;,
        SampleName = &amp;quot;_SampleName_&amp;quot;), dependency = &amp;quot;load_SPR&amp;quot;,
    run_step = &amp;quot;optional&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;fastq-quality-report&#34;&gt;FASTQ quality report&lt;/h3&gt;
&lt;p&gt;The following &lt;code&gt;seeFastq&lt;/code&gt; and &lt;code&gt;seeFastqPlot&lt;/code&gt; functions generate and plot a
series of useful quality statistics for a set of FASTQ files, including per
cycle quality box plots, base proportions, base-level quality trends, relative
k-mer diversity, length, and occurrence distribution of reads, number of reads
above quality cutoffs and mean quality distribution. The results can be
exported to different graphics formats, such as a PNG file, here named
&lt;code&gt;fastqReport.png&lt;/code&gt;. Detailed information about the usage and visual components
in the quality plots can be found in the corresponding help file (see
&lt;code&gt;?seeFastq&lt;/code&gt; or &lt;code&gt;?seeFastqPlot&lt;/code&gt;).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    fastq &amp;lt;- getColumn(sal, step = &amp;quot;preprocessing&amp;quot;, &amp;quot;targetsWF&amp;quot;,
        column = 1)
    fqlist &amp;lt;- seeFastq(fastq = fastq, batchsize = 10000, klength = 8)
    png(&amp;quot;./results/fastqReport.png&amp;quot;, height = 162, width = 288 *
        length(fqlist))
    seeFastqPlot(fqlist)
    dev.off()
}, step_name = &amp;quot;fastq_report&amp;quot;, dependency = &amp;quot;preprocessing&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;results/fastqReport.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;div data-align=&#34;center&#34;&gt;
&lt;p&gt;Figure 1: FASTQ quality report for 18 samples&lt;/p&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;h2 id=&#34;short-read-alignments&#34;&gt;Short read alignments&lt;/h2&gt;
&lt;h3 id=&#34;read-mapping-with-hisat2&#34;&gt;Read mapping with &lt;code&gt;HISAT2&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;To use the &lt;code&gt;HISAT2&lt;/code&gt; short read aligner developed by Kim, Langmead, and Salzberg
(2015), it is necessary to index the reference genome. &lt;code&gt;HISAT2&lt;/code&gt; relies on the
Burrows-Wheeler index for this process.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- SYSargsList(step_name = &amp;quot;hisat2_index&amp;quot;, dir = FALSE,
    targets = NULL, wf_file = &amp;quot;hisat2/hisat2-index.cwl&amp;quot;, input_file = &amp;quot;hisat2/hisat2-index.yml&amp;quot;,
    dir_path = &amp;quot;param/cwl&amp;quot;, dependency = &amp;quot;load_SPR&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;hisat2-mapping&#34;&gt;&lt;code&gt;HISAT2&lt;/code&gt; mapping&lt;/h3&gt;
&lt;p&gt;The parameter settings of the aligner are defined in the &lt;code&gt;cwl/yml&lt;/code&gt; files used
in the following code chunk. The following shows how to construct the alignment
step and append it to the &lt;code&gt;SAL&lt;/code&gt; workflow container. Please note that the input
(FASTQ) files used in this step are the output files generated by the
preprocessing step (see above: &lt;code&gt;step_name = &amp;quot;preprocessing&amp;quot;&lt;/code&gt;).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- SYSargsList(step_name = &amp;quot;hisat2_mapping&amp;quot;,
    dir = TRUE, targets = &amp;quot;preprocessing&amp;quot;, wf_file = &amp;quot;workflow-hisat2/workflow_hisat2-pe.cwl&amp;quot;,
    input_file = &amp;quot;workflow-hisat2/workflow_hisat2-pe.yml&amp;quot;, dir_path = &amp;quot;param/cwl&amp;quot;,
    inputvars = c(preprocessReads_1 = &amp;quot;_FASTQ_PATH1_&amp;quot;, preprocessReads_2 = &amp;quot;_FASTQ_PATH2_&amp;quot;,
        SampleName = &amp;quot;_SampleName_&amp;quot;), rm_targets_col = c(&amp;quot;FileName1&amp;quot;,
        &amp;quot;FileName2&amp;quot;), dependency = c(&amp;quot;preprocessing&amp;quot;, &amp;quot;hisat2_index&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;cmdlist&lt;/code&gt; function allows users to inspect the exact CL call used for each input file (sample), here
for &lt;code&gt;HISAT2&lt;/code&gt; alignments. Note, this step also includes the conversion of the alignment files to sorted
and indexed BAM files using functionalities of the &lt;code&gt;SAMtools&lt;/code&gt; CL suite.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;cmdlist(sal, step = &amp;quot;hisat2_mapping&amp;quot;, targets = 1)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;$hisat2_mapping
$hisat2_mapping$M1A
$hisat2_mapping$M1A$hisat2
[1] &amp;quot;hisat2 -S ./results/M1A.sam  -x ./data/tair10.fasta  -k 1  --min-intronlen 30  --max-intronlen 3000  -1 ./results/M1A_1.fastq_trim.gz -2 ./results/M1A_2.fastq_trim.gz --threads 4&amp;quot;

$hisat2_mapping$M1A$`samtools-view`
[1] &amp;quot;samtools view -bS -o ./results/M1A.bam  ./results/M1A.sam &amp;quot;

$hisat2_mapping$M1A$`samtools-sort`
[1] &amp;quot;samtools sort -o ./results/M1A.sorted.bam  ./results/M1A.bam  -@ 4&amp;quot;

$hisat2_mapping$M1A$`samtools-index`
[1] &amp;quot;samtools index -b results/M1A.sorted.bam  results/M1A.sorted.bam.bai  ./results/M1A.sorted.bam &amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;alignment-stats&#34;&gt;Alignment stats&lt;/h3&gt;
&lt;p&gt;The following computes an alignment summary file (here &lt;code&gt;alignStats.xls&lt;/code&gt;), which
comprises the count of reads in each FASTQ file and the number of reads that
align with the reference, presented in both total and percentage values.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    fqpaths &amp;lt;- getColumn(sal, step = &amp;quot;preprocessing&amp;quot;, &amp;quot;targetsWF&amp;quot;,
        column = &amp;quot;FileName1&amp;quot;)
    bampaths &amp;lt;- getColumn(sal, step = &amp;quot;hisat2_mapping&amp;quot;, &amp;quot;outfiles&amp;quot;,
        column = &amp;quot;samtools_sort_bam&amp;quot;)
    read_statsDF &amp;lt;- alignStats(args = bampaths, fqpaths = fqpaths,
        pairEnd = TRUE)
    write.table(read_statsDF, &amp;quot;results/alignStats.xls&amp;quot;, row.names = FALSE,
        quote = FALSE, sep = &amp;quot;\t&amp;quot;)
}, step_name = &amp;quot;align_stats&amp;quot;, dependency = &amp;quot;hisat2_mapping&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting &lt;code&gt;alignStats.xls&lt;/code&gt; file can be included in the report as shown below (here restricted to the
first four rows).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;read.table(&amp;quot;results/alignStats.xls&amp;quot;, header = TRUE)[1:4, ]
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   FileName Nreads2x Nalign Perc_Aligned Nalign_Primary
## 1      M1A   115994 109977     94.81266         109977
## 2      M1B   134480 112464     83.62879         112464
## 3      A1A   127976 122427     95.66403         122427
## 4      A1B   122486 101369     82.75966         101369
##   Perc_Aligned_Primary
## 1             94.81266
## 2             83.62879
## 3             95.66403
## 4             82.75966
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;viewing-bam-files-in-igv&#34;&gt;Viewing BAM files in IGV&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;symLink2bam&lt;/code&gt; function creates symbolic links to view the BAM alignment files
in a genome browser such as IGV without moving these large files to a local
system. The corresponding URLs are written to a file with a path specified
under &lt;code&gt;urlfile&lt;/code&gt;, here &lt;code&gt;IGVurl.txt&lt;/code&gt;. To make the following code work, users need to
change the directory name (here &lt;code&gt;&amp;lt;somedir&amp;gt;&lt;/code&gt;), and the url base and user names (here
&lt;code&gt;&amp;lt;base_url&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;username&amp;gt;&lt;/code&gt;) to the corresponding names on their system.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    bampaths &amp;lt;- getColumn(sal, step = &amp;quot;hisat2_mapping&amp;quot;, &amp;quot;outfiles&amp;quot;,
        column = &amp;quot;samtools_sort_bam&amp;quot;)
    symLink2bam(sysargs = bampaths, htmldir = c(&amp;quot;~/.html/&amp;quot;, &amp;quot;&amp;lt;somedir&amp;gt;/&amp;quot;),
        urlbase = &amp;quot;&amp;lt;base_url&amp;gt;/~&amp;lt;username&amp;gt;/&amp;quot;, urlfile = &amp;quot;./results/IGVurl.txt&amp;quot;)
}, step_name = &amp;quot;bam_IGV&amp;quot;, dependency = &amp;quot;hisat2_mapping&amp;quot;, run_step = &amp;quot;optional&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;read-quantification&#34;&gt;Read quantification&lt;/h2&gt;
&lt;p&gt;Reads overlapping with annotation ranges of interest are counted for
each sample using the &lt;code&gt;summarizeOverlaps&lt;/code&gt; function (Lawrence et al. 2013).
Most often the read counting is performed for exonic gene regions. This can be
done in a strand-specific or non-strand-specific manner, either accounting for or
ignoring overlaps among adjacent genes. Subsequently, the expression
count values can be normalized with different methods.&lt;/p&gt;
&lt;h3 id=&#34;gene-annotation-database&#34;&gt;Gene annotation database&lt;/h3&gt;
&lt;p&gt;For efficient handling of annotation ranges obtained from GFF or GTF files,
they are organized within a &lt;code&gt;TxDb&lt;/code&gt; object. Subsequently, the object is written
to a SQLite database file. It is important to note that this process only needs to
be performed once for a specific version of an annotation file.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(GenomicFeatures)
    txdb &amp;lt;- suppressWarnings(makeTxDbFromGFF(file = &amp;quot;data/tair10.gff&amp;quot;,
        format = &amp;quot;gff&amp;quot;, dataSource = &amp;quot;TAIR&amp;quot;, organism = &amp;quot;Arabidopsis thaliana&amp;quot;))
    saveDb(txdb, file = &amp;quot;./data/tair10.sqlite&amp;quot;)
}, step_name = &amp;quot;create_db&amp;quot;, dependency = &amp;quot;hisat2_mapping&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;read-counting-with-summarizeoverlaps&#34;&gt;Read counting with &lt;code&gt;summarizeOverlaps&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The provided example employs non-strand-specific read counting while
disregarding overlaps between different genes. As normalization the example uses
&lt;em&gt;reads per kilobase per million mapped reads&lt;/em&gt; (RPKM). The raw read count table
(&lt;code&gt;countDFeByg.xls&lt;/code&gt;) and the corresponding RPKM table (&lt;code&gt;rpkmDFeByg.xls&lt;/code&gt;) are written
to distinct files in the project’s results directory. Parallelization across
multiple CPU cores is achieved with the &lt;code&gt;BiocParallel&lt;/code&gt; package. When supplying a
&lt;code&gt;BamFileList&lt;/code&gt; as illustrated below, the &lt;code&gt;summarizeOverlaps&lt;/code&gt; method defaults to
employing &lt;code&gt;bplapply&lt;/code&gt; and the &lt;code&gt;register&lt;/code&gt; interface from &lt;code&gt;BiocParallel&lt;/code&gt;. The
&lt;code&gt;MulticoreParam&lt;/code&gt; will utilize the number of cores returned by
&lt;code&gt;parallel::detectCores&lt;/code&gt; if the number of workers is left unspecified. For
further information, refer to the help documentation by typing
&lt;code&gt;help(&amp;quot;summarizeOverlaps&amp;quot;)&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(GenomicFeatures)
    library(BiocParallel)
    txdb &amp;lt;- loadDb(&amp;quot;./data/tair10.sqlite&amp;quot;)
    outpaths &amp;lt;- getColumn(sal, step = &amp;quot;hisat2_mapping&amp;quot;, &amp;quot;outfiles&amp;quot;,
        column = &amp;quot;samtools_sort_bam&amp;quot;)
    eByg &amp;lt;- exonsBy(txdb, by = c(&amp;quot;gene&amp;quot;))
    bfl &amp;lt;- BamFileList(outpaths, yieldSize = 50000, index = character())
    multicoreParam &amp;lt;- MulticoreParam(workers = 4)
    register(multicoreParam)
    registered()
    counteByg &amp;lt;- bplapply(bfl, function(x) summarizeOverlaps(eByg,
        x, mode = &amp;quot;Union&amp;quot;, ignore.strand = TRUE, inter.feature = FALSE,
        singleEnd = FALSE, BPPARAM = multicoreParam))
    countDFeByg &amp;lt;- sapply(seq(along = counteByg), function(x) assays(counteByg[[x]])$counts)
    rownames(countDFeByg) &amp;lt;- names(rowRanges(counteByg[[1]]))
    colnames(countDFeByg) &amp;lt;- names(bfl)
    rpkmDFeByg &amp;lt;- apply(countDFeByg, 2, function(x) returnRPKM(counts = x,
        ranges = eByg))
    write.table(countDFeByg, &amp;quot;results/countDFeByg.xls&amp;quot;, col.names = NA,
        quote = FALSE, sep = &amp;quot;\t&amp;quot;)
    write.table(rpkmDFeByg, &amp;quot;results/rpkmDFeByg.xls&amp;quot;, col.names = NA,
        quote = FALSE, sep = &amp;quot;\t&amp;quot;)
    ## Creating a SummarizedExperiment object
    colData &amp;lt;- data.frame(row.names = SampleName(sal, &amp;quot;hisat2_mapping&amp;quot;),
        condition = getColumn(sal, &amp;quot;hisat2_mapping&amp;quot;, position = &amp;quot;targetsWF&amp;quot;,
            column = &amp;quot;Factor&amp;quot;))
    colData$condition &amp;lt;- factor(colData$condition)
    countDF_se &amp;lt;- SummarizedExperiment::SummarizedExperiment(assays = countDFeByg,
        colData = colData)
    ## Add results as SummarizedExperiment to the workflow
    ## object
    SE(sal, &amp;quot;read_counting&amp;quot;) &amp;lt;- countDF_se
}, step_name = &amp;quot;read_counting&amp;quot;, dependency = &amp;quot;create_db&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Importantly, when conducting statistical differential expression or abundance analysis using
methods like &lt;code&gt;edgeR&lt;/code&gt; or &lt;code&gt;DESeq2&lt;/code&gt;, the raw count values are the expected
input. RPKM values should be reserved for specialized applications, such as
manually inspecting expression levels across different genes or features.&lt;/p&gt;
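&lt;p&gt;For orientation, the RPKM normalization applied above via &lt;code&gt;returnRPKM&lt;/code&gt; follows the standard formula. The sketch below illustrates the formula only; the vectors &lt;code&gt;counts&lt;/code&gt; and &lt;code&gt;geneLengths&lt;/code&gt; and the library size are hypothetical values, not workflow outputs.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;## Illustration of the RPKM formula (hypothetical inputs)
counts &amp;lt;- c(geneA = 286, geneB = 104)        # raw read counts per gene
geneLengths &amp;lt;- c(geneA = 1500, geneB = 900)  # summed exon widths in bp
libSize &amp;lt;- 1e+07                             # total mapped reads in library
## RPKM = counts / (gene length in kb * mapped reads in millions)
rpkm &amp;lt;- counts/((geneLengths/1000) * (libSize/1e+06))
&lt;/code&gt;&lt;/pre&gt;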
&lt;p&gt;The following shows the first 10 rows of the &lt;code&gt;countDFeByg.xls&lt;/code&gt; table.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;read.delim(&amp;quot;results/countDFeByg.xls&amp;quot;, row.names = 1, check.names = FALSE)[1:10,
    ]
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##           M1A M1B  A1A A1B V1A V1B M6A M6B A6A A6B  V6A
## AT1G01010 286 260  364 181 568 300 255 135 514 318  757
## AT1G01020 104 136  139 131 174 156 148 131 114 104  206
## AT1G01030 120 109  167  59 136 192  74  26  23  73  118
## AT1G01040 911 727 1030 627 962 918 862 618 880 639 1632
## AT1G01046  23  12   17  13  16  26  19  14  23  21   24
## AT1G01050 189 178  247 184 226 380 524 619 382 414  622
## AT1G01060  98 262   86  88  33  32   8   4   6   3    2
## AT1G01070   0   1    5   0  10   5  11  13  29   8   28
## AT1G01073   0   0    0   0   0   0   0   0   0   0    0
## AT1G01080 377 390  363 454 476 630 437 747 266 350  352
##            V6B M12A M12B A12A A12B V12A V12B
## AT1G01010  551  198  248  527  417  650  671
## AT1G01020  212   67  156  130  120   80  158
## AT1G01030  214   45   51   31   48  177  442
## AT1G01040 1552  651 1095 1324  702  671  995
## AT1G01046   36   28   23   33   13   23   20
## AT1G01050  962  666 1355  737  532  635 1004
## AT1G01060   10  220  317  501  198  164  159
## AT1G01070   14   11   65   64   39   23   24
## AT1G01073    0    0    0    0    0    0    0
## AT1G01080  765  384 1037  343  299  267  373
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;sample-wise-clustering&#34;&gt;Sample-wise clustering&lt;/h3&gt;
&lt;p&gt;The sample-wise Spearman correlation coefficients are calculated from the &lt;code&gt;rlog&lt;/code&gt;
transformed expression values (&lt;code&gt;countDF_se&lt;/code&gt;) generated using the &lt;code&gt;DESeq2&lt;/code&gt; package.
These values are then converted into a distance matrix, which is subsequently
used for hierarchical clustering with the &lt;code&gt;hclust&lt;/code&gt; function. The resulting
dendrogram is then saved as a PNG file named &lt;code&gt;sample_tree.png&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(DESeq2, quietly = TRUE)
    library(ape, warn.conflicts = FALSE)
    ## Extracting SummarizedExperiment object
    se &amp;lt;- SE(sal, &amp;quot;read_counting&amp;quot;)
    dds &amp;lt;- DESeqDataSet(se, design = ~condition)
    d &amp;lt;- cor(assay(rlog(dds)), method = &amp;quot;spearman&amp;quot;)
    hc &amp;lt;- hclust(dist(1 - d))
    png(&amp;quot;results/sample_tree.png&amp;quot;)
    plot.phylo(as.phylo(hc), type = &amp;quot;p&amp;quot;, edge.col = &amp;quot;blue&amp;quot;, edge.width = 2,
        show.node.label = TRUE, no.margin = TRUE)
    dev.off()
}, step_name = &amp;quot;sample_tree&amp;quot;, dependency = &amp;quot;read_counting&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;results/sample_tree.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;div data-align=&#34;center&#34;&gt;
&lt;p&gt;Figure 2: Correlation dendrogram of samples&lt;/p&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;h2 id=&#34;analysis-of-degs&#34;&gt;Analysis of DEGs&lt;/h2&gt;
&lt;p&gt;The analysis of differentially expressed genes (DEGs) is performed with
the &lt;code&gt;glm&lt;/code&gt; method of the &lt;code&gt;edgeR&lt;/code&gt; package (Robinson, McCarthy, and Smyth 2010). The sample
comparisons used by this analysis are defined in the header lines of the
&lt;code&gt;targets.txt&lt;/code&gt; file starting with &lt;code&gt;&amp;lt;CMP&amp;gt;&lt;/code&gt;.&lt;/p&gt;
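&lt;p&gt;For orientation, such a header line in the &lt;code&gt;targets.txt&lt;/code&gt; file typically takes the following form, where each hyphen-separated pair defines one sample comparison; the comparisons shown here are illustrative:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# &amp;lt;CMP&amp;gt; CMPset1: M1-A1, M1-V1, A1-V1, M6-A6, M6-V6, A6-V6
&lt;/code&gt;&lt;/pre&gt;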
&lt;h3 id=&#34;run-edger&#34;&gt;Run &lt;code&gt;edgeR&lt;/code&gt;&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(edgeR)
    countDF &amp;lt;- read.delim(&amp;quot;results/countDFeByg.xls&amp;quot;, row.names = 1,
        check.names = FALSE)
    cmp &amp;lt;- readComp(stepsWF(sal)[[&amp;quot;hisat2_mapping&amp;quot;]], format = &amp;quot;matrix&amp;quot;,
        delim = &amp;quot;-&amp;quot;)
    edgeDF &amp;lt;- run_edgeR(countDF = countDF, targets = targetsWF(sal)[[&amp;quot;hisat2_mapping&amp;quot;]],
        cmp = cmp[[1]], independent = FALSE, mdsplot = &amp;quot;&amp;quot;)
    write.table(edgeDF, &amp;quot;./results/edgeRglm_allcomp.xls&amp;quot;, quote = FALSE,
        sep = &amp;quot;\t&amp;quot;, col.names = NA)
}, step_name = &amp;quot;run_edger&amp;quot;, dependency = &amp;quot;read_counting&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note, to call DEGs with &lt;code&gt;DESeq2&lt;/code&gt; instead of &lt;code&gt;edgeR&lt;/code&gt;, users can simply replace
‘&lt;code&gt;run_edgeR&lt;/code&gt;’ with ‘&lt;code&gt;run_DESeq2&lt;/code&gt;’ in the above code.&lt;/p&gt;
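&lt;p&gt;For example, the corresponding &lt;code&gt;run_DESeq2&lt;/code&gt; call mirrors the &lt;code&gt;run_edgeR&lt;/code&gt; step. The sketch assumes the same &lt;code&gt;countDF&lt;/code&gt;, &lt;code&gt;sal&lt;/code&gt; and &lt;code&gt;cmp&lt;/code&gt; objects as in that step; the output file name is a placeholder.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;## DESeq2 variant of the DEG step (same inputs as the edgeR step above)
degDF &amp;lt;- run_DESeq2(countDF = countDF, targets = targetsWF(sal)[[&amp;quot;hisat2_mapping&amp;quot;]],
    cmp = cmp[[1]], independent = FALSE)
write.table(degDF, &amp;quot;./results/DESeq2_allcomp.xls&amp;quot;, quote = FALSE,
    sep = &amp;quot;\t&amp;quot;, col.names = NA)
&lt;/code&gt;&lt;/pre&gt;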
&lt;h3 id=&#34;add-gene-descriptions&#34;&gt;Add gene descriptions&lt;/h3&gt;
&lt;p&gt;This step is optional. It appends functional descriptions obtained from BioMart to the DEG table.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(&amp;quot;biomaRt&amp;quot;)
    m &amp;lt;- useMart(&amp;quot;plants_mart&amp;quot;, dataset = &amp;quot;athaliana_eg_gene&amp;quot;,
        host = &amp;quot;https://plants.ensembl.org&amp;quot;)
    desc &amp;lt;- getBM(attributes = c(&amp;quot;tair_locus&amp;quot;, &amp;quot;description&amp;quot;),
        mart = m)
    desc &amp;lt;- desc[!duplicated(desc[, 1]), ]
    descv &amp;lt;- as.character(desc[, 2])
    names(descv) &amp;lt;- as.character(desc[, 1])
    edgeDF &amp;lt;- data.frame(edgeDF, Desc = descv[rownames(edgeDF)],
        check.names = FALSE)
    write.table(edgeDF, &amp;quot;./results/edgeRglm_allcomp.xls&amp;quot;, quote = FALSE,
        sep = &amp;quot;\t&amp;quot;, col.names = NA)
}, step_name = &amp;quot;custom_annot&amp;quot;, dependency = &amp;quot;run_edger&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;plot-deg-results&#34;&gt;Plot DEG results&lt;/h3&gt;
&lt;p&gt;Filter and plot DEG results for up and down regulated genes. The
definition of &lt;em&gt;up&lt;/em&gt; and &lt;em&gt;down&lt;/em&gt; is given in the corresponding help
file. To open it, type &lt;code&gt;?filterDEGs&lt;/code&gt; in the R console.&lt;/p&gt;
&lt;p&gt;Note, due to the small number of genes in the toy dataset, the FDR cutoff in
this example is set to an unreasonably large value. With real data sets this cutoff
should be set to a much smaller value (often 1%, 5% or 10%).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    edgeDF &amp;lt;- read.delim(&amp;quot;results/edgeRglm_allcomp.xls&amp;quot;, row.names = 1,
        check.names = FALSE)
    png(&amp;quot;results/DEGcounts.png&amp;quot;)
    DEG_list &amp;lt;- filterDEGs(degDF = edgeDF, filter = c(Fold = 2,
        FDR = 20))
    dev.off()
    write.table(DEG_list$Summary, &amp;quot;./results/DEGcounts.xls&amp;quot;,
        quote = FALSE, sep = &amp;quot;\t&amp;quot;, row.names = FALSE)
}, step_name = &amp;quot;filter_degs&amp;quot;, dependency = &amp;quot;custom_annot&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;./results/DEGcounts.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;div data-align=&#34;center&#34;&gt;
&lt;p&gt;Figure 3: Up and down regulated DEGs.&lt;/p&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;h3 id=&#34;venn-diagrams-of-deg-sets&#34;&gt;Venn diagrams of DEG sets&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;overLapper&lt;/code&gt; function can compute Venn intersects for large numbers of sample
sets (up to 20 or more) and plots 2-5 way Venn diagrams. A useful
feature is the possibility to combine the counts from several Venn
comparisons with the same number of sample sets in a single Venn diagram
(here for 4 up and down DEG sets).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    vennsetup &amp;lt;- overLapper(DEG_list$Up[6:9], type = &amp;quot;vennsets&amp;quot;)
    vennsetdown &amp;lt;- overLapper(DEG_list$Down[6:9], type = &amp;quot;vennsets&amp;quot;)
    png(&amp;quot;results/vennplot.png&amp;quot;)
    vennPlot(list(vennsetup, vennsetdown), mymain = &amp;quot;&amp;quot;, mysub = &amp;quot;&amp;quot;,
        colmode = 2, ccol = c(&amp;quot;blue&amp;quot;, &amp;quot;red&amp;quot;))
    dev.off()
}, step_name = &amp;quot;venn_diagram&amp;quot;, dependency = &amp;quot;filter_degs&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;./results/vennplot.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;div data-align=&#34;center&#34;&gt;
&lt;p&gt;Figure 4: Venn Diagram for 4 Up and Down DEG Sets&lt;/p&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;h2 id=&#34;go-term-enrichment-analysis&#34;&gt;GO term enrichment analysis&lt;/h2&gt;
&lt;h3 id=&#34;obtain-gene-to-go-mappings&#34;&gt;Obtain gene-to-GO mappings&lt;/h3&gt;
&lt;p&gt;The following shows how to obtain gene-to-GO mappings from &lt;code&gt;biomaRt&lt;/code&gt; (here for &lt;em&gt;A.
thaliana&lt;/em&gt;) and how to organize them for the downstream GO term
enrichment analysis. Alternatively, the gene-to-GO mappings can be
obtained for many organisms from Bioconductor’s &lt;code&gt;*.db&lt;/code&gt; genome annotation
packages or GO annotation files provided by various genome databases.
For each annotation this relatively slow preprocessing step needs to be
performed only once. Subsequently, the preprocessed data can be loaded
with the &lt;code&gt;load&lt;/code&gt; function as shown in the next subsection.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(&amp;quot;biomaRt&amp;quot;)
    # listMarts() # To choose BioMart database
    # listMarts(host=&#39;plants.ensembl.org&#39;)
    m &amp;lt;- useMart(&amp;quot;plants_mart&amp;quot;, host = &amp;quot;https://plants.ensembl.org&amp;quot;)
    # listDatasets(m)
    m &amp;lt;- useMart(&amp;quot;plants_mart&amp;quot;, dataset = &amp;quot;athaliana_eg_gene&amp;quot;,
        host = &amp;quot;https://plants.ensembl.org&amp;quot;)
    # listAttributes(m) # Choose data types you want to
    # download
    go &amp;lt;- getBM(attributes = c(&amp;quot;go_id&amp;quot;, &amp;quot;tair_locus&amp;quot;, &amp;quot;namespace_1003&amp;quot;),
        mart = m)
    go &amp;lt;- go[go[, 3] != &amp;quot;&amp;quot;, ]
    go[, 3] &amp;lt;- as.character(go[, 3])
    go[go[, 3] == &amp;quot;molecular_function&amp;quot;, 3] &amp;lt;- &amp;quot;F&amp;quot;
    go[go[, 3] == &amp;quot;biological_process&amp;quot;, 3] &amp;lt;- &amp;quot;P&amp;quot;
    go[go[, 3] == &amp;quot;cellular_component&amp;quot;, 3] &amp;lt;- &amp;quot;C&amp;quot;
    go[1:4, ]
    if (!dir.exists(&amp;quot;./data/GO&amp;quot;))
        dir.create(&amp;quot;./data/GO&amp;quot;)
    write.table(go, &amp;quot;data/GO/GOannotationsBiomart_mod.txt&amp;quot;, quote = FALSE,
        row.names = FALSE, col.names = FALSE, sep = &amp;quot;\t&amp;quot;)
    catdb &amp;lt;- makeCATdb(myfile = &amp;quot;data/GO/GOannotationsBiomart_mod.txt&amp;quot;,
        lib = NULL, org = &amp;quot;&amp;quot;, colno = c(1, 2, 3), idconv = NULL)
    save(catdb, file = &amp;quot;data/GO/catdb.RData&amp;quot;)
}, step_name = &amp;quot;get_go_annot&amp;quot;, dependency = &amp;quot;filter_degs&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;batch-go-term-enrichment-analysis&#34;&gt;Batch GO term enrichment analysis&lt;/h3&gt;
&lt;p&gt;Apply the enrichment analysis to the DEG sets obtained in the above differential
expression analysis. Note, in the following example the &lt;code&gt;FDR&lt;/code&gt; filter is set
to an unreasonably high value, simply because of the small size of the toy
data set used in this vignette. Batch enrichment analysis of many gene sets is
performed with the &lt;code&gt;GOCluster_Report&lt;/code&gt; function. When &lt;code&gt;method=&amp;quot;all&amp;quot;&lt;/code&gt;, it returns all GO terms passing
the p-value cutoff specified under the &lt;code&gt;cutoff&lt;/code&gt; argument. When &lt;code&gt;method=&amp;quot;slim&amp;quot;&lt;/code&gt;,
it returns only the GO terms specified under the &lt;code&gt;myslimv&lt;/code&gt; argument. The given
example shows how a GO slim vector for a specific organism can be obtained from
&lt;code&gt;BioMart&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(&amp;quot;biomaRt&amp;quot;)
    load(&amp;quot;data/GO/catdb.RData&amp;quot;)
    DEG_list &amp;lt;- filterDEGs(degDF = edgeDF, filter = c(Fold = 2,
        FDR = 50), plot = FALSE)
    up_down &amp;lt;- DEG_list$UporDown
    names(up_down) &amp;lt;- paste(names(up_down), &amp;quot;_up_down&amp;quot;, sep = &amp;quot;&amp;quot;)
    up &amp;lt;- DEG_list$Up
    names(up) &amp;lt;- paste(names(up), &amp;quot;_up&amp;quot;, sep = &amp;quot;&amp;quot;)
    down &amp;lt;- DEG_list$Down
    names(down) &amp;lt;- paste(names(down), &amp;quot;_down&amp;quot;, sep = &amp;quot;&amp;quot;)
    DEGlist &amp;lt;- c(up_down, up, down)
    DEGlist &amp;lt;- DEGlist[sapply(DEGlist, length) &amp;gt; 0]
    BatchResult &amp;lt;- GOCluster_Report(catdb = catdb, setlist = DEGlist,
        method = &amp;quot;all&amp;quot;, id_type = &amp;quot;gene&amp;quot;, CLSZ = 2, cutoff = 0.9,
        gocats = c(&amp;quot;MF&amp;quot;, &amp;quot;BP&amp;quot;, &amp;quot;CC&amp;quot;), recordSpecGO = NULL)
    m &amp;lt;- useMart(&amp;quot;plants_mart&amp;quot;, dataset = &amp;quot;athaliana_eg_gene&amp;quot;,
        host = &amp;quot;https://plants.ensembl.org&amp;quot;)
    goslimvec &amp;lt;- as.character(getBM(attributes = c(&amp;quot;goslim_goa_accession&amp;quot;),
        mart = m)[, 1])
    BatchResultslim &amp;lt;- GOCluster_Report(catdb = catdb, setlist = DEGlist,
        method = &amp;quot;slim&amp;quot;, id_type = &amp;quot;gene&amp;quot;, myslimv = goslimvec,
        CLSZ = 10, cutoff = 0.01, gocats = c(&amp;quot;MF&amp;quot;, &amp;quot;BP&amp;quot;, &amp;quot;CC&amp;quot;),
        recordSpecGO = NULL)
    write.table(BatchResultslim, &amp;quot;results/GOBatchSlim.xls&amp;quot;, row.names = FALSE,
        quote = FALSE, sep = &amp;quot;\t&amp;quot;)
}, step_name = &amp;quot;go_enrich&amp;quot;, dependency = &amp;quot;get_go_annot&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;plot-batch-go-term-results&#34;&gt;Plot batch GO term results&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;data.frame&lt;/code&gt; generated by &lt;code&gt;GOCluster_Report&lt;/code&gt; can be plotted with the &lt;code&gt;goBarplot&lt;/code&gt; function. Because of the
variable size of the sample sets, it may not always be desirable to show
the results from different DEG sets in the same bar plot. Plotting
single sample sets is achieved by subsetting the input data frame as
shown in the first line of the following example.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    ## First line: subset to a single sample set; the second assignment
    ## overrides this subset to plot all sets
    gos &amp;lt;- BatchResultslim[grep(&amp;quot;M6-V6_up_down&amp;quot;, BatchResultslim$CLID),
        ]
    gos &amp;lt;- BatchResultslim
    png(&amp;quot;results/GOslimbarplotMF.png&amp;quot;, height = 8, width = 10)
    goBarplot(gos, gocat = &amp;quot;MF&amp;quot;)
    goBarplot(gos, gocat = &amp;quot;BP&amp;quot;)
    goBarplot(gos, gocat = &amp;quot;CC&amp;quot;)
    dev.off()
}, step_name = &amp;quot;go_plot&amp;quot;, dependency = &amp;quot;go_enrich&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;results/GOslimbarplotMF.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;div data-align=&#34;center&#34;&gt;
&lt;p&gt;Figure 5: GO Slim Barplot for MF Ontology&lt;/p&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;h2 id=&#34;clustering-and-heat-maps&#34;&gt;Clustering and heat maps&lt;/h2&gt;
&lt;p&gt;The following example performs hierarchical clustering on the &lt;code&gt;rlog&lt;/code&gt;
transformed expression matrix subsetted by the DEGs identified in the above
differential expression analysis. It uses a Pearson correlation-based distance
measure and complete linkage for cluster joining.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    library(pheatmap)
    geneids &amp;lt;- unique(as.character(unlist(DEG_list[[1]])))
    y &amp;lt;- assay(rlog(dds))[geneids, ]
    png(&amp;quot;results/heatmap1.png&amp;quot;)
    pheatmap(y, scale = &amp;quot;row&amp;quot;, clustering_distance_rows = &amp;quot;correlation&amp;quot;,
        clustering_distance_cols = &amp;quot;correlation&amp;quot;)
    dev.off()
}, step_name = &amp;quot;heatmap&amp;quot;, dependency = &amp;quot;go_enrich&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;results/heatmap1.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;div data-align=&#34;center&#34;&gt;
&lt;p&gt;Figure 6: Heat Map with Hierarchical Clustering Dendrograms of DEGs&lt;/p&gt;
&lt;/div&gt;
&lt;br/&gt;
&lt;h2 id=&#34;workflow-session-information&#34;&gt;Workflow session information&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;appendStep(sal) &amp;lt;- LineWise(code = {
    sessionInfo()
}, step_name = &amp;quot;sessionInfo&amp;quot;, dependency = &amp;quot;heatmap&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;additional-details&#34;&gt;Additional details&lt;/h1&gt;
&lt;h2 id=&#34;running-workflows&#34;&gt;Running workflows&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;runWF&lt;/code&gt; function is the primary tool for executing workflows. It runs the
code of the workflow steps after loading them into a &lt;code&gt;SAL&lt;/code&gt; workflow container.
The workflow steps can be loaded interactively one by one or in batch mode with
the &lt;code&gt;importWF&lt;/code&gt; function. The batch mode is more convenient and is the intended
method for loading workflows. It is part of the standard routine for running
workflows introduced in the &lt;a href=&#34;#quick-start&#34;&gt;Quick start&lt;/a&gt; section.&lt;/p&gt;
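&lt;p&gt;In batch mode, loading and running a workflow typically reduces to a few calls; the sketch below assumes the workflow is defined in this template’s Rmd file.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(systemPipeR)
sal &amp;lt;- SPRproject()  # initialize the SAL workflow container
sal &amp;lt;- importWF(sal, file_path = &amp;quot;systemPipeRNAseq.Rmd&amp;quot;)  # load all steps in batch mode
sal &amp;lt;- runWF(sal)  # execute the loaded workflow steps
&lt;/code&gt;&lt;/pre&gt;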
&lt;h3 id=&#34;parallelization-on-clusters&#34;&gt;Parallelization on clusters&lt;/h3&gt;
&lt;p&gt;The processing time of computationally expensive steps can be greatly accelerated by
processing many input files in parallel using several CPUs and/or computer nodes
of an HPC or cloud system, where a scheduling system is used for load balancing.
To simplify the configuration and execution of workflow steps in serial or parallel mode,
&lt;code&gt;systemPipeR&lt;/code&gt; uses the same &lt;code&gt;runWF&lt;/code&gt; function for both. Parallelization simply
requires appending the parallelization parameters to the settings of the corresponding workflow
steps, each requesting the computing resources specified by the user, such as
the number of CPU cores, RAM and run time. These resource settings are
stored in the corresponding workflow step of the &lt;code&gt;SAL&lt;/code&gt; workflow container.
After adding the parallelization parameters, &lt;code&gt;runWF&lt;/code&gt; will execute the chosen steps
in parallel mode as instructed.
&lt;p&gt;The following example applies to an alignment step of an RNA-Seq workflow.
In the chosen alignment example, the parallelization
parameters are added to the alignment step (here &lt;code&gt;hisat2_mapping&lt;/code&gt;) of &lt;code&gt;SAL&lt;/code&gt; via
a &lt;code&gt;resources&lt;/code&gt; list. The given parameter settings will run 18 processes (&lt;code&gt;Njobs&lt;/code&gt;) in
parallel, using 4 CPU cores (&lt;code&gt;ncpus&lt;/code&gt;) for each, thus utilizing a total of 72 CPU
cores. The &lt;code&gt;runWF&lt;/code&gt; function can be used with most queueing systems as it is based on
utilities defined by the &lt;code&gt;batchtools&lt;/code&gt; package, which supports the use of
template files (&lt;em&gt;&lt;code&gt;*.tmpl&lt;/code&gt;&lt;/em&gt;) for defining the run parameters of different
schedulers. In the given example below, a &lt;code&gt;conffile&lt;/code&gt; (see
&lt;em&gt;&lt;code&gt;.batchtools.conf.R&lt;/code&gt;&lt;/em&gt; samples &lt;a href=&#34;https://mllg.github.io/batchtools/&#34;&gt;here&lt;/a&gt;) and
a &lt;code&gt;template&lt;/code&gt; file (see &lt;em&gt;&lt;code&gt;*.tmpl&lt;/code&gt;&lt;/em&gt; samples
&lt;a href=&#34;https://github.com/mllg/batchtools/tree/master/inst/templates&#34;&gt;here&lt;/a&gt;) need to be present
on the highest level of a user’s workflow project. The following example uses the sample
&lt;code&gt;conffile&lt;/code&gt; and &lt;code&gt;template&lt;/code&gt; files for the Slurm scheduler that are both provided by this
package.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;resources&lt;/code&gt; list can be added to analysis steps when a workflow is loaded into &lt;code&gt;SAL&lt;/code&gt;.
Alternatively, one can add the resource settings with the &lt;code&gt;addResources&lt;/code&gt; function
to any step of a pre-populated &lt;code&gt;SAL&lt;/code&gt; container afterwards. For workflow steps with the same resource
requirements, one can add them to several steps at once with a single call to &lt;code&gt;addResources&lt;/code&gt; by
specifying multiple step names under the &lt;code&gt;step&lt;/code&gt; argument.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;resources &amp;lt;- list(conffile=&amp;quot;.batchtools.conf.R&amp;quot;,
                  template=&amp;quot;batchtools.slurm.tmpl&amp;quot;, 
                  Njobs=18, 
                  walltime=120, ## in minutes
                  ntasks=1,
                  ncpus=4, 
                  memory=1024, ## in Mb
                  partition = &amp;quot;short&amp;quot;  
                  )
sal &amp;lt;- addResources(sal, step=c(&amp;quot;hisat2_mapping&amp;quot;), resources = resources)
sal &amp;lt;- runWF(sal)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The above example will submit via &lt;code&gt;runWF(sal)&lt;/code&gt; the &lt;em&gt;hisat2_mapping&lt;/em&gt; step
to a partition (queue) called &lt;code&gt;short&lt;/code&gt; on an HPC cluster. Users need to adjust this and
other parameters defined in the &lt;code&gt;resources&lt;/code&gt; list to their cluster environment.&lt;/p&gt;
&lt;h2 id=&#34;cl-tools-used&#34;&gt;CL tools used&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;listCmdTools&lt;/code&gt; (and &lt;code&gt;listCmdModules&lt;/code&gt;) functions return the CL tools that
are used by a workflow. To include a CL tool list in a workflow report,
one can use the following code. Additional details on this topic
can be found in the main vignette &lt;a href=&#34;https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#111_Accessor_methods&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;if (file.exists(file.path(&amp;quot;.SPRproject&amp;quot;, &amp;quot;SYSargsList.yml&amp;quot;))) {
    local({
        sal &amp;lt;- systemPipeR::SPRproject(resume = TRUE)
        systemPipeR::listCmdTools(sal)
        systemPipeR::listCmdModules(sal)
    })
} else {
    cat(crayon::blue$bold(&amp;quot;Tools and modules required by this workflow are:\n&amp;quot;))
    cat(c(&amp;quot;gzip&amp;quot;, &amp;quot;gunzip&amp;quot;), sep = &amp;quot;\n&amp;quot;)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Tools and modules required by this workflow are:
## gzip
## gunzip
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;session-info&#34;&gt;Session info&lt;/h2&gt;
&lt;p&gt;This is the session information that will be included when rendering this report.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;sessionInfo()
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 11 (bullseye)
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/Los_Angeles
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils    
## [6] datasets  methods   base     
## 
## other attached packages:
##  [1] systemPipeR_2.6.0           ShortRead_1.58.0           
##  [3] GenomicAlignments_1.36.0    SummarizedExperiment_1.30.0
##  [5] Biobase_2.60.0              MatrixGenerics_1.12.0      
##  [7] matrixStats_0.63.0          BiocParallel_1.34.0        
##  [9] Rsamtools_2.16.0            Biostrings_2.68.0          
## [11] XVector_0.40.0              GenomicRanges_1.52.0       
## [13] GenomeInfoDb_1.36.0         IRanges_2.34.0             
## [15] S4Vectors_0.38.0            BiocGenerics_0.46.0        
## [17] BiocStyle_2.28.0           
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.3            xfun_0.39              
##  [3] bslib_0.4.2             hwriter_1.3.2.1        
##  [5] ggplot2_3.4.2           htmlwidgets_1.6.2      
##  [7] latticeExtra_0.6-30     lattice_0.21-8         
##  [9] generics_0.1.3          vctrs_0.6.2            
## [11] tools_4.3.0             bitops_1.0-7           
## [13] parallel_4.3.0          tibble_3.2.1           
## [15] fansi_1.0.4             pkgconfig_2.0.3        
## [17] Matrix_1.5-4            RColorBrewer_1.1-3     
## [19] lifecycle_1.0.3         GenomeInfoDbData_1.2.10
## [21] stringr_1.5.0           compiler_4.3.0         
## [23] deldir_1.0-6            munsell_0.5.0          
## [25] codetools_0.2-19        htmltools_0.5.5        
## [27] sass_0.4.5              RCurl_1.98-1.12        
## [29] yaml_2.3.7              pillar_1.9.0           
## [31] crayon_1.5.2            jquerylib_0.1.4        
## [33] DelayedArray_0.25.0     cachem_1.0.7           
## [35] tidyselect_1.2.0        digest_0.6.31          
## [37] stringi_1.7.12          dplyr_1.1.2            
## [39] bookdown_0.33           fastmap_1.1.1          
## [41] grid_4.3.0              colorspace_2.1-0       
## [43] cli_3.6.1               magrittr_2.0.3         
## [45] utf8_1.2.3              scales_1.2.1           
## [47] rmarkdown_2.21          jpeg_0.1-10            
## [49] interp_1.1-4            blogdown_1.16          
## [51] png_0.1-8               evaluate_0.20          
## [53] knitr_1.42              rlang_1.1.1            
## [55] Rcpp_1.0.10             glue_1.6.2             
## [57] formatR_1.14            BiocManager_1.30.20    
## [59] jsonlite_1.8.4          R6_2.5.1               
## [61] zlibbioc_1.46.0
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;funding&#34;&gt;Funding&lt;/h1&gt;
&lt;p&gt;This project is funded by awards from the National Science Foundation (&lt;a href=&#34;https://www.nsf.gov/awardsearch/showAward?AWD_ID=1661152&#34;&gt;ABI-1661152&lt;/a&gt;)
and the National Institute on Aging of the National Institutes of Health (&lt;a href=&#34;https://reporter.nih.gov/project-details/9632486&#34;&gt;U19AG023122&lt;/a&gt;).&lt;/p&gt;
&lt;h1 id=&#34;references&#34;&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references hanging-indent&#34;&gt;
&lt;div id=&#34;ref-Bolger2014-yr&#34;&gt;
&lt;p&gt;Bolger, Anthony M, Marc Lohse, and Bjoern Usadel. 2014. “Trimmomatic: A Flexible Trimmer for Illumina Sequence Data.” &lt;em&gt;Bioinformatics&lt;/em&gt; 30 (15): 2114–20.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-H_Backman2016-bt&#34;&gt;
&lt;p&gt;H Backman, Tyler W, and Thomas Girke. 2016. “systemPipeR: NGS workflow and report generation environment.” &lt;em&gt;BMC Bioinformatics&lt;/em&gt; 17 (1): 388. &lt;a href=&#34;https://doi.org/10.1186/s12859-016-1241-0&#34;&gt;https://doi.org/10.1186/s12859-016-1241-0&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Howard2013-fq&#34;&gt;
&lt;p&gt;Howard, Brian E, Qiwen Hu, Ahmet Can Babaoglu, Manan Chandra, Monica Borghi, Xiaoping Tan, Luyan He, et al. 2013. “High-Throughput RNA Sequencing of Pseudomonas-Infected Arabidopsis Reveals Hidden Transcriptome Complexity and Novel Splice Variants.” &lt;em&gt;PLoS One&lt;/em&gt; 8 (10): e74183. &lt;a href=&#34;https://doi.org/10.1371/journal.pone.0074183&#34;&gt;https://doi.org/10.1371/journal.pone.0074183&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Kim2015-ve&#34;&gt;
&lt;p&gt;Kim, Daehwan, Ben Langmead, and Steven L Salzberg. 2015. “HISAT: A Fast Spliced Aligner with Low Memory Requirements.” &lt;em&gt;Nat. Methods&lt;/em&gt; 12 (4): 357–60.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Lawrence2013-kt&#34;&gt;
&lt;p&gt;Lawrence, Michael, Wolfgang Huber, Hervé Pagès, Patrick Aboyoun, Marc Carlson, Robert Gentleman, Martin T Morgan, and Vincent J Carey. 2013. “Software for Computing and Annotating Genomic Ranges.” &lt;em&gt;PLoS Comput. Biol.&lt;/em&gt; 9 (8): e1003118. &lt;a href=&#34;https://doi.org/10.1371/journal.pcbi.1003118&#34;&gt;https://doi.org/10.1371/journal.pcbi.1003118&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Robinson2010-uk&#34;&gt;
&lt;p&gt;Robinson, M D, D J McCarthy, and G K Smyth. 2010. “edgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” &lt;em&gt;Bioinformatics&lt;/em&gt; 26 (1): 139–40. &lt;a href=&#34;https://doi.org/10.1093/bioinformatics/btp616&#34;&gt;https://doi.org/10.1093/bioinformatics/btp616&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

      </description>
    </item>
    
  </channel>
</rss>
