Bulk RNA-seq

rnaseq is a collection of workflows for the end-to-end processing of bulk transcriptomics data. It includes workflows for genome alignment, quantification, and quality control.

Overview of Functionality

The end-to-end rnaseq workflow consists of six sub-workflows, each of which can also be run independently.

  • Prepare genome: Preparation of all the reference data required for downstream analysis, i.e., decompressing the provided reference data or generating the required index files (for STAR, Salmon, Kallisto, RSEM, BBSplit).
  • Pre-processing: Quality control of the input reads: running FastQC, extracting UMIs, trimming adapters, and removing ribosomal RNA reads. Adapters can be trimmed using either Trim Galore! or fastp (work in progress).
  • Genome alignment and quantification: Genome alignment using STAR and transcript quantification using Salmon or RSEM (using RSEM’s built-in support for STAR) (work in progress). Alignment sorting and indexing, as well as computation of statistics from the BAM files, are performed using Samtools. UMI-based deduplication is also performed.
  • Post-processing: Marking of duplicate reads (picard MarkDuplicates), transcript assembly and quantification (StringTie), and creation of bigWig coverage files.
  • Pseudo alignment and quantification: Pseudo alignment and transcript quantification using Salmon or Kallisto.
  • Final QC: A quality control workflow that runs RSeQC, dupRadar, Qualimap, Preseq, DESeq2, and featureCounts. The QC results for raw reads, alignments, gene biotype, sample similarity, and strand specificity are presented with MultiQC.

Workflows == modules

Viash workflows are built on a modular architecture where components and workflows are fully equivalent. Each component can be executed as a stand-alone workflow, while any workflow can be seamlessly integrated as a dependency of another workflow. This design enables flexible customization and recombination to address diverse analytical needs.
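
The equivalence between components and workflows can be illustrated with a small, purely conceptual sketch. It is written in Python and does not use the Viash or Nextflow APIs; the class names `Component` and `Workflow`, the `record` helper, and the step names are all hypothetical and exist only for this illustration. The idea shown is that a workflow exposes the same interface as a single component, so it can be run on its own or nested as a step inside a larger workflow.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, List[str]]  # hypothetical: data handed from step to step


class Component:
    """A single processing step (illustrative stand-in, not a Viash class)."""

    def __init__(self, name: str, func: Callable[[State], State]) -> None:
        self.name = name
        self._func = func

    def run(self, state: State) -> State:
        # Work on a shallow copy so the caller's state is not modified.
        return self._func(dict(state))


class Workflow(Component):
    """A workflow is itself a component: it runs its steps in order."""

    def __init__(self, name: str, steps: Iterable[Component]) -> None:
        self.steps = list(steps)
        super().__init__(name, self._run_steps)

    def _run_steps(self, state: State) -> State:
        for step in self.steps:
            state = step.run(state)
        return state


def record(label: str) -> Callable[[State], State]:
    """Build a dummy step that only records that it ran."""

    def func(state: State) -> State:
        state["done"] = state.get("done", []) + [label]
        return state

    return func


# Two leaf components combined into a sub-workflow (names are illustrative).
fastqc = Component("fastqc", record("fastqc"))
trim = Component("trim_adapters", record("trim_adapters"))
pre_processing = Workflow("pre_processing", [fastqc, trim])

# The sub-workflow is used as one step of a larger workflow.
align = Component("genome_alignment", record("genome_alignment"))
rnaseq = Workflow("rnaseq", [pre_processing, align])

standalone = pre_processing.run({})  # sub-workflow executed on its own
full = rnaseq.run({})                # same sub-workflow used as a dependency
print(standalone["done"])
print(full["done"])
```

Because `Workflow` inherits from `Component`, nothing in the enclosing workflow needs to know whether a given step is a single tool or a whole sub-workflow, which is the property the paragraph above describes.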