htrnaseq: When Scale Matters

Modern sequencers can produce billions of reads in a single run, often across multiple lanes and barcodes.
For core facilities and pipeline developers, manually splitting and processing this data becomes a bottleneck.
The High-Throughput RNA-seq workflow extends the Bulk RNA-seq pipeline to support demultiplexing and parallel processing of large sequencing runs.

Key Features

  • Demultiplexing -- Convert BCL files into FASTQ using Illumina's bcl-convert and split reads by sample or index.
  • UMI and barcode handling -- Extract Unique Molecular Identifiers and handle dual-index designs typical of NovaSeq or NextSeq instruments.
  • Parallel execution -- Launch per-sample workflows in parallel across lanes or compute nodes, with built-in scheduling support.
  • Lane-specific metadata -- Track sample metadata and lane origin throughout processing to assist with downstream QC.
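The parallel-execution and lane-tracking features above can be pictured with a small sketch. This is a minimal Python illustration under stated assumptions, not the workflow's actual implementation (which runs as Viash components): `FastqUnit`, `process_sample`, and `run_parallel` are hypothetical names, a local thread pool stands in for the real scheduler, and the file names only follow the usual Illumina FASTQ naming pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class FastqUnit:
    """Hypothetical record: one demultiplexed FASTQ pair tagged with its sample and lane."""
    sample_id: str
    lane: int
    r1: str
    r2: str


def process_sample(unit: FastqUnit) -> dict:
    """Hypothetical stand-in for a per-sample sub-workflow (pre-processing,
    alignment, quantification). Here it only echoes the metadata it received."""
    return {"sample_id": unit.sample_id, "lane": unit.lane, "inputs": [unit.r1, unit.r2]}


def run_parallel(units: list, max_workers: int = 4) -> list:
    """Run process_sample on every unit concurrently.
    pool.map returns results in input order, so each result stays
    paired with the sample and lane it came from."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_sample, units))


# Usage: the same sample sequenced on two lanes is processed as two units,
# and the lane of origin is carried through to the result.
units = [
    FastqUnit("SampleA", 1, "SampleA_S1_L001_R1_001.fastq.gz", "SampleA_S1_L001_R2_001.fastq.gz"),
    FastqUnit("SampleA", 2, "SampleA_S1_L002_R1_001.fastq.gz", "SampleA_S1_L002_R2_001.fastq.gz"),
]
results = run_parallel(units)
```

Keeping the lane inside each unit, rather than in a side table, is what lets downstream QC group results by lane without re-deriving it from file names.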

After demultiplexing, the resulting FASTQ files can be passed through the same sub-workflows used in the Bulk RNA-seq pipeline (pre-processing, alignment, quantification, etc.).
Shared components ensure consistency between low-throughput and high-throughput processing.

Compliance & Scalability

By leveraging Viash components, large-scale RNA-seq processing remains reproducible and auditable.
Each demultiplexing and processing step runs in a versioned container with a recorded SBOM.
You can run the pipeline on HPC clusters or cloud platforms without modifying the workflow, ensuring that scaling up does not compromise governance.

Overview of Functionality

This workflow processes high-throughput RNA-seq data in which every well of a microtiter plate is a sample. A FASTA file provided as input defines the mapping between sample barcodes and wells.
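To make the barcode-to-well mapping concrete, here is a minimal Python sketch. It assumes that each FASTA record header is a well ID and its sequence is that well's barcode, that all barcodes have the same length, and that a read starts with the barcode followed immediately by a UMI of fixed length. The read layout, the `umi_length` parameter, the example sequences, and the helper names `read_barcode_fasta` and `assign_well` are illustrative assumptions, not the workflow's documented behaviour; the real pipeline may also tolerate mismatches, whereas this sketch requires an exact match.

```python
import io


def read_barcode_fasta(handle) -> dict:
    """Parse a FASTA file into {barcode_sequence: well_id}.
    Assumes the first word of each header is the well ID."""
    mapping = {}
    well, seq_parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if well is not None:
                mapping["".join(seq_parts).upper()] = well
            header = line[1:].split()
            well, seq_parts = (header[0] if header else ""), []
        else:
            seq_parts.append(line)
    if well is not None:
        mapping["".join(seq_parts).upper()] = well
    # Exact prefix lookup below only works if every barcode has the same length.
    if len({len(b) for b in mapping}) > 1:
        raise ValueError("barcodes must all have the same length")
    return mapping


def assign_well(read_seq: str, barcodes: dict, umi_length: int):
    """Return (well_id, umi) for a read that starts with a barcode followed
    by a UMI, or (None, None) if the leading bases match no barcode exactly."""
    bc_len = len(next(iter(barcodes)))
    well = barcodes.get(read_seq[:bc_len].upper())
    if well is None:
        return None, None
    return well, read_seq[bc_len:bc_len + umi_length]


# Usage with two made-up barcodes for wells A1 and A2.
fasta = io.StringIO(">A1\nAACGTGAT\n>A2\nAAACATCG\n")
barcodes = read_barcode_fasta(fasta)
well, umi = assign_well("AACGTGATGGCCTTAACGTTTTTTTT", barcodes, umi_length=10)
# well, umi → ('A1', 'GGCCTTAACG')
```

Using equal-length barcodes turns well assignment into a single dictionary lookup per read instead of a scan over every well.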

The full workflow is split into two major subworkflows that can be run independently:

The workflow requires FASTQ files as input. For BCL or other formats, run the demultiplex workflow first.
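The input rule above can be expressed as a simple pre-flight check. This Python sketch is only a heuristic under an assumption: it decides by file-name suffix, and the suffix list and the `needs_demultiplexing` helper are hypothetical, not part of the workflow. The workflow's own input validation may differ.

```python
# Assumed set of common FASTQ suffixes (plain and gzip-compressed).
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def needs_demultiplexing(paths: list) -> bool:
    """Return True if any input does not look like a FASTQ file,
    meaning the demultiplex workflow should be run first
    (for example, when the input is a BCL run folder)."""
    return any(not p.lower().endswith(FASTQ_SUFFIXES) for p in paths)


# Usage: FASTQ inputs can go straight in; a run folder cannot.
fastq_inputs = ["SampleA_S1_L001_R1_001.fastq.gz", "SampleA_S1_L001_R2_001.fastq.gz"]
bcl_inputs = ["my_run_folder"]  # hypothetical BCL run directory
```

A suffix check is cheap and catches the common mistake of pointing the workflow at a raw run folder, but it does not verify file contents.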