openpipeline: Single-Cell Multi-omics Pipeline

Single-cell technologies such as scRNA-seq, scATAC-seq and CITE-seq allow researchers to profile gene expression, chromatin accessibility and surface proteins at single-cell resolution.
However, analysis workflows are complex, require multiple tools and are often locked into proprietary vendor platforms.
This page presents an open, reproducible alternative built with Viash components.

Who is it for?

Computational biologists and platform leads working on single-cell or multi-modal projects.
Whether you are analyzing 10x Genomics data, Smart-seq libraries or custom protocols, the pipeline can be adapted to your needs.

Overview of Functionality

The workflows provide end-to-end support for:

  • Demultiplexing: Conversion of raw sequencing data (BCL files) to FASTQ format by separating multiplexed reads based on cell barcodes and UMIs
  • Data Ingestion: Read mapping to reference genomes and generation of count matrices for different experimental platforms
  • Quality Control: Calculation of comprehensive QC metrics with optional generation of QC reports
  • Data Pre-processing: Count-based filtering, doublet removal, count transformation and preparation of expression matrices for downstream analysis
  • Integration: Removal of unwanted technical variation across batches using state-of-the-art computational methods
  • Cell Type Annotation: Cell type label projection using reference datasets and machine learning approaches
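
To make the quality control and pre-processing stages more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the pipeline's implementation: the function names, thresholds and the toy count matrix are all assumptions made for this example, and the real workflows operate on H5MU files rather than nested lists.

```python
import math

def qc_metrics(counts):
    """Per-cell QC metrics for a cells x genes count matrix (list of lists)."""
    return [
        {"total_counts": sum(row), "n_genes": sum(1 for c in row if c > 0)}
        for row in counts
    ]

def filter_cells(counts, min_counts=5, min_genes=2):
    """Count-based filtering: keep cells that meet both (hypothetical) thresholds."""
    metrics = qc_metrics(counts)
    return [
        row for row, m in zip(counts, metrics)
        if m["total_counts"] >= min_counts and m["n_genes"] >= min_genes
    ]

def log_normalize(counts, target_sum=10_000):
    """Count transformation: scale each cell to target_sum, then apply log(1 + x)."""
    out = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # An all-zero cell stays all-zero instead of dividing by zero.
            out.append([0.0] * len(row))
        else:
            out.append([math.log1p(c * target_sum / total) for c in row])
    return out

# Toy data: three cells, four genes (values invented for illustration).
counts = [
    [10, 0, 5, 1],  # passes both thresholds
    [0, 0, 1, 0],   # nearly empty droplet, filtered out
    [3, 4, 0, 2],   # passes both thresholds
]
kept = filter_cells(counts)
normalized = log_normalize(kept)
```

Doublet removal is left out of the sketch because it requires a dedicated statistical model rather than a simple threshold.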

Demultiplexing

The demultiplexing components convert raw BCL files from Illumina sequencers into FASTQ format by separating multiplexed reads based on their cell barcodes and UMIs. Illumina's demultiplexing solution and 10x Genomics wrappers are both available:
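
As a rough illustration of barcode-based read separation in general (not the logic of the Illumina or 10x Genomics tools, which work on binary BCL files and sample sheets), the following Python sketch groups reads by a known barcode while tolerating one mismatch. The whitelist, the reads and all function names are hypothetical.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, whitelist, max_mismatches=1):
    """Return the single whitelist barcode within max_mismatches, else None."""
    hits = [
        bc for bc in whitelist
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    # Ambiguous matches (more than one hit) are treated as unassigned.
    return hits[0] if len(hits) == 1 else None

def demultiplex(reads, whitelist):
    """Group (barcode, sequence) pairs by their assigned barcode."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        key = assign_barcode(barcode, whitelist) or "undetermined"
        groups[key].append(sequence)
    return dict(groups)

# Toy input: 4-base barcodes and placeholder read names.
whitelist = ["ACGT", "TTAA"]
reads = [("ACGT", "read1"), ("ACGA", "read2"), ("TTAA", "read3"), ("GGGG", "read4")]
by_barcode = demultiplex(reads, whitelist)
```

Here `read2` is rescued despite a single sequencing error in its barcode, while `read4` matches nothing and ends up in the `undetermined` group.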

Data Ingestion

The ingestion workflows align FASTQ files to reference genomes to generate count matrices in H5MU format. Both BD Genomics and 10x Genomics protocols are supported:

Note that a workflow for generating the transcriptome reference is also available.

Data Pre-processing

Once count matrices have been generated in H5MU format, the single-cell data require comprehensive pre-processing, which the pre-processing workflow provides. This workflow can process multiple modalities, including Gene Expression, Antibody Capture, VDJ and ATAC. The following pre-processing steps are executed:

Batch Integration

Multiple workflows with various computational approaches are available to remove unwanted technical variation across batches while preserving biological differences:

All of the integration workflows above also perform neighbor detection, Leiden clustering and UMAP dimensionality reduction to facilitate exploration of the integrated data.
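
Neighbor detection is the step that Leiden clustering and UMAP both build on, so it is worth a small sketch. The brute-force Python example below is an assumption-laden illustration, not the pipeline's code: the `knn_graph` function and the two-dimensional toy embedding are invented here, and production workflows would rely on optimized libraries instead of an all-pairs distance computation.

```python
import math

def knn_graph(points, k=2):
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy embedding with two well-separated groups of three cells each.
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.3),
]
graph = knn_graph(embedding, k=2)
```

In the resulting graph every cell's neighbors lie within its own group, which is the structure that graph-based clustering then turns into cluster labels.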

Downstream Analysis

The platform provides multiple approaches for cell type annotation, assigning biological identities to cells based on their expression profiles:
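
One simple form of label projection, shown here purely as an illustration and not as one of the platform's methods, is nearest-centroid assignment: each query cell receives the label of the reference cell type whose mean expression profile is closest. The function names, expression values and cell type labels below are toy assumptions.

```python
import math

def centroids(reference, labels):
    """Mean expression profile per cell type label."""
    sums, counts = {}, {}
    for profile, label in zip(reference, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def project_labels(query, reference, labels):
    """Assign each query cell the label of the nearest reference centroid."""
    cents = centroids(reference, labels)
    return [
        min(cents, key=lambda lab: math.dist(cell, cents[lab]))
        for cell in query
    ]

# Toy reference: two marker genes, two annotated cell types.
reference = [[5.0, 0.1], [4.5, 0.3], [0.2, 6.0], [0.1, 5.5]]
labels = ["T cell", "T cell", "B cell", "B cell"]
query = [[4.8, 0.0], [0.0, 5.0]]
predicted = project_labels(query, reference, labels)
```

Real annotation methods usually also report a confidence score per cell, so that cells unlike any reference type can be flagged rather than forced into a label.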