Single-Cell Multi-omics

openpipeline is a comprehensive collection of extensible, best-practice analysis pipelines for processing single-cell multi-omics data. These workflows handle several omics modalities, including gene expression, antibody capture, VDJ, and ATAC data, from experimental set-ups such as CITE-seq, ATAC-seq, and Drop-seq. The platform provides reproducible and scalable solutions across the entire analysis, from raw sequencing data processing to functional interpretation.

Overview of Functionality

The workflows provide end-to-end support for:

Demultiplexing: Conversion of raw sequencing data (BCL files) to FASTQ format by separating multiplexed reads based on cell barcodes and UMIs
Data Ingestion: Read mapping to reference genomes and generation of count matrices for different experimental platforms
Quality Control: Calculation of comprehensive QC metrics with optional generation of QC reports
Data Pre-processing: Count-based filtering, doublet removal, count transformation and preparation of expression matrices for downstream analysis
Integration: Removal of unwanted technical variation across batches using state-of-the-art computational methods
Cell Type Annotation: Cell type label projection using reference datasets and machine learning approaches

Demultiplexing

The demultiplexing components convert raw BCL files from Illumina sequencers into FASTQ format by separating multiplexed reads based on their cell barcodes and UMIs. Both Illumina's demultiplexing solution and 10X Genomics wrappers are available.

Data Ingestion

The ingestion workflows align FASTQ files to reference genomes and generate count matrices in H5MU format. Both BD Genomics and 10X Genomics protocols are supported.

Note that a workflow for generating the transcriptome reference is also available.
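As a rough illustration of what a count matrix holds, the sketch below collapses aligned reads into per-cell, per-gene UMI counts. The read tuples and the function name `count_umis` are hypothetical and not part of openpipeline. The sketch shows only the general idea of UMI counting and does not reflect how the ingestion workflows are implemented.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse (cell_barcode, umi, gene) reads into UMI counts per cell and gene."""
    # One set per (cell, gene) pair keeps each UMI once,
    # so duplicate reads of the same molecule are counted a single time.
    umis = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis[(cell_barcode, gene)].add(umi)

    # Nested mapping: counts[cell][gene] = number of distinct UMIs.
    counts = defaultdict(dict)
    for (cell_barcode, gene), seen in umis.items():
        counts[cell_barcode][gene] = len(seen)
    return {cell: dict(genes) for cell, genes in counts.items()}


# Hypothetical aligned reads: (cell barcode, UMI, gene).
reads = [
    ("AAAC", "UMI1", "CD3E"),
    ("AAAC", "UMI1", "CD3E"),  # duplicate of the read above
    ("AAAC", "UMI2", "CD3E"),
    ("AAAC", "UMI3", "MS4A1"),
    ("TTTG", "UMI1", "CD3E"),
]
counts = count_umis(reads)
print(counts)
```

Each outer key corresponds to one cell (a row of the count matrix) and each inner key to one gene (a column).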

Data Pre-processing

Once count matrices have been generated in H5MU format, the single-cell data must be pre-processed before downstream analysis. The pre-processing workflow handles multiple modalities, including Gene Expression, Antibody Capture, VDJ and ATAC, and executes the following steps:

Count-based filtering: Removal of low-quality cells and genes
Doublet detection: Flagging of doublets using Scrublet
Normalization: Adjustment for technical differences in sequencing depth
Log transformation: Transformation of counts to log scale
Highly Variable Gene (HVG) detection: Identification of highly variable genes
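The steps above can be sketched on a tiny cells-by-genes matrix using only the Python standard library. This is a minimal illustration under stated assumptions, not the workflow's implementation: the function names, the `min_counts` threshold, and the `target_sum` of 10,000 are hypothetical choices; plain variance ranking stands in for the dispersion-based methods usually used for HVG detection; and doublet detection with Scrublet is left out.

```python
import math


def filter_cells(counts, min_counts):
    """Count-based filtering: keep cells whose total count reaches min_counts."""
    return [row for row in counts if sum(row) >= min_counts]


def normalize_total(counts, target_sum=10_000):
    """Normalization: scale each cell so that its counts sum to target_sum."""
    return [[value * target_sum / sum(row) for value in row] for row in counts]


def log1p_transform(matrix):
    """Log transformation: apply log(1 + x) to every value."""
    return [[math.log1p(value) for value in row] for row in matrix]


def top_variable_genes(matrix, n_top):
    """Return the column indices of the n_top genes with the highest variance."""
    n_cells = len(matrix)
    variances = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_cells
        variances.append(sum((v - mean) ** 2 for v in column) / n_cells)
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return ranked[:n_top]


# Hypothetical raw counts: 4 cells (rows) x 3 genes (columns).
raw = [
    [10, 0, 5],
    [0, 1, 0],  # very low total count, removed by the filter
    [20, 0, 0],
    [5, 0, 15],
]

filtered = filter_cells(raw, min_counts=5)
normalized = normalize_total(filtered)
logged = log1p_transform(normalized)
hvg = top_variable_genes(logged, n_top=2)
print(len(filtered), hvg)
```

Filtering runs before normalization so that no cell with a zero total is divided by its own sum.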

Batch Integration

Several workflows, each using a different computational approach, are available to remove unwanted technical variation across batches while preserving biological differences.

All integration workflows also include steps for neighbor detection, Leiden clustering, and UMAP dimensionality reduction to facilitate exploration of the integrated data.
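To make the notion of batch-level technical variation concrete, the toy sketch below subtracts each batch's mean from a single feature. Per-batch mean centering is a deliberately naive stand-in and is not one of the methods the integration workflows use; the function name and data are hypothetical. It also assumes that batches have similar cell compositions, since otherwise centering would remove biological differences along with technical ones.

```python
def center_per_batch(values, batches):
    """Subtract each batch's mean from its cells (one feature, for brevity)."""
    totals, sizes = {}, {}
    for value, batch in zip(values, batches):
        totals[batch] = totals.get(batch, 0.0) + value
        sizes[batch] = sizes.get(batch, 0) + 1
    means = {batch: totals[batch] / sizes[batch] for batch in totals}
    return [value - means[batch] for value, batch in zip(values, batches)]


# Hypothetical feature values: batch "b" carries a constant offset of 10.
values = [1.0, 3.0, 11.0, 13.0]
batches = ["a", "a", "b", "b"]

corrected = center_per_batch(values, batches)
print(corrected)
```

After centering, the two batches overlap on this feature, which is the effect the integration methods aim for in a far more careful way.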

Downstream Analysis

The platform provides multiple approaches for cell type annotation, assigning biological identities to cells based on their expression profiles:

scANVI
CellTypist
scVI with KNN label transfer
Harmony with KNN label transfer
PopV
scGPT
OnClass
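The KNN label transfer used by two of the approaches above can be sketched as a majority vote among the nearest reference cells. This is a minimal illustration, assuming Euclidean distance and an unweighted vote; the two-dimensional coordinates are hypothetical stand-ins for an integrated embedding, and the function name `knn_label_transfer` is not part of openpipeline.

```python
import math
from collections import Counter


def knn_label_transfer(reference, labels, query, k=3):
    """Give each query cell the majority label of its k nearest reference cells."""
    predictions = []
    for cell in query:
        # Rank reference cells by Euclidean distance to the query cell.
        ranked = sorted(
            range(len(reference)),
            key=lambda i: math.dist(reference[i], cell),
        )
        votes = Counter(labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical embedding coordinates of annotated reference cells.
reference = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
]
labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]

# Unlabeled query cells placed in the same embedding.
query = [(0.05, 0.1), (5.0, 5.1)]

predicted = knn_label_transfer(reference, labels, query, k=3)
print(predicted)
```

The vote is only meaningful when reference and query cells share one embedding, which is why label transfer is paired with an integration method.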

Workflows == modules

Viash workflows are built on a modular architecture where components and workflows are fully equivalent. Each component can be executed as a stand-alone workflow, while any workflow can be seamlessly integrated as a dependency of another workflow. This design enables flexible customization and recombination to address diverse analytical needs.
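The equivalence of components and workflows can be pictured with a small Python analogy. This is not Viash syntax, and every name in it is hypothetical: a component is modelled as a function from state to state, and a workflow built from components has that same shape, so it can be nested inside another workflow.

```python
from typing import Callable, Dict

# A component takes a state dictionary and returns an updated one.
Component = Callable[[Dict], Dict]


def workflow(*steps: Component) -> Component:
    """Chain components into a workflow that is itself a component."""
    def run(state: Dict) -> Dict:
        for step in steps:
            state = step(state)
        return state
    return run


def make_step(name: str) -> Component:
    """Build a placeholder component that records its name in the state."""
    def step(state: Dict) -> Dict:
        return {**state, "steps": state["steps"] + [name]}
    return step


filter_step = make_step("filter")
normalize_step = make_step("normalize")
cluster_step = make_step("cluster")

# A workflow composed of two components...
preprocess = workflow(filter_step, normalize_step)
# ...is reused as a single dependency of a larger workflow.
full_analysis = workflow(preprocess, cluster_step)

result = full_analysis({"steps": []})
print(result["steps"])
```

Because `preprocess` and `filter_step` have the same type, either can be run on its own or placed inside another workflow.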