Single-Cell Multi-omics

openpipeline is a comprehensive collection of extensible, best-practice analysis pipelines for processing single-cell multi-omics data. These workflows handle various omics modalities, including gene expression, antibody capture, VDJ, and ATAC data, from diverse experimental set-ups such as CITE-seq, ATAC-seq, and Drop-seq. The platform provides reproducible and scalable solutions across the entire analysis journey, from raw sequencing data processing to functional interpretation.

Overview of Functionality

The workflows provide end-to-end support for:

  • Demultiplexing: Conversion of raw sequencing data (BCL files) to FASTQ format by separating multiplexed reads based on cell barcodes and UMIs
  • Data Ingestion: Read mapping to reference genomes and generation of count matrices for different experimental platforms
  • Quality Control: Calculation of comprehensive QC metrics with optional generation of QC reports
  • Data Pre-processing: Count-based filtering, doublet removal, count transformation and preparation of expression matrices for downstream analysis
  • Integration: Removal of unwanted technical variation across batches using state-of-the-art computational methods
  • Cell Type Annotation: Cell type label projection using reference datasets and machine learning approaches

Demultiplexing

The demultiplexing components convert raw BCL files from Illumina sequencers into FASTQ format by separating multiplexed reads based on their cell barcodes and UMIs. Both Illumina's demultiplexing solution and 10X Genomics wrappers are available.
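To make the idea of separating multiplexed reads concrete, here is a minimal Python sketch that groups toy reads by a leading barcode, tolerating one mismatch. It is an illustration only and does not reflect how the Illumina or 10X Genomics tools work internally. The function names, the barcode layout (barcode at the start of the read), and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, barcode_len, max_mismatches=1):
    """Group reads by barcode (hypothetical helper, not a real tool's API).

    reads: iterable of sequences whose first `barcode_len` bases are the barcode.
    barcodes: dict mapping a barcode sequence to a label.
    Reads matching zero or several barcodes go to "undetermined".
    """
    groups = defaultdict(list)
    for seq in reads:
        prefix = seq[:barcode_len]
        hits = [
            label
            for bc, label in barcodes.items()
            if hamming(prefix, bc) <= max_mismatches
        ]
        label = hits[0] if len(hits) == 1 else "undetermined"
        # Store the read with the barcode trimmed off.
        groups[label].append(seq[barcode_len:])
    return dict(groups)


# Toy data: two known barcodes and four reads.
toy_barcodes = {"AAAA": "sample_1", "CCCC": "sample_2"}
toy_reads = ["AAAAGATTACA", "CCCCTTGACCA", "AAATGGGCCCA", "GGGGACGTACG"]
demuxed = demultiplex(toy_reads, toy_barcodes, barcode_len=4)
```

In this toy run, the third read carries a single barcode error and is still assigned to `sample_1`, while the last read matches no barcode and lands in `undetermined`.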

Data Ingestion

The ingestion workflows align FASTQ files to reference genomes to generate count matrices in H5MU format. Both BD Genomics and 10X Genomics protocols are supported.

Note that a workflow for generating the transcriptome reference is also available.
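As a rough illustration of what a count matrix represents, the sketch below collapses aligned reads into per-cell, per-gene molecule counts. It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) record, and that reads sharing all three values are duplicates of one molecule. The record layout and the function name are assumptions for this example; the actual ingestion workflows delegate this work to the vendors' mapping tools.

```python
from collections import defaultdict


def count_matrix(aligned_reads):
    """Build a cell-by-gene count table from (cell, umi, gene) records.

    Records that share cell barcode, UMI, and gene are counted once,
    so duplicated reads of the same molecule do not inflate the counts.
    """
    unique_molecules = set(aligned_reads)
    counts = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in unique_molecules:
        counts[cell][gene] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {cell: dict(genes) for cell, genes in counts.items()}


# Toy alignments (hypothetical barcodes and UMIs).
toy_alignments = [
    ("CELL_A", "UMI1", "CD3E"),
    ("CELL_A", "UMI1", "CD3E"),  # duplicate of the record above
    ("CELL_A", "UMI2", "CD3E"),
    ("CELL_A", "UMI3", "MS4A1"),
    ("CELL_B", "UMI1", "MS4A1"),
]
matrix = count_matrix(toy_alignments)
```

Here `CELL_A` ends up with two `CD3E` molecules rather than three, because the duplicated record is counted only once.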

Data Pre-processing

Once count matrices have been generated in H5MU format, the single-cell data require comprehensive pre-processing, which the pre-processing workflow provides. This workflow can process multiple modalities, including Gene Expression, Antibody Capture, VDJ and ATAC. Its steps include count-based filtering, doublet removal, and count transformation.
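Count-based filtering and count transformation, two of the pre-processing steps named in the overview, can be sketched on a toy count table as follows. The thresholds, function names, and the choice of a log1p transformation after scaling to a fixed total are assumptions for illustration, not the workflow's actual defaults or implementation.

```python
import math


def filter_cells(counts, min_counts=0, min_genes=0):
    """Keep cells with enough total counts and enough detected genes."""
    kept = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        detected = sum(1 for c in genes.values() if c > 0)
        if total >= min_counts and detected >= min_genes:
            kept[cell] = genes
    return kept


def log_normalize(genes, target_sum=10_000):
    """Scale one cell's counts to `target_sum`, then apply log(1 + x)."""
    total = sum(genes.values())
    if total == 0:
        return {gene: 0.0 for gene in genes}
    return {g: math.log1p(c / total * target_sum) for g, c in genes.items()}


# Toy count table: cell -> gene -> raw count (hypothetical values).
toy_counts = {
    "cell_1": {"GeneA": 5, "GeneB": 3, "GeneC": 0},
    "cell_2": {"GeneA": 1, "GeneB": 0, "GeneC": 0},
    "cell_3": {"GeneA": 0, "GeneB": 10, "GeneC": 2},
}
filtered = filter_cells(toy_counts, min_counts=5, min_genes=2)
normalized = {cell: log_normalize(genes) for cell, genes in filtered.items()}
```

With these toy thresholds, `cell_2` is dropped because it has a single count in a single gene, and the remaining cells are transformed independently of each other.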

Batch Integration

Multiple workflows, each using a different computational approach, are available to remove unwanted technical variation across batches while preserving biological differences.

All of the above-mentioned integration workflows also include steps to perform neighbor detection, Leiden clustering, and UMAP dimensionality reduction to facilitate exploration of the integrated data.

Downstream Analysis

The platform provides multiple approaches for cell type annotation, assigning biological identities to cells based on their expression profiles.
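As one simple illustration of projecting labels from a reference onto new cells, the sketch below assigns each query cell the label of the nearest reference centroid. This nearest-centroid approach is a stand-in chosen for brevity and is not claimed to be one of the platform's annotation methods. All names and the two-gene toy profiles are assumptions.

```python
import math


def centroid(vectors):
    """Column-wise mean of a list of equal-length expression vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def project_labels(reference, query):
    """Assign each query cell the label of the closest reference centroid.

    reference: dict mapping a cell type label to a list of expression vectors.
    query: dict mapping a cell identifier to one expression vector.
    """
    centroids = {label: centroid(vecs) for label, vecs in reference.items()}
    return {
        cell: min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))
        for cell, vec in query.items()
    }


# Toy reference with two cell types measured on two genes.
toy_reference = {
    "T cell": [[9.0, 1.0], [8.0, 0.0]],
    "B cell": [[1.0, 9.0], [0.0, 8.0]],
}
toy_query = {"q1": [7.5, 0.5], "q2": [0.5, 7.0]}
predicted = project_labels(toy_reference, toy_query)
```

Each query cell receives the label whose average reference profile it most resembles, which is the basic idea behind reference-based label transfer.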

Workflows == modules

Viash workflows are built on a modular architecture where components and workflows are fully equivalent. Each component can be executed as a stand-alone workflow, while any workflow can be seamlessly integrated as a dependency of another workflow. This design enables flexible customization and recombination to address diverse analytical needs.
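The equivalence between components and workflows can be pictured with a small Python analogy: if every module is a function from a state to a state, then a chain of modules is itself a module and can be nested in a larger chain. This is a conceptual sketch only and does not use or represent the Viash API. `make_step`, `workflow`, and the step names are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, List[str]]
Module = Callable[[State], State]


def make_step(name: str) -> Module:
    """Create a toy component that records its name in the state."""
    def step(state: State) -> State:
        return {**state, "steps_run": [*state.get("steps_run", []), name]}
    return step


def workflow(*steps: Module) -> Module:
    """Chain modules into a new module with the same call signature."""
    def run(state: State) -> State:
        for step in steps:
            state = step(state)
        return state
    return run


# A component can run on its own.
standalone = make_step("filter")({"steps_run": []})

# A workflow built from components is itself usable as a step elsewhere.
preprocess = workflow(make_step("filter"), make_step("normalize"))
full = workflow(preprocess, make_step("integrate"), make_step("annotate"))
result = full({"steps_run": []})
```

Because `preprocess` has the same signature as a single step, `full` does not need to know whether it is calling a component or a nested workflow.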