workflows/annotation/scgpt_annotation

Description

Cell type annotation workflow using scGPT. The workflow takes a pre-processed h5mu file as query input and performs the following steps:

  • subsetting to highly variable genes (HVG)

  • cross-checking of genes against the model vocabulary

  • binning of gene counts

  • padding and tokenizing of genes

  • transformer-based cell type prediction

Note that cell type prediction with scGPT is only possible with a fine-tuned scGPT model.
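The vocabulary cross-check can be pictured with a small sketch. This is not the workflow's implementation. The function name `filter_genes_by_vocab`, the toy vocabulary and the gene list are all made up for illustration. It only assumes that the vocabulary maps gene names to integer token ids and that genes missing from it are dropped from the query.

```python
def filter_genes_by_vocab(gene_names, vocab):
    """Split query genes into those known to the model vocabulary and the rest.

    Hypothetical helper: returns (kept_genes, token_ids, dropped_genes).
    """
    kept, token_ids, dropped = [], [], []
    for gene in gene_names:
        if gene in vocab:
            kept.append(gene)
            token_ids.append(vocab[gene])
        else:
            dropped.append(gene)
    return kept, token_ids, dropped


# Toy vocabulary and query genes (illustrative values only).
vocab = {"<pad>": 0, "<cls>": 1, "CD3E": 2, "MS4A1": 3, "LYZ": 4}
query_genes = ["CD3E", "FAKEGENE1", "LYZ"]

kept, token_ids, dropped = filter_genes_by_vocab(query_genes, vocab)
print(kept, token_ids, dropped)
```

Genes listed in `dropped` carry no token id, so a model with this vocabulary could not use them.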

Type

nextflow_script

License

MIT

Query input

Name                      Type & Properties
--id                      string, required
--input                   file, required
--modality                string
--input_layer             string
--input_var_gene_names    string
--input_obs_batch_label   string, required

Model input

Name                          Type & Properties
--model                       file, required
--model_config                file, required
--model_vocab                 file, required
--finetuned_checkpoints_key   string
--label_mapper_key            string

Outputs

Name                       Type & Properties
--output                   file, required, output
--output_compression       string
--output_obs_predictions   string
--output_obs_probability   string
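As a rough sketch of how the query, model and output arguments fit together, the snippet below builds a JSON parameter set and checks that the arguments marked as required are present. The parameter names come from the tables in this document. Every value (file names, the modality, the .obs keys) is a hypothetical placeholder, and `build_params_json` is an illustrative helper, not part of the workflow.

```python
import json

# Arguments marked "required" in the Query input, Model input and Outputs tables.
REQUIRED = [
    "id", "input", "input_obs_batch_label",
    "model", "model_config", "model_vocab",
    "output",
]


def build_params_json(params):
    """Validate required keys and serialise the parameters to a JSON string."""
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return json.dumps(params, indent=2)


# All values below are placeholders chosen for illustration.
params = {
    "id": "sample_1",
    "input": "query.h5mu",
    "modality": "rna",
    "input_obs_batch_label": "batch",
    "model": "finetuned_model.pt",
    "model_config": "model_config.json",
    "model_vocab": "vocab.json",
    "output": "annotated.h5mu",
    "output_obs_predictions": "scgpt_pred",
    "output_obs_probability": "scgpt_probability",
}

params_json = build_params_json(params)
print(params_json)
```

Saved to a file, such a JSON document can be handed to Nextflow through its standard `-params-file` option.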

Padding arguments

Name          Type & Properties
--pad_token   string
--pad_value   integer
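To illustrate how a pad token and a pad value are typically used together, here is a minimal sketch that brings one cell's gene token ids and binned values to a fixed length. It is an assumption-laden toy: `pad_or_truncate` is a hypothetical helper, the token `"<pad>"` and the value `-2` are illustrative rather than confirmed defaults, and plain truncation stands in for whatever gene selection the workflow actually applies to sequences longer than `--max_seq_len`.

```python
def pad_or_truncate(gene_ids, values, max_seq_len, pad_token_id, pad_value):
    """Return (gene_ids, values), each exactly max_seq_len long.

    Longer inputs are cut off at max_seq_len (a simplification); shorter
    inputs are filled with the pad token id and the pad value respectively.
    """
    if len(gene_ids) != len(values):
        raise ValueError("gene_ids and values must have the same length")
    gene_ids = list(gene_ids[:max_seq_len])
    values = list(values[:max_seq_len])
    n_pad = max_seq_len - len(gene_ids)
    return gene_ids + [pad_token_id] * n_pad, values + [pad_value] * n_pad


# Illustrative vocabulary and padding settings (assumed, not taken from the workflow).
vocab = {"<pad>": 0, "CD3E": 1, "LYZ": 2, "MS4A1": 3}
PAD_TOKEN = "<pad>"
PAD_VALUE = -2

ids, vals = pad_or_truncate([1, 2], [3, 1], 4, vocab[PAD_TOKEN], PAD_VALUE)
print(ids, vals)
```

Using a pad value outside the range of real binned counts keeps padded positions distinguishable from genes with zero expression.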

HVG subset arguments

Name           Type & Properties
--n_hvg        integer
--hvg_flavor   string

Tokenization arguments

Name            Type & Properties
--max_seq_len   integer

Embedding arguments

Name           Type & Properties
--dsbn         boolean
--batch_size   integer

Binning arguments

Name             Type & Properties
--n_input_bins   integer
--seed           integer
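The idea behind binning gene counts can be sketched as per-cell quantile binning: zeros stay in bin 0 and non-zero counts are assigned to bins 1 to n_input_bins - 1 according to quantiles of that cell's non-zero values. The code below is a simplified, deterministic illustration under that assumption, not the workflow's implementation. `bin_counts` is a hypothetical helper, and the presence of `--seed` suggests the real binning may include a random component that is not modelled here.

```python
import bisect


def bin_counts(row, n_input_bins):
    """Map one cell's counts to integer bins in the range 0..n_input_bins-1."""
    if n_input_bins < 2:
        raise ValueError("n_input_bins must be at least 2")
    nonzero = sorted(v for v in row if v > 0)
    if not nonzero:
        return [0] * len(row)

    # Bin edges at evenly spaced quantiles of the non-zero values,
    # computed with linear interpolation between sorted neighbours.
    n_edges = n_input_bins - 1
    edges = []
    for i in range(n_edges):
        q = i / (n_edges - 1) if n_edges > 1 else 0.0
        pos = q * (len(nonzero) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(nonzero) - 1)
        edges.append(nonzero[lo] + (nonzero[hi] - nonzero[lo]) * (pos - lo))

    # Zeros keep bin 0; non-zero values get bins 1..n_edges.
    return [
        0 if v <= 0 else min(bisect.bisect_left(edges, v) + 1, n_edges)
        for v in row
    ]


binned = bin_counts([0, 1, 2, 3, 4, 5], n_input_bins=5)
print(binned)
```

Because the edges are computed per cell, the binned values reflect relative expression within a cell rather than absolute counts.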