mag: Parameters

Define where the pipeline should find input data and save output data.

CSV samplesheet file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
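
The pattern above can be checked before launching a run. The following is a minimal sketch in Python, not the pipeline's own validation code; the helper name `is_valid_samplesheet_path` is hypothetical. It illustrates that the path must contain no whitespace and must end in `.csv`.

```python
import re

# Pattern copied verbatim from the parameter description above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path has no whitespace and ends in '.csv'.

    Hypothetical helper for illustration only.
    """
    return SAMPLESHEET_PATTERN.search(path) is not None


if __name__ == "__main__":
    for candidate in ["samplesheet.csv", "my samples.csv", "samplesheet.tsv"]:
        print(candidate, is_valid_samplesheet_path(candidate))
```

A path containing a space is rejected by this pattern, so it is probably safest to avoid whitespace in samplesheet paths altogether.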

Specifies that the input is single-end reads.

type: boolean

Additional input CSV samplesheet containing information about pre-computed assemblies. When set, both read pre-processing and assembly are skipped and the pipeline begins at the binning stage.

type: string

pattern: ^\S+\.csv$

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
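
The email pattern above is stricter than general email syntax. The sketch below, which assumes Python's `re` module behaves like the validator for this expression, shows a few consequences; the helper name `matches_email_pattern` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def matches_email_pattern(address: str) -> bool:
    """Return True if the address satisfies the documented pattern.

    Hypothetical helper for illustration only.
    """
    return EMAIL_PATTERN.search(address) is not None


if __name__ == "__main__":
    examples = [
        "user@example.org",         # plain address
        "first.last@uni.ac.uk",     # multi-level domain
        "user+tag@example.org",     # '+' is not in the allowed character set
        "user@example.museum",      # final label longer than 5 letters
    ]
    for address in examples:
        print(address, matches_email_pattern(address))
```

Under this pattern, addresses containing `+` or ending in a top-level domain longer than five letters do not match, even though they can be valid email addresses.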

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Reference genome related files and options required for the workflow.

Do not load the iGenomes reference config.

hidden

type: boolean

The base path to the iGenomes reference files.

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Use monochrome logs (do not use coloured log output).

hidden

type: boolean

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
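
To see which size strings the pattern above accepts, here is a small Python sketch. It only exercises the regular expression and does not reproduce how the pipeline interprets the value; the helper name `is_valid_size` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value satisfies the documented size pattern.

    Hypothetical helper for illustration only.
    """
    return SIZE_PATTERN.search(value) is not None


if __name__ == "__main__":
    for value in ["25.MB", "25MB", "1.5 GB", "10B", "25", "25 mb"]:
        print(repr(value), is_valid_size(value))
```

The unit is case-sensitive and the trailing `B` is mandatory, so values such as `25` or `25 mb` are rejected by this expression.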

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Use these parameters to also enable reproducible results from the individual assembly and binning tools.

Fix number of CPUs for MEGAHIT to 1. Not increased with retries.

type: boolean

Fix number of CPUs used by SPAdes. Not increased with retries.

type: integer

default: -1

Fix number of CPUs used by SPAdes hybrid. Not increased with retries.

type: integer

default: -1

RNG seed for MetaBAT2.

type: integer

default: 1

Specify which adapter clipping tool to use.

type: string

Specify to save the resulting clipped FASTQ files to --outdir.

type: boolean

The minimum length reads must have to be retained for downstream analysis.

type: integer

default: 15

Minimum phred quality value of a base to be qualified in fastp.

type: integer

default: 15

The mean quality requirement used for per read sliding window cutting by fastp.

type: integer

default: 15

Save reads that fail fastp filtering in a separate file. Not used downstream.

type: boolean

Turn on detecting and trimming of poly-G tails

type: boolean

The minimum base quality for low-quality base trimming by AdapterRemoval.

type: integer

default: 2

Turn on quality trimming by consecutive stretch of low quality bases, rather than by window.

type: boolean

Forward read adapter to be trimmed by AdapterRemoval.

type: string

default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

pattern: ^[ATGCRYKMSWBDHVN]*$

Reverse read adapter to be trimmed by AdapterRemoval for paired end data.

type: string

default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

pattern: ^[ATGCRYKMSWBDHVN]*$
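
Both adapter parameters share the same pattern: uppercase IUPAC nucleotide codes only. The sketch below checks custom adapter strings against it in Python; the helper name `is_valid_adapter` is hypothetical, and the default sequences are copied from the descriptions above.

```python
import re

# Pattern copied verbatim from the adapter parameter descriptions above.
ADAPTER_PATTERN = re.compile(r"^[ATGCRYKMSWBDHVN]*$")

# Default adapters as listed in this document.
DEFAULT_FORWARD = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
DEFAULT_REVERSE = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"


def is_valid_adapter(sequence: str) -> bool:
    """Return True if the sequence uses only uppercase IUPAC nucleotide codes.

    Hypothetical helper for illustration only.
    """
    return ADAPTER_PATTERN.search(sequence) is not None


if __name__ == "__main__":
    for seq in [DEFAULT_FORWARD, DEFAULT_REVERSE, "agatcgg", "ACGU", ""]:
        print(repr(seq[:20]), is_valid_adapter(seq))
```

Lowercase sequences and RNA bases such as `U` do not match. Because the pattern uses `*`, an empty string does match.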

Name of iGenomes reference for host contamination removal.

type: string

Fasta reference file for host contamination removal.

type: string

Bowtie2 index directory corresponding to --host_fasta reference file for host contamination removal.

type: string

Use the --very-sensitive instead of the --sensitive setting for Bowtie 2 to map reads against the host genome.

type: boolean

Save the read IDs of removed host reads.

type: boolean

Specify to save input FASTQ files with host reads removed to --outdir.

type: boolean

Keep reads similar to the Illumina internal standard PhiX genome.

type: boolean

Genome reference used to remove Illumina PhiX contaminant reads.

hidden

type: string

default: ${baseDir}/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz

Skip read preprocessing using fastp or adapterremoval.

type: boolean

Specify to save input FASTQ files with phiX reads removed to --outdir.

type: boolean

Run BBnorm to normalize sequence depth.

type: boolean

Set BBnorm target maximum depth to this number.

type: integer

default: 100

Set BBnorm minimum depth to this number.

type: integer

default: 5

Save normalized read files to output directory.

type: boolean

Skip removing adapter sequences from long reads.

type: boolean

Discard any read which is shorter than this value.

type: integer

default: 1000

Discard any read which has a mean quality score lower than this value.

type: integer

Keep this percent of bases.

type: integer

default: 90

The higher this value, the more weight read length is given when choosing the best reads.

type: integer

default: 10

Keep reads similar to the ONT internal standard Escherichia virus Lambda genome.

type: boolean

Genome reference used to remove ONT Lambda contaminant reads.

hidden

type: string

default: ${baseDir}/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz

Specify to save input FASTQ files with lambda reads removed to --outdir.

type: boolean

Specify to save the resulting clipped FASTQ files to --outdir.

type: boolean

Specify to save the resulting length filtered long read FASTQ files to --outdir.

type: boolean

Specify which long read adapter trimming tool to use.

type: string

Specify which long read filtering tool to use.

type: string

Taxonomic classification is disabled by default. You have to specify one of the options below to activate it.

Database for taxonomic binning with centrifuge.

type: string

Database for taxonomic binning with kraken2.

type: string

Database for taxonomic binning with krona

type: string

Skip creating a krona plot for taxonomic binning.

type: boolean

Database for taxonomic classification of metagenome assembled genomes. Can be either a zipped file or a directory containing the extracted output of such.

type: string

Generate CAT database.

type: boolean

Save the CAT database generated when specified by --cat_db_generate.

type: boolean

Only return official taxonomic ranks (Kingdom, Phylum, etc.) when running CAT.

type: boolean

Skip the running of GTDB, as well as the automatic download of the database

type: boolean

Specify the location of a GTDBTK database. Can be either an uncompressed directory or a .tar.gz archive. If not specified, it will be downloaded for you when GTDBTK or binning QC is not skipped.

type: string

default: https://data.gtdb.ecogenomic.org/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz

Specify the location of a GTDBTK mash database. If missing, GTDB-Tk will skip the ani_screening step

type: string

Min. bin completeness (in %) required to apply GTDB-tk classification.

type: number

default: 50

Max. bin contamination (in %) allowed to apply GTDB-tk classification.

type: number

default: 10
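
The completeness and contamination thresholds above together decide which bins are passed on for GTDB-Tk classification. The following is a minimal sketch of such a filter using the documented defaults (50 and 10). The `BinQC` class, the function name, and the use of inclusive comparisons are assumptions for illustration, not the pipeline's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class BinQC:
    """Quality estimates for one bin (hypothetical structure)."""
    name: str
    completeness: float   # in percent
    contamination: float  # in percent


def passes_gtdbtk_filter(
    qc: BinQC,
    min_completeness: float = 50.0,   # documented default
    max_contamination: float = 10.0,  # documented default
) -> bool:
    """Return True if the bin meets both thresholds.

    Inclusive comparisons are an assumption of this sketch.
    """
    return (
        qc.completeness >= min_completeness
        and qc.contamination <= max_contamination
    )


if __name__ == "__main__":
    bins = [
        BinQC("bin.1", 92.3, 1.2),
        BinQC("bin.2", 45.0, 0.5),   # too incomplete
        BinQC("bin.3", 80.0, 12.0),  # too contaminated
    ]
    kept = [b.name for b in bins if passes_gtdbtk_filter(b)]
    print(kept)
```

Raising the completeness threshold or lowering the contamination threshold reduces the number of bins that reach classification.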

Min. fraction of AA (in %) in the MSA for bins to be kept.

type: number

default: 10

Min. alignment fraction to consider closest genome.

type: number

default: 0.65

Number of CPUs used by pplacer, the tool run by GTDB-Tk.

type: integer

default: 1

Speed up pplacer step of GTDB-Tk by loading to memory.

type: boolean

Co-assemble samples within one group, instead of assembling each sample separately.

type: boolean

Additional custom options for SPAdes and SPAdesHybrid. Do not specify --meta as this will be added for you!

type: string

Specify whether to use contigs or scaffolds assembled by SPAdes

type: string

Additional custom options for MEGAHIT.

type: string

Skip Illumina-only SPAdes assembly.

type: boolean

Skip SPAdes hybrid assembly.

type: boolean

Skip MEGAHIT assembly.

type: boolean

Skip metaQUAST.

type: boolean

Skip Prodigal gene prediction

type: boolean

Turn on Prokka compliance mode for truncating contig names for NCBI/ENA compatibility.

type: boolean

Specify sequencing centre name required for Prokka’s compliance mode.

type: string

Skip Prokka genome annotation.

type: boolean

Skip MetaEuk gene prediction and annotation

type: boolean

A string containing the name of one of the databases listed in the mmseqs2 documentation. This database will be downloaded and formatted for eukaryotic genome annotation. Incompatible with --metaeuk_db.

type: string

Path to either a local fasta file of protein sequences, or to a directory containing an MMseqs2-formatted database, for annotation of eukaryotic genomes.

type: string

Save the downloaded mmseqs2 database specified in --metaeuk_mmseqs_db.

type: boolean

Run virus identification.

type: boolean

Database for virus classification with geNomad

type: string

Minimum geNomad score for a sequence to be considered viral

type: number

default: 0.7

Number of groups that geNomad’s MMSeqs2 database should be split into (reduced memory requirements)

type: integer

default: 1

Defines mapping strategy to compute co-abundances for binning, i.e. which samples will be mapped against the assembly.

type: string

Skip metagenome binning entirely

type: boolean

Skip MetaBAT2 Binning

type: boolean

Skip MaxBin2 Binning

type: boolean

Skip CONCOCT Binning

type: boolean

Minimum contig size to be considered for binning and for bin quality check.

type: integer

default: 1500

Minimal length of contigs that are not part of any bin but treated as individual genome.

type: integer

default: 1000000

Maximal number of contigs that are not part of any bin but treated as individual genome.

type: integer

default: 100

Specify the shortest length a bin should be to retain for downstream processing (in base pairs)

type: integer

Specify the longest length a bin should be to retain for downstream processing (in base pairs). By default no limit.

type: integer

Specify length of sub-contigs cut up prior to CONCOCT binning

type: integer

default: 10000

Specify the overlap between each sub-contig prior to CONCOCT binning

type: integer

Specify to not append the final sub-contig, when it is shorter than the sub-contig length, to the last full-length sub-contig

type: boolean

Specify alternative Bowtie2 settings for aligning reads back against the assembly.

type: string

pattern: ^[-\w]*$
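
The pattern above allows only hyphens and word characters, which means the value cannot contain spaces. The Python sketch below illustrates the effect; the helper name `is_valid_bowtie2_setting` is hypothetical and the check mirrors only the regular expression, not the pipeline's handling of the value.

```python
import re

# Pattern copied verbatim from the parameter description above.
BOWTIE2_SETTING_PATTERN = re.compile(r"^[-\w]*$")


def is_valid_bowtie2_setting(value: str) -> bool:
    """Return True if the value contains only hyphens and word characters.

    Hypothetical helper for illustration only.
    """
    return BOWTIE2_SETTING_PATTERN.search(value) is not None


if __name__ == "__main__":
    for value in ["--very-sensitive", "--very-sensitive -N 1", "--seed=1"]:
        print(repr(value), is_valid_bowtie2_setting(value))
```

A single preset flag such as `--very-sensitive` matches, whereas a string with several space-separated options, or one containing `=`, does not.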

Save the output of mapping raw reads back to assembled contigs

type: boolean

Enable domain-level (prokaryote or eukaryote) classification of bins using Tiara. Processes which are domain-specific will then only receive bins matching the domain requirement.

type: boolean

Specify which tool to use for domain classification of bins. Currently only ‘tiara’ is implemented.

hidden

type: string

default: tiara

Minimum contig length for Tiara to use for domain classification. For accurate classification, should be longer than 3000 bp.

type: integer

default: 3000

Exclude unbinned contigs in the post-binning steps (bin QC, taxonomic classification, and annotation steps).

type: boolean

Disable bin QC with BUSCO, CheckM or CheckM2.

type: boolean

Specify which tool for bin quality-control validation to use.

type: string

Download URL, local tar.gz archive, or local uncompressed directory for an *_odb10 or *_odb12 BUSCO lineage dataset.

type: string

Name of the BUSCO *_odb10 or *_odb12 lineage to check against. Additionally supports ‘auto’, ‘auto_prok’ and ‘auto_euk’ for automatic lineage selection mode.

type: string

default: auto

pattern: (.*_odb(10|12))|auto(_prok|_euk)?$
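
The lineage pattern above has no leading `^`, so a validator that searches rather than fully matches will accept more than the obvious values. The sketch below assumes search semantics (as in Python's `re.search`); whether the pipeline's validator behaves identically is an assumption. The helper name `matches_lineage_pattern` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
LINEAGE_PATTERN = re.compile(r"(.*_odb(10|12))|auto(_prok|_euk)?$")


def matches_lineage_pattern(value: str) -> bool:
    """Return True if the pattern is found anywhere in the value.

    Uses search semantics, which is an assumption of this sketch.
    """
    return LINEAGE_PATTERN.search(value) is not None


if __name__ == "__main__":
    examples = [
        "bacteria_odb10",        # explicit odb10 lineage
        "eukaryota_odb12",       # explicit odb12 lineage
        "auto",                  # automatic lineage selection
        "auto_prok",
        "auto_euk",
        "bacteria_odb9",         # unsupported dataset version
        "bacteria_odb10_extra",  # matches under search semantics
    ]
    for value in examples:
        print(value, matches_lineage_pattern(value))
```

Under these semantics the pattern screens out older dataset versions such as `*_odb9`, but it does not guarantee that the supplied name is an existing BUSCO lineage.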

Save the used BUSCO lineage datasets provided via --busco_db.

type: boolean

Enable clean-up of temporary files created during BUSCO runs.

type: boolean

URL pointing to checkM database for auto download, if local path not supplied.

hidden

type: string

default: https://zenodo.org/records/7401545/files/checkm_data_2015_01_16.tar.gz

Path to local folder containing already downloaded and uncompressed CheckM database.

type: string

Save the used CheckM reference files downloaded when not using --checkm_db parameter.

type: boolean

Path to local file of an already downloaded and uncompressed CheckM2 database (.dmnd file).

type: string

CheckM2 database version number to download, given as a Zenodo record ID. For reference, see the canonical record https://zenodo.org/records/5571251 and pick the Zenodo ID of the database version of your choice.

type: integer

default: 14897628

Save the used CheckM2 reference files downloaded when not using --checkm2_db parameter.

type: boolean

Turn on bin refinement using DAS Tool.

type: boolean

Specify single-copy gene score threshold for bin refinement.

type: number

default: 0.5

Specify which binning output is sent for downstream annotation, taxonomic classification, bin quality control etc.

type: string

Turn on GUNC genome chimerism checks

type: boolean

Specify a path to a pre-downloaded GUNC dmnd database file

type: string

Specify which database to auto-download if not supplying own

type: string

Save the used GUNC reference files downloaded when not using --gunc_db parameter.

type: boolean

Performs ancient DNA assembly validation and contig consensus sequence recalling.

Turn on/off the ancient DNA subworkflow

type: boolean

PyDamage accuracy threshold

type: number

default: 0.5

Deactivate damage correction of ancient contigs using variant and consensus calling

type: boolean

Ploidy for variant calling

type: integer

default: 1

Minimum base quality required for variant calling

type: integer

default: 20

Minimum minor allele frequency for considering variants

type: number

default: 0.33

Minimum genotype quality for considering a variant high quality

type: integer

default: 30

Minimum genotype quality for considering a variant medium quality

type: integer

default: 20

Minimum number of bases supporting the alternative allele

type: integer

default: 3

nf-core/mag