nf-core/funcscan
(Meta-)genome screening for functional and natural product gene sequences
Define where the pipeline should find input data and save output data.
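Two of the parameters in this group carry a pattern that the supplied value has to match. As a minimal sketch, assuming plain Python and a hypothetical helper named `check_input_params` (not part of the pipeline), the samplesheet and email patterns listed in this section can be tried out before launching a run:

```python
import re
from typing import List, Optional

# Patterns copied verbatim from the parameter descriptions in this section.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def check_input_params(samplesheet: str, email: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means both values match.

    Hypothetical helper for illustration only.
    """
    problems = []
    if not SAMPLESHEET_PATTERN.match(samplesheet):
        problems.append(
            f"samplesheet path {samplesheet!r} must contain no whitespace and end in .csv"
        )
    if email is not None and not EMAIL_PATTERN.match(email):
        problems.append(f"email address {email!r} does not match the expected pattern")
    return problems


if __name__ == "__main__":
    for issue in check_input_params("my samples.csv", "user@example"):
        print(issue)
```

This only checks the string format; it does not confirm that the file exists or that its columns are valid.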
Path to comma-separated file containing sample names and paths to corresponding FASTA files, and optional annotation files.
  type: string; pattern: ^\S+\.csv$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  type: string
Email address for completion summary.
  type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
  type: string

These parameters influence which workflow (ARG, AMP and/or BGC) to activate.
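A run normally switches on one or more of the three workflows. The sketch below builds a JSON parameters file that could be handed to Nextflow with its `-params-file` option. The parameter names used here (`input`, `outdir`, `run_amp_screening`, `run_arg_screening`, `run_bgc_screening`) do not appear in this text and are assumptions; check them against the pipeline's parameter list before use. `build_params` is a hypothetical helper.

```python
import json


def build_params(samplesheet, outdir, amp=False, arg=False, bgc=False):
    """Build a parameters dictionary with the selected workflows switched on.

    The keys are assumed parameter names, not taken from this document.
    """
    if not (amp or arg or bgc):
        raise ValueError("activate at least one of the AMP, ARG or BGC workflows")
    return {
        "input": samplesheet,
        "outdir": outdir,
        "run_amp_screening": amp,
        "run_arg_screening": arg,
        "run_bgc_screening": bgc,
    }


if __name__ == "__main__":
    params = build_params("samplesheet.csv", "results", amp=True, arg=True)
    # Write the file that would be passed as: nextflow run ... -params-file params.json
    with open("params.json", "w") as handle:
        json.dump(params, handle, indent=2)
    print(json.dumps(params, indent=2))
```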
Activate antimicrobial peptide gene screening tools.
  type: boolean
Activate antimicrobial resistance gene screening tools.
  type: boolean
Activate biosynthetic gene cluster screening tools.
  type: boolean

These options influence whether to activate the taxonomic classification of the input nucleotide sequences.
Activates the taxonomic classification of input nucleotide sequences.
  type: boolean
Specifies the tool used for taxonomic classification.
  type: string
If MMseqs2 is chosen as taxonomic classification tool: Specifies if the output of all MMseqs2 subcommands shall be compressed.
  type: boolean

These parameters influence the database to be used in classifying the taxonomy.
Specify a path to an MMseqs2-formatted database.
  type: string
Specify the label of the database to be used.
  type: string; default: Kalamari
Specify whether the temporary files should be saved.
  type: boolean

These parameters influence the taxonomic classification step.
Specify whether to save the temporary files.
  type: boolean
Specify the alignment type between database and query.
  type: integer; default: 2
Specify the taxonomic levels to display in the result table.
  type: string; default: kingdom,phylum,class,order,family,genus,species
Specify whether to include or remove the taxonomic lineage.
  type: integer; default: 1
Specify the speed and sensitivity for taxonomy assignment.
  type: number; default: 5
Specify the ORF search sensitivity in the prefilter step.
  type: number; default: 2
Specify the mode to assign the taxonomy.
  type: integer; default: 3
Specify the weights of the taxonomic assignment.
  type: integer; default: 1

These options influence the generation of annotation files required for downstream steps in ARG, AMP, and BGC workflows.
Specify which annotation tool to use for some downstream tools.
  type: string
Specify whether to save gene annotations in the results directory.
  type: boolean

BAKTA is a tool developed to annotate bacterial genomes and plasmids from both isolates and MAGs. More info: https://github.com/oschwengers/bakta
Specify a path to a local copy of a BAKTA database.
  type: string
Download full or light version of the Bakta database if not supplying own database.
  type: string
Use the default genome-length optimised mode (rather than the metagenome mode).
  type: boolean
Specify the minimum contig size.
  type: integer; default: 1
Specify the genetic code translation table.
  type: integer; default: 11
Specify the type of bacteria to be annotated to detect signal peptides.
  type: string
Specify that all contigs are complete replicons.
  type: boolean
Changes the original contig headers.
  type: boolean
Clean the result annotations to standardise them to GenBank/ENA conventions.
  type: boolean
Activate tRNA detection & annotation.
  type: boolean
Activate tmRNA detection & annotation.
  type: boolean
Activate rRNA detection & annotation.
  type: boolean
Activate ncRNA detection & annotation.
  type: boolean
Activate ncRNA region detection & annotation.
  type: boolean
Activate CRISPR array detection & annotation.
  type: boolean
Skip CDS detection & annotation.
  type: boolean
Activate pseudogene detection & annotation.
  type: boolean
Skip sORF detection & annotation.
  type: boolean
Activate gap detection & annotation.
  type: boolean
Activate oriC/oriT detection & annotation.
  type: boolean
Activate generation of circular genome plots.
  type: boolean
Supply a path to an HMM file of trusted hidden Markov models in HMMER format for CDS annotation.
  type: string

Prokka annotates genomic sequences belonging to bacterial, archaeal and viral genomes. More info: https://github.com/tseemann/prokka
Use the default genome-length optimised mode (rather than the metagenome mode).
  type: boolean
Suppress the default clean-up of the gene annotations.
  type: boolean
Specify the kingdom that the input represents.
  type: string
Specify the translation table used to annotate the sequences.
  type: integer; default: 11
Minimum contig size required for annotation (bp).
  type: integer; default: 1
E-value cut-off.
  type: number; default: 0.000001
Set the assigned minimum coverage.
  type: integer; default: 80
Allow transfer RNA (tRNA) to overlap coding sequences (CDS).
  type: boolean
Use RNAmmer for rRNA prediction.
  type: boolean
Force contig name to GenBank/ENA/DDBJ naming rules.
  type: boolean; default: true
Add the gene features for each CDS hit.
  type: boolean
Retains contig names.
  type: boolean

Prodigal is a protein-coding gene prediction tool developed to run on bacterial and archaeal genomes. More info: https://github.com/hyattpd/prodigal/wiki
Specify whether to use Prodigal’s single-genome mode for long sequences.
  type: boolean
Does not allow partial genes on contig edges.
  type: boolean
Specifies the translation table used for gene annotation.
  type: integer; default: 11
Forces Prodigal to scan for motifs.
  type: boolean

Pyrodigal is a resource-optimized wrapper around Prodigal, producing protein-coding gene predictions of bacterial and archaeal genomes. Read more at the Pyrodigal GitHub repository (https://github.com/althonos/pyrodigal) or its documentation (https://pyrodigal.readthedocs.io).
Specify whether to use Pyrodigal’s single-genome mode for long sequences.
  type: boolean
Does not allow partial genes on contig edges.
  type: boolean
Specifies the translation table used for gene annotation.
  type: integer; default: 11
Forces Pyrodigal to scan for motifs.
  type: boolean
This forces Pyrodigal to append asterisks (*) as stop codon indicators. Do not use when running AMP workflow.
  type: boolean

Functionally annotates all annotated coding regions.
Activates the functional annotation of annotated coding regions to provide more information about the coding regions classified.
  type: boolean
Specifies the tool used for further protein annotation.
  type: string
Change the database version used for annotation.
  type: string; default: https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz
Path to pre-downloaded InterProScan database.
  type: string
Assigns the database(s) to be used to annotate the coding regions.
  type: string; default: PANTHER,ProSiteProfiles,ProSitePatterns,Pfam; pattern: ^\w+(,\w+)*
Pre-calculates residue mutual matches.
  type: boolean

General options for database downloading.
Specify whether to save pipeline-downloaded databases in your results directory.
  type: boolean

Antimicrobial Peptide detection using a deep learning model. More info: https://github.com/bcgsc/AMPlify
Skip AMPlify during AMP screening.
  type: boolean

Antimicrobial Peptide detection using machine learning. ampir uses a supervised statistical machine learning approach to predict AMPs. It incorporates two support vector machine classification models, ‘precursor’ and ‘mature’ that have been trained on publicly available antimicrobial peptide data. More info: https://github.com/Legana/ampir
Skip ampir during AMP screening.
  type: boolean
Specify which machine learning classification model to use.
  type: string
Specify minimum protein length for prediction calculation.
  type: integer; default: 10

Antimicrobial Peptide detection based on predefined HMM models. This tool implements methods using probabilistic models called profile hidden Markov models (profile HMMs) to search against a sequence database. More info: http://eddylab.org/software/hmmer/Userguide.pdf
Run hmmsearch during AMP screening.
  type: boolean
Specify path to the AMP hmm model file(s) to search against. Must have quotes if wildcard used.
  type: string
Saves a multiple alignment of all significant hits to a file.
  type: boolean
Save a simple tabular file summarising the per-target output.
  type: boolean
Save a simple tabular file summarising the per-domain output.
  type: boolean

Antimicrobial peptide detection from metagenomes. More info: https://github.com/BigDataBiology/macrel
Skip Macrel during AMP screening.
  type: boolean

Antimicrobial peptides parsing, filtering, and annotating submodule of AMPcombi2. More info: https://github.com/Darcy220606/AMPcombi
The name of the database used to classify the AMPs.
  type: string
The path to the folder containing the reference database files.
  type: string
Specifies the prediction tools’ cut-offs.
  type: number; default: 0.6
Filter out all amino acid fragments shorter than this number.
  type: integer; default: 120
Remove all DRAMP annotations that have an e-value greater than this value.
  type: number; default: 5
Retain HMM hits that have an e-value lower than this.
  type: number; default: 0.06
Assign the number of codons used to look for stop codons, upstream and downstream of the AMP hit.
  type: integer; default: 60
Assign the number of CDSs upstream and downstream of the AMP to look for a transport protein.
  type: integer; default: 11
Remove hits that have no stop codon upstream and downstream of the AMP.
  type: boolean
Assigns the file extension used to identify AMPIR output.
  type: string; default: .ampir.tsv
Assigns the file extension used to identify AMPLIFY output.
  type: string; default: .amplify.tsv
Assigns the file extension used to identify MACREL output.
  type: string; default: .macrel.prediction
Assigns the file extension used to identify HMMER/HMMSEARCH output.
  type: string; default: .hmmer_hmmsearch.txt

Clusters the AMP candidates identified with AMPcombi. More info: https://github.com/Darcy220606/AMPcombi
MMseqs2 coverage mode.
  type: number
Remove hits that have no stop codon upstream and downstream of the AMP.
  type: number; default: 4
Remove clusters that don’t have more AMP hits than this number.
  type: integer
MMseqs2 clustering mode.
  type: number; default: 1
MMseqs2 alignment coverage.
  type: number; default: 0.8
MMseqs2 sequence identity.
  type: number; default: 0.4
Remove any hits that form a single member cluster.
  type: boolean

Antimicrobial resistance gene detection based on NCBI’s curated Reference Gene Database and curated collection of Hidden Markov Models. It identifies AMR genes, resistance-associated point mutations, and select other classes of genes using protein annotations and/or assembled nucleotide sequences. More info: https://github.com/ncbi/amr/wiki
Skip AMRFinderPlus during the ARG screening.
  type: boolean
Specify the path to a local version of the AMRFinderPlus database.
  type: string
Minimum percent identity to reference sequence.
  type: number; default: -1
Minimum coverage of the reference protein.
  type: number; default: 0.5
Specify which NCBI genetic code to use for translated BLAST.
  type: integer; default: 11
Add the plus genes to the report.
  type: boolean
Add identified column to AMRFinderPlus output.
  type: boolean

Antimicrobial resistance gene detection using a deep learning model. DeepARG is composed of two models for two types of input: short sequence reads and gene-like sequences. In this pipeline we use the ls model, which is suitable for annotating full sequence genes and for discovering novel antibiotic resistance genes from assembled samples. The tool Diamond is used as an aligner. More info: https://bitbucket.org/gusphdproj/deeparg-ss/src/master
Skip DeepARG during the ARG screening.
  type: boolean
Specify the path to the DeepARG database.
  type: string
Specify the numeric version number of a user-supplied DeepARG database.
  type: integer; default: 2
Specify which model to use (short or long sequences).
  type: string
Specify minimum probability cutoff under which hits are discarded.
  type: number; default: 0.8
Specify E-value cutoff under which hits are discarded.
  type: number; default: 1e-10
Specify percent identity cutoff for sequence alignment under which hits are discarded.
  type: integer; default: 50
Specify alignment read overlap.
  type: number; default: 0.8
Specify minimum number of alignments per entry for DIAMOND step of DeepARG.
  type: integer; default: 1000

Antimicrobial resistance gene detection using a deep learning model. The tool includes developed and optimised models for a number of resistance gene types, and the functionality to create and optimize models of your own choice of resistance genes. More info: https://github.com/fannyhb/fargene
Skip fARGene during the ARG screening.
  type: boolean
Specify comma-separated list of which pre-defined HMM models to screen against.
  type: string; default: class_a,class_b_1_2,class_b_3,class_c,class_d_1,class_d_2,qnr,tet_efflux,tet_rpg,tet_enzyme
Specify to save intermediate temporary files to results directory.
  type: boolean
The threshold score for a sequence to be classified as an (almost) complete gene.
  type: number
The minimum length of a predicted ORF retrieved from annotating the nucleotide sequences.
  type: integer; default: 90
Defines which ORF finding algorithm to use.
  type: boolean
The translation table/format to use for sequence annotation.
  type: string; default: pearson

Antimicrobial resistance gene detection based on alignment to the CARD database, using homology and SNP models. More info: https://github.com/arpcard/rgi
Skip RGI during the ARG screening.
  type: boolean
Path to user-defined local CARD database.
  type: string
Save RGI output .json file.
  type: boolean
Specify to save intermediate temporary files in the results directory.
  type: boolean
Specify the alignment tool to be used.
  type: string
Include all of loose, strict and perfect hits (i.e. ≥ 95% identity) found by RGI.
  type: boolean
Suppresses the default behaviour of RGI with --arg_rgi_includeloose.
  type: boolean
Include screening of low quality contigs for partial genes.
  type: boolean
Specify a more specific data-type of input (e.g. plasmid, chromosome).
  type: string
Run multiple prodigal jobs simultaneously for contigs in a fasta file.
  type: boolean; default: true

Antimicrobial resistance gene detection based on alignment to NCBI, CARD, ARG-ANNOT, ResFinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB. More info: https://github.com/tseemann/abricate
Skip ABRicate during the ARG screening.
  type: boolean
Specify the name of the ABRicate database to use. Names of non-default databases can be supplied if --arg_abricate_db is provided.
  type: string; default: ncbi
Path to user-defined local ABRicate database directory for using custom databases.
  type: string
Minimum percent identity of alignment required for a hit to be considered.
  type: integer; default: 80
Minimum percent coverage of alignment required for a hit to be considered.
  type: integer; default: 80

Influences parameters required for the ARG summary by hAMRonization.
Specifies summary output format.
  type: string

Influences parameters required for the normalization of ARG annotations by argNorm. More info: https://github.com/BigDataBiology/argNorm
Skip argNorm during ARG screening.
  type: boolean

These parameters influence general BGC settings like minimum input sequence length.
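To illustrate what a minimum input sequence length means in practice, here is a minimal sketch of length-filtering contigs from a FASTA file before screening. It is not the pipeline's implementation: the function names `read_fasta` and `filter_by_length` are hypothetical, and treating the threshold as inclusive (a contig of exactly the minimum length is kept) is an assumption.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=3000):
    """Keep records whose sequence has at least min_length bases.

    The inclusive comparison is an assumption made for this sketch.
    """
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


if __name__ == "__main__":
    fasta = [">contig_1", "ACGT" * 500, ">contig_2", "ACGT" * 1000]
    for header, seq in filter_by_length(read_fasta(fasta)):
        print(header, len(seq))
```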
Specify the minimum length of contigs that go into BGC screening.
  type: integer; default: 3000
Specify to save the length-filtered (unannotated) FASTAs used for BGC screening.
  type: boolean

Biosynthetic gene cluster detection. More info: https://docs.antismash.secondarymetabolites.org
Skip antiSMASH during the BGC screening.
  type: boolean
Path to user-defined local antiSMASH database.
  type: string
Minimum length a contig must have to be screened with antiSMASH.
  type: integer; default: 3000
Turn on clusterblast comparison against database of antiSMASH-predicted clusters.
  type: boolean
Turn on clusterblast comparison against known gene clusters from the MIBiG database.
  type: boolean
Turn on clusterblast comparison against known subclusters responsible for synthesising precursors.
  type: boolean
Turn on ClusterCompare comparison against known gene clusters from the MIBiG database.
  type: boolean
Generate phylogenetic trees of secondary metabolite group orthologs.
  type: boolean
Defines which level of strictness to use for HMM-based cluster detection.
  type: string
Run Pfam to Gene Ontology mapping module.
  type: boolean
Run RREFinder precision mode on all RiPP gene clusters.
  type: boolean
Specify which taxonomic classification of input sequence to use.
  type: string
Run TFBS finder on all gene clusters.
  type: boolean

A deep learning genome-mining strategy for biosynthetic gene cluster prediction. More info: https://github.com/Merck/deepbgc/tree/master/deepbgc
Skip DeepBGC during the BGC screening.
  type: boolean
Path to local DeepBGC database folder.
  type: string
Average protein-wise DeepBGC score threshold for extracting BGC regions from Pfam sequences.
  type: number; default: 0.5
Run DeepBGC’s internal Prodigal step in single mode to restrict detecting genes to long contigs.
  type: boolean
Merge detected BGCs within given number of proteins.
  type: integer
Merge detected BGCs within given number of nucleotides.
  type: integer
Minimum BGC nucleotide length.
  type: integer; default: 1
Minimum number of proteins in a BGC.
  type: integer; default: 1
Minimum number of protein domains in a BGC.
  type: integer; default: 1
Minimum number of known biosynthetic (as defined by antiSMASH) protein domains in a BGC.
  type: integer
DeepBGC classification score threshold for assigning classes to BGCs.
  type: number; default: 0.5

Biosynthetic gene cluster detection using Conditional Random Fields (CRFs). More info: https://gecco.embl.de
Skip GECCO during the BGC screening.
  type: boolean
Enable unknown region masking to prevent genes from stretching across unknown nucleotides.
  type: boolean
The minimum number of coding sequences a valid cluster must contain.
  type: integer; default: 3
The p-value cutoff for protein domains to be included.
  type: number; default: 1e-9
The probability threshold for cluster detection.
  type: number; default: 0.8
The minimum number of annotated genes that must separate a cluster from the edge.
  type: integer

Biosynthetic Gene Cluster detection based on predefined HMM models. This tool implements methods using probabilistic models called profile hidden Markov models (profile HMMs) to search against a sequence database. More info: http://eddylab.org/software/hmmer/Userguide.pdf
Run hmmsearch during BGC screening.
  type: boolean
Specify path to the BGC hmm model file(s) to search against. Must have quotes if wildcard used.
  type: string
Saves a multiple alignment of all significant hits to a file.
  type: boolean
Save a simple tabular file summarising the per-target output.
  type: boolean
Save a simple tabular file summarising the per-domain output.
  type: boolean

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
  type: string; default: master
Base directory for Institutional configs.
  type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
  type: string
Institutional config description.
  type: string
Institutional config contact information.
  type: string
Institutional config URL link.
  type: string

Less common options for the pipeline, typically set in a config file.
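One of the options in this group, the file size limit for MultiQC reports attached to summary emails, is a string constrained by a pattern (default `25.MB`). The sketch below shows how such a value could be checked and converted to bytes. The helper name `parse_size` is hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption about how the unit is interpreted.

```python
import re

# Pattern copied from the file size limit parameter in this section.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")

# Assumption: units are binary multiples of 1024.
UNIT_FACTORS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Return the size in bytes, or raise ValueError if the pattern does not match."""
    match = SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"{text!r} is not a valid size, expected something like '25.MB'")
    # The leading number is an integer or decimal; a trailing '.' before the unit is ignored.
    number = float(re.match(r"\d+(\.\d+)?", text).group(0))
    return int(number * UNIT_FACTORS[match.group(2)])


if __name__ == "__main__":
    for value in ("25.MB", "1.5 GB", "10B"):
        print(value, "->", parse_size(value), "bytes")
```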
Display version and exit.
  type: boolean
Method used to save pipeline results to output directory.
  type: string
Email address for completion summary, only when pipeline fails.
  type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
  type: boolean
File size limit when attaching MultiQC reports to summary emails.
  type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
  type: boolean
Incoming hook URL for messaging service.
  type: string
Custom config file to supply to MultiQC.
  type: string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file.
  type: string
Custom MultiQC yaml file containing HTML including a methods description.
  type: string
Whether to validate parameters against the schema at runtime.
  type: boolean; default: true
Base URL or local path to location of pipeline test dataset files.
  type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
  type: string
Display the help message.
  type: boolean,string
Display the full detailed help message.
  type: boolean
Display hidden parameters in the help message (only works when --help or --help_full are provided).
  type: boolean