nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
Version 3.4.2. The latest stable release is 3.6.0.
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
  Type: string; pattern: ^\S+\.csv$
Automatic retrieval for restart
  Type: string; pattern: ^\S+\.csv$
Starting step
  Type: string
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  Type: string

Most common options used for the pipeline
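The two samplesheet paths in the input/output options above are listed with the pattern `^\S+\.csv$`. As a minimal sketch of what that pattern accepts, the check below applies it with Python's `re` module. The function name is hypothetical, and the pipeline itself validates parameters against its own schema, so this only approximates that behaviour.

```python
import re

# Pattern copied from the samplesheet entries in the input/output options.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path has no whitespace and ends in '.csv'.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    return SAMPLESHEET_PATTERN.match(path) is not None


for candidate in ("samplesheet.csv", "my samples.csv", "samplesheet.tsv"):
    print(candidate, is_valid_samplesheet_path(candidate))
```

A path containing a space fails the check, so quoting the path on the command line is not enough to make it valid.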
Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.
  Type: integer; default: 50000000
Enable when exome or panel data is provided.
  Type: boolean
Path to the target BED file (for whole-exome or targeted sequencing) or to an intervals file.
  Type: string
Estimate interval size.
  Type: integer; default: 200000
Disable usage of intervals.
  Type: boolean
Tools to use for duplicate marking, variant calling and/or for annotation.
  Type: string
Disable specified tools.
  Type: string

Trim FastQ files or handle UMIs
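The read-splitting option in the common options above takes a number of reads per split (default 50000000), with 0 turning splitting off. The arithmetic can be sketched as follows. The function name is hypothetical and the calculation is only an estimate of how many pieces one FastQ file would yield; the actual splitting is performed by the pipeline's own tooling.

```python
DEFAULT_READS_PER_SPLIT = 50_000_000  # default listed for the split option


def expected_chunks(total_reads: int,
                    reads_per_split: int = DEFAULT_READS_PER_SPLIT) -> int:
    """Estimate how many pieces one FastQ file is split into.

    Hypothetical helper: 0 reads per split means "do not split".
    """
    if total_reads < 0 or reads_per_split < 0:
        raise ValueError("read counts must be non-negative")
    if reads_per_split == 0 or total_reads == 0:
        return 1
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_reads // reads_per_split)


print(expected_chunks(120_000_000))     # 3
print(expected_chunks(120_000_000, 0))  # 1
```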
Run FastP for read trimming
  Type: boolean
Remove bp from the 5’ end of read 1
  Type: integer
Remove bp from the 5’ end of read 2
  Type: integer
Remove bp from the 3’ end of read 1
  Type: integer
Remove bp from the 3’ end of read 2
  Type: integer
Removing poly-G tails.
  Type: integer
Save trimmed FastQ file intermediates.
  Type: boolean
Specify UMI read structure
  Type: string
Default strategy with UMI
  Type: string; default: Adjacency
If set, publishes split FASTQ files. Intended for testing purposes.
  Type: boolean

Configure preprocessing tools
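The four clipping options in the trimming section above each remove a fixed number of bases from the 5’ or 3’ end of a read. The sketch below shows what that means for a single read sequence. It is an illustration of the concept only: the function and argument names are hypothetical, and it is not how FastP implements trimming.

```python
def clip_read(sequence: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Remove `five_prime` bases from the start and `three_prime` from the end.

    Hypothetical helper; returns an empty string if the clips cover the read.
    """
    if five_prime < 0 or three_prime < 0:
        raise ValueError("clip lengths must be non-negative")
    end = len(sequence) - three_prime
    # Guard against the end index falling before the start index.
    return sequence[five_prime:max(end, five_prime)]


print(clip_read("ACGTACGTAC", five_prime=2, three_prime=3))  # GTACG
```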
Specify aligner to be used to map reads to reference genome.
  Type: string
Save mapped files.
  Type: boolean
Saves output from mapping (if --save_mapped), MarkDuplicates and base recalibration as BAM files instead of CRAM.
  Type: boolean
Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration
  Type: string

Configure variant calling tools
Option for concatenating germline VCF files.
  Type: boolean
If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.
  Type: boolean
Turn on joint germline variant calling for GATK HaplotypeCaller
  Type: boolean
Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named with patient ID under variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.
  Type: boolean
Overwrite ASCAT min base quality required for a read to be counted.
  Type: integer; default: 20
Overwrite ASCAT minimum depth required in the normal for a SNP to be considered.
  Type: integer; default: 10
Overwrite ASCAT min mapping quality required for a read to be counted.
  Type: integer; default: 35
Overwrite ASCAT ploidy.
  Type: number
Overwrite ASCAT purity.
  Type: number
Specify a custom chromosome length file.
  Type: string
Overwrite Control-FREEC coefficientOfVariation
  Type: number; default: 0.05
Overwrite Control-FREEC contaminationAdjustment
  Type: boolean
Define the known contamination value for Control-FREEC
  Type: integer
Minimal sequencing quality for a position to be considered in BAF analysis.
  Type: integer
Minimal read coverage for a position to be considered in BAF analysis.
  Type: integer
Genome ploidy used by Control-FREEC
  Type: string; default: 2
Overwrite Control-FREEC window size.
  Type: number
Copy-number reference for CNVkit
  Type: string
Do not analyze soft clipped bases in the reads for GATK Mutect2.
  Type: boolean
Option for selecting output and emit-mode of Sentieon’s Haplotyper.
  Type: string; default: variant
Option for selecting output and emit-mode of Sentieon’s Dnascope.
  Type: string; default: variant
Option for selecting the PCR indel model used by Sentieon Dnascope.
  Type: string; default: CONSERVATIVE
Allow usage of fasta file for annotation with VEP
  Type: boolean
Enable the use of the VEP dbNSFP plugin.
  Type: boolean
Path to dbNSFP processed file.
  Type: string
Path to dbNSFP tabix indexed file.
  Type: string
Consequence to annotate with
  Type: string
Fields to annotate with
  Type: string; default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Enable the use of the VEP LOFTEE plugin.
  Type: boolean
Enable the use of the VEP SpliceAI plugin.
  Type: boolean
Path to spliceai raw scores snv file.
  Type: string
Path to spliceai raw scores snv tabix indexed file.
  Type: string
Path to spliceai raw scores indel file.
  Type: string
Path to spliceai raw scores indel tabix indexed file.
  Type: string
Enable the use of the VEP SpliceRegion plugin.
  Type: boolean
Add an extra custom argument to VEP.
  Type: string; default: --everything --filter_common --per_gene --total_length --offline --format vcf
Should reflect the VEP version used in the container.
  Type: string; default: 111.0-0
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  Type: string
VEP output-file format.
  Type: string
A VCF file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.
  Type: string
Index file for bcftools_annotations
  Type: string
Text file with the header lines of bcftools_annotations
  Type: string

Reference genome related files and options required for the workflow.
Name of iGenomes reference.
  Type: string; default: GATK.GRCh38
ASCAT genome.
  Type: string
Path to ASCAT allele zip file.
  Type: string
Path to ASCAT loci zip file.
  Type: string
Path to ASCAT GC content correction file.
  Type: string
Path to ASCAT RT (replication timing) correction file.
  Type: string
Path to BWA mem indices.
  Type: string
Path to bwa-mem2 mem indices.
  Type: string
Path to chromosomes folder used with Control-FREEC.
  Type: string
Path to dbsnp file.
  Type: string
Path to dbsnp index.
  Type: string
Label string for VariantRecalibration (haplotypecaller joint variant calling)
  Type: string
Path to FASTA dictionary file.
  Type: string
Path to dragmap indices.
  Type: string
Path to FASTA genome file.
  Type: string; pattern: ^\S+\.fn?a(sta)?(\.gz)?$
Path to FASTA reference index.
  Type: string
Path to GATK Mutect2 Germline Resource File.
  Type: string
Path to GATK Mutect2 Germline Resource Index.
  Type: string
Path to known indels file.
  Type: string
Path to known indels file index.
  Type: string
  If you use AWS iGenomes, this has already been set for you appropriately.
1st label string for VariantRecalibration (haplotypecaller joint variant calling)
  Type: string
  If you use AWS iGenomes, this has already been set for you appropriately.
Path to known snps file.
  Type: string
Path to known snps file index.
  Type: string
  If you use AWS iGenomes, this has already been set for you appropriately.
Label string for VariantRecalibration (haplotypecaller joint variant calling)
  Type: string
Path to Control-FREEC mappability file.
  Type: string
Path to SNP bed file for sample checking with NGSCheckMate
  Type: string
Panel-of-normals VCF (bgzipped) for GATK Mutect2
  Type: string
Index of the panel-of-normals (PON) VCF.
  Type: string
Machine learning model for Sentieon Dnascope.
  Type: string
snpEff DB version.
  Type: string
snpEff genome.
  Type: string
VEP genome.
  Type: string
VEP species.
  Type: string
VEP cache version.
  Type: string
Save built references.
  Type: boolean
Only build references.
  Type: boolean
Download annotation cache.
  Type: boolean
Directory / URL base for iGenomes references.
  Type: string; default: s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
  Type: boolean
Path to VEP cache.
  Type: string; default: s3://annotation-cache/vep_cache/
Path to snpEff cache.
  Type: string; default: s3://annotation-cache/snpeff_cache/

Parameters used to describe centralised config profiles. These should not be edited.
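In the reference genome options above, the FASTA genome file is listed with the pattern `^\S+\.fn?a(sta)?(\.gz)?$`. The sketch below, using Python's `re` module and a hypothetical helper name, shows which file names that pattern accepts. It approximates the schema check and is not the pipeline's own validation code.

```python
import re

# Pattern copied from the FASTA genome file entry in the reference options.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True for names such as .fa, .fna, .fasta, optionally gzipped.

    Hypothetical helper for illustration only.
    """
    return FASTA_PATTERN.match(path) is not None


for candidate in ("genome.fa", "genome.fna", "genome.fasta.gz",
                  "genome.fastq", "my genome.fa"):
    print(candidate, is_valid_fasta_path(candidate))
```

Note that the pattern rejects any path containing whitespace, and a `.fastq` file does not pass as a reference.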
Git commit id for Institutional configs.
  Type: string; default: master
Base directory for Institutional configs.
  Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
  Type: string
Institutional config description.
  Type: string
Institutional config contact information.
  Type: string
Institutional config URL link.
  Type: string
Base path / URL for data used in the test profiles
  Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
Base path / URL for data used in the modules
  Type: string
Sequencing center information to be added to read group (CN field).
  Type: string
Sequencing platform information to be added to read group (PL field).
  Type: string; default: ILLUMINA

Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
  Type: integer; default: 16
Maximum amount of memory that can be requested for any single job.
  Type: string; default: 128.GB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Maximum amount of time that can be requested for any single job.
  Type: string; default: 240.h; pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$

Less common options for the pipeline, typically set in a config file.
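The memory and time limits in the resource section above are strings checked against the patterns listed with them. The following sketch applies both patterns with Python's `re` module to show which spellings pass. The helper names are hypothetical, and the result only approximates the pipeline's schema validation.

```python
import re

# Patterns copied from the maximum memory and maximum time entries.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """Hypothetical helper: True for values such as '128.GB' or '64 GB'."""
    return MEMORY_PATTERN.match(value) is not None


def is_valid_time(value: str) -> bool:
    """Hypothetical helper: True for values such as '240.h' or '2d 6h'."""
    return TIME_PATTERN.match(value) is not None


print(is_valid_memory("128.GB"), is_valid_memory("128GiB"))
print(is_valid_time("240.h"), is_valid_time("10 hours"))
```

The listed defaults (128.GB and 240.h) both pass, while binary-prefixed units such as GiB and spelled-out units such as "hours" do not.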
Display help text.
  Type: boolean
Display version and exit.
  Type: boolean
Method used to save pipeline results to output directory.
  Type: string
Email address for completion summary.
  Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Email address for completion summary, only when pipeline fails.
  Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
  Type: boolean
File size limit when attaching MultiQC reports to summary emails.
  Type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
  Type: boolean
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
  Type: string
Custom config file to supply to MultiQC.
  Type: string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file.
  Type: string
Custom MultiQC yaml file containing HTML including a methods description.
  Type: string
Whether to validate parameters against the schema at runtime
  Type: boolean; default: true
Show all params when using --help
  Type: boolean
Validation of parameters fails when an unrecognised parameter is found.
  Type: boolean
Validation of parameters in lenient mode.
  Type: boolean
Incoming hook URL for messaging service
  Type: string
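The two email parameters above share one pattern. As a rough sketch of its behaviour, the check below applies it with Python's `re` module. The helper name is hypothetical and the result only approximates the pipeline's schema validation. One consequence worth knowing is that the final group allows 2 to 5 letters, so addresses on longer top-level domains are rejected.

```python
import re

# Pattern copied from the email entries in the less common options.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the address matches the listed pattern."""
    return EMAIL_PATTERN.match(address) is not None


for candidate in ("user@example.org", "user@example", "user@lab.technology"):
    print(candidate, is_valid_email(candidate))
```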