nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
Version 3.5.0. The latest stable release is 3.6.0.
See the advisory entry for more information.
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
Type: string; pattern: ^\S+\.csv$
Automatic retrieval for restart
Type: string; pattern: ^\S+\.csv$
Starting step
Type: string
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Type: string

Most common options used for the pipeline
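The `^\S+\.csv$` pattern listed for the sample sheet in the input/output options means the path must contain no whitespace and must end in `.csv`. A minimal Python sketch of that check follows; it uses only the regex shown in this reference, and the helper name and example paths are hypothetical.

```python
import re

# Pattern copied from the sample-sheet entry: no whitespace, must end in ".csv".
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path satisfies the documented pattern (hypothetical helper)."""
    return SAMPLESHEET_PATTERN.match(path) is not None


if __name__ == "__main__":
    # Hypothetical example paths, chosen to show one accepted and two rejected cases.
    for candidate in ("samplesheet.csv", "my samples.csv", "samplesheet.tsv"):
        print(candidate, is_valid_samplesheet_path(candidate))
```

Running such a check before launching can catch a path with spaces or the wrong extension early, before the pipeline's own schema validation rejects it.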
Specify how many reads each split of a FASTQ file contains. Set to 0 to turn off splitting entirely.
Type: integer; default: 50000000
Estimate interval size.
Type: integer; default: 200000
Path to target bed file in case of whole exome or targeted sequencing or intervals file.
Type: string
Disable usage of intervals.
Type: boolean
Enable when exome or panel data is provided.
Type: boolean
Tools to use for duplicate marking, variant calling and/or for annotation.
Type: string
Disable specified tools.
Type: string

Trim FASTQ files or handle UMIs
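As a rough illustration of the read-splitting option among the common options: if each split holds a fixed number of reads (default 50000000), a FASTQ file with N reads yields about ceil(N / split size) chunks, and a value of 0 disables splitting. The sketch below only does that arithmetic. The function name and read counts are hypothetical, and it makes no claim about how the pipeline performs the split internally.

```python
def estimate_split_count(total_reads: int, reads_per_split: int = 50_000_000) -> int:
    """Estimate how many chunks a FASTQ file is split into (hypothetical helper).

    reads_per_split == 0 means splitting is turned off, so the file stays whole.
    """
    if total_reads < 0 or reads_per_split < 0:
        raise ValueError("read counts must be non-negative")
    if reads_per_split == 0 or total_reads == 0:
        return 1
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_reads // reads_per_split)


if __name__ == "__main__":
    print(estimate_split_count(120_000_000))     # three chunks at the default size
    print(estimate_split_count(120_000_000, 0))  # splitting disabled: one file
```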
Run FastP for read trimming
Type: boolean
Remove bp from the 5’ end of read 1
Type: integer
Remove bp from the 5’ end of read 2
Type: integer
Remove bp from the 3’ end of read 1
Type: integer
Remove bp from the 3’ end of read 2
Type: integer
Remove poly-G tails.
Type: integer
Minimum length of reads to keep
Type: integer; default: 15
Save trimmed FASTQ file intermediates.
Type: boolean
Specify UMI read structure
Type: string
Default strategy with UMI
Type: string; default: Adjacency
If set, publishes split FASTQ files. Intended for testing purposes.
Type: boolean

Configure preprocessing tools
Specify aligner to be used to map reads to reference genome.
Type: string
Save mapped files.
Type: boolean
Save output from mapping (if --save_mapped), MarkDuplicates and base recalibration as BAM files instead of CRAM.
Type: boolean
Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration
Type: string

Configure variant calling tools
If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.
Type: boolean
Overwrite ASCAT min base quality required for a read to be counted.
Type: integer; default: 20
Overwrite ASCAT minimum depth required in the normal for a SNP to be considered.
Type: integer; default: 10
Overwrite ASCAT min mapping quality required for a read to be counted.
Type: integer; default: 35
Overwrite ASCAT ploidy.
Type: number
Overwrite ASCAT purity.
Type: number
Specify a custom chromosome length file.
Type: string
Overwrite Control-FREEC coefficientOfVariation
Type: number; default: 0.05
Overwrite Control-FREEC contaminationAdjustment
Type: boolean
Design known contamination value for Control-FREEC
Type: integer
Minimal sequencing quality for a position to be considered in BAF analysis.
Type: integer
Minimal read coverage for a position to be considered in BAF analysis.
Type: integer
Genome ploidy used by Control-FREEC
Type: string; default: 2
Overwrite Control-FREEC window size.
Type: number
Copy-number reference for CNVkit
Type: string
Turn on the joint germline variant calling for GATK HaplotypeCaller
Type: boolean
Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named with the patient ID under the variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.
Type: boolean
Do not analyze soft clipped bases in the reads for GATK Mutect2.
Type: boolean
Panel-of-normals VCF (bgzipped) for GATK Mutect2
Type: string
Index of the panel-of-normals (PON) VCF.
Type: string
Option for selecting output and emit-mode of Sentieon’s Haplotyper.
Type: string; default: variant
Option for selecting output and emit-mode of Sentieon’s DNAscope.
Type: string; default: variant
Option for selecting the PCR indel model used by Sentieon DNAscope.
Type: string; default: CONSERVATIVE
Option for concatenating germline VCF files.
Type: boolean
Allow usage of FASTA file for annotation with VEP
Type: boolean
Enable the use of the VEP dbNSFP plugin.
Type: boolean
Path to dbNSFP processed file.
Type: string
Path to dbNSFP tabix indexed file.
Type: string
Consequence to annotate with
Type: string
Fields to annotate with
Type: string; default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Enable the use of the VEP LOFTEE plugin.
Type: boolean
Enable the use of the VEP SpliceAI plugin.
Type: boolean
Path to SpliceAI raw scores SNV file.
Type: string
Path to SpliceAI raw scores SNV tabix indexed file.
Type: string
Path to SpliceAI raw scores indel file.
Type: string
Path to SpliceAI raw scores indel tabix indexed file.
Type: string
Enable the use of the VEP SpliceRegion plugin.
Type: boolean
Add an extra custom argument to VEP.
Type: string; default: --everything --filter_common --per_gene --total_length --offline --format vcf
Should reflect the VEP version used in the container.
Type: string; default: 111.0-0
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Type: string
VEP output-file format.
Type: string
A VCF file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.
Type: string
Index file for bcftools_annotations
Type: string
Text file with the header lines of bcftools_annotations
Type: string

General options to interact with reference genomes.
The base path to the iGenomes reference files
Type: string; default: s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
Type: boolean
Save built references.
Type: boolean
Only build references.
Type: boolean
Download annotation cache.
Type: boolean

Reference genome related files and options required for the workflow. If you use AWS iGenomes, this has already been set for you appropriately.
Name of iGenomes reference.
Type: string; default: GATK.GRCh38
ASCAT genome.
Type: string
Path to ASCAT allele zip file.
Type: string
Path to ASCAT loci zip file.
Type: string
Path to ASCAT GC content correction file.
Type: string
Path to ASCAT RT (replication timing) correction file.
Type: string
Path to BWA mem indices.
Type: string
Path to bwa-mem2 mem indices.
Type: string
Path to chromosomes folder used with Control-FREEC.
Type: string
Path to dbsnp file.
Type: string
Path to dbsnp index.
Type: string
Label string for VariantRecalibration (HaplotypeCaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
Type: string
Path to FASTA dictionary file.
Type: string
Path to dragmap indices.
Type: string
Path to FASTA genome file.
Type: string; pattern: ^\S+\.fn?a(sta)?(\.gz)?$
Path to FASTA reference index.
Type: string
Path to GATK Mutect2 Germline Resource File.
Type: string
Path to GATK Mutect2 Germline Resource Index.
Type: string
Path to known indels file.
Type: string
Path to known indels file index.
Type: string
Label string for VariantRecalibration (HaplotypeCaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
Type: string
Path to known snps file.
Type: string
Path to known snps file index.
Type: string
Label string for VariantRecalibration (HaplotypeCaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
Type: string
Path to Control-FREEC mappability file.
Type: string
Path to SNP bed file for sample checking with NGSCheckMate
Type: string
Machine learning model for Sentieon DNAscope.
Type: string
Path to snpEff cache.
Type: string; default: s3://annotation-cache/snpeff_cache/
snpEff DB version.
Type: string
Path to VEP cache.
Type: string; default: s3://annotation-cache/vep_cache/
VEP cache version.
Type: string
VEP genome.
Type: string
VEP species.
Type: string

Parameters used to describe centralised config profiles. These should not be edited.
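The pattern on the FASTA genome file entry in the reference options, `^\S+\.fn?a(sta)?(\.gz)?$`, is compact enough that it is not obvious which file names it accepts. The Python sketch below simply applies that regex to a few file names; the helper name and the file names are hypothetical examples, not files shipped with the pipeline.

```python
import re

# Pattern copied from the FASTA genome file entry.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def matches_fasta_pattern(path: str) -> bool:
    """Return True if the path satisfies the documented FASTA pattern (hypothetical helper)."""
    return FASTA_PATTERN.match(path) is not None


if __name__ == "__main__":
    # Hypothetical file names covering the accepted extensions and some rejected ones.
    names = [
        "genome.fa", "genome.fna", "genome.fasta",
        "genome.fa.gz", "genome.fasta.gz",
        "genome.fas", "genome.fa.bz2", "my genome.fa",
    ]
    for name in names:
        print(f"{name}: {matches_fasta_pattern(name)}")
```

Under this regex, `.fa`, `.fna` and `.fasta` are accepted with or without a trailing `.gz`, while other extensions, other compression suffixes and paths containing whitespace are not.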
Git commit id for Institutional configs.
Type: string; default: master
Base directory for Institutional configs.
Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
Type: string
Institutional config description.
Type: string
Institutional config contact information.
Type: string
Institutional config URL link.
Type: string
Base path / URL for data used in the test profiles
Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
Base path / URL for data used in the modules
Type: string
Sequencing center information to be added to read group (CN field).
Type: string
Sequencing platform information to be added to read group (PL field).
Type: string; default: ILLUMINA

Less common options for the pipeline, typically set in a config file.
Display version and exit.
Type: boolean
Method used to save pipeline results to output directory.
Type: string
Email address for completion summary.
Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Email address for completion summary, only when pipeline fails.
Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
Type: boolean
File size limit when attaching MultiQC reports to summary emails.
Type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
Type: boolean
Incoming hook URL for messaging service
Type: string
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Type: string
Custom config file to supply to MultiQC.
Type: string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
Type: string
Custom MultiQC yaml file containing HTML including a methods description.
Type: string
Boolean whether to validate parameters against the schema at runtime
Type: boolean; default: true
Base URL or local path to location of pipeline test dataset files
Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/
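
Two of the less common options constrain their values with regular expressions: the completion-summary email addresses and the MultiQC attachment size limit (default 25.MB). The Python sketch below applies those two regexes, copied verbatim from the entries above, to a few sample values. The helper names and the sample values are hypothetical.

```python
import re

# Patterns copied from the email and file-size-limit entries.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_email(value: str) -> bool:
    """Check a value against the documented email pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(value) is not None


def is_valid_size(value: str) -> bool:
    """Check a value against the documented file-size pattern (hypothetical helper)."""
    return SIZE_PATTERN.match(value) is not None


if __name__ == "__main__":
    # Hypothetical sample values.
    for email in ("user@example.org", "user@example", "user@example.technology"):
        print(f"{email}: {is_valid_email(email)}")
    for size in ("25.MB", "25 MB", "1.5GB", "25MiB", "25"):
        print(f"{size}: {is_valid_size(size)}")
```

One consequence of the email regex as written is that addresses whose final domain label is longer than five letters do not match, which may matter for some institutional domains.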