Guide for Pipeline Builder

Customizable and versioned pipelines

This feature was introduced so that analysis results can be replicated at any time, no matter how Tomato updates its underlying applications and library data files. If you use the same version of a pipeline, the results are the same.

All current pipelines are defined in the file /tomato/version/conf/1.txt. The lines that start with “@” are the pipelines that you can use in your “cmd.txt”.

For example,

$ll /tomato/version/conf/
1.txt
v1

Here we have only one configuration file (apart from “v1”). It is called a “version configuration file” and its file name has the format “VERSION_NUMBER.txt”. If you look at 1.txt, you will find the following structure:

#########################
#Novoalign
#########################
1:novoalign:/tomato/version/app/novoalign/2.08.01/::COMMENT:-r None -k
...
42:GenomeAnalysisTK.jar -T VariantRecalibrator:/tomato/version/app/gatk/2.3-0/:hg19:COMMENT:-mode BOTH

Here, each application is defined by this format:


Index : AppName : AppPath : ApplicableGenome : Comment : DefaultAppParameters

Example:

1:novoalign:/tomato/version/app/novoalign/2.08.01/::COMMENT:-r None -k

This can be read as:

1 [index of this application is 1]
novoalign [name of this application]
/tomato/version/app/novoalign/2.08.01/ [the full path to this application]
EMPTY [this application can be used on any reference genome if this field is empty]
COMMENT [comment on this application]
-r None -k [the default parameters for running this application]


Example:

42:GenomeAnalysisTK.jar -T VariantRecalibrator:/tomato/version/app/gatk/2.3-0/:hg19,mm10:COMMENT:-mode BOTH

This can be read as:

42 [index of this application]
GenomeAnalysisTK.jar -T VariantRecalibrator [name of this application]
/tomato/version/app/gatk/2.3-0/ [the full path to this application]
hg19,mm10 [this application can ONLY be used on hg19 and mm10 (specified by "-g hg19" or "-g mm10" in the cmd.txt)]
COMMENT [comment on this application]
-mode BOTH [the default parameters for running this application]
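
The field layout described above can be sketched in a few lines of Python. This is only an illustration of the format, not Tomato's own parser: the names `AppEntry`, `parse_app_line` and `is_applicable` are hypothetical, and the sketch assumes that only the last field (DefaultAppParameters) may itself contain a colon.

```python
from typing import NamedTuple, Tuple


class AppEntry(NamedTuple):
    index: int
    name: str
    path: str
    genomes: Tuple[str, ...]  # empty tuple means "any reference genome"
    comment: str
    default_params: str


def parse_app_line(line: str) -> AppEntry:
    """Split one application line into its six colon-separated fields."""
    # maxsplit=5 keeps any later colons inside DefaultAppParameters (assumption).
    fields = line.strip().split(":", 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    index, name, path, genomes, comment, params = (f.strip() for f in fields)
    genome_list = tuple(g.strip() for g in genomes.split(",") if g.strip())
    return AppEntry(int(index), name, path, genome_list, comment, params)


def is_applicable(app: AppEntry, genome: str) -> bool:
    """An empty ApplicableGenome field means the application fits any genome."""
    return not app.genomes or genome in app.genomes


novoalign = parse_app_line(
    "1:novoalign:/tomato/version/app/novoalign/2.08.01/::COMMENT:-r None -k"
)
recal = parse_app_line(
    "42:GenomeAnalysisTK.jar -T VariantRecalibrator:"
    "/tomato/version/app/gatk/2.3-0/:hg19,mm10:COMMENT:-mode BOTH"
)

print(novoalign.default_params)        # -r None -k
print(is_applicable(novoalign, "mm9"))  # True, the genome field is empty
print(is_applicable(recal, "mm9"))      # False, only hg19 and mm10 are listed
```

Splitting with a fixed maximum keeps the application name intact even when it contains spaces, as in "GenomeAnalysisTK.jar -T VariantRecalibrator".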


There are two ways to make new pipelines.

1. Add new applications and add new pipelines in the "1.txt"

#For example, you want to use "-r ALL" in novoalign
#First, add a new application

102:novoalign:/tomato/version/app/novoalign/2.08.01/::COMMENT:-r ALL -k

#Second, add a new pipeline

@myalignALL:102

2. Create a new configuration file, make your changes in it, and use "#v VERSION_NUMBER" in the "cmd.txt"

#For example, you want to make a new configuration file named "5.txt" (MUST be a NUMBER.txt)
$cp 1.txt 5.txt
#Make new pipelines in 5.txt
@hello:2,3,4
@world:1,3,5
@byebye:2,4,9,11
#use your new pipelines in new jobs 
$cat cmd.txt

#e u01234@utah.edu
#v 5
@hello -g hg19 -i *.gz
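
Before submitting a job, it can help to check that every pipeline in a configuration file refers only to application indices that are defined in that file. The following Python sketch shows one way to do that. The function names are hypothetical, the pipeline "@broken" and the index 999 are made up for the demonstration, and the sketch assumes that application lines start with a numeric index and pipeline lines have the form "@name:idx,idx,...".

```python
from typing import Dict, List, Set


def parse_pipelines(config_text: str) -> Dict[str, List[int]]:
    """Collect '@name:idx,idx,...' lines into {name: [idx, ...]}."""
    pipelines: Dict[str, List[int]] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line.startswith("@"):
            continue  # comment, blank or application line
        name, _, steps = line[1:].partition(":")
        pipelines[name.strip()] = [int(s) for s in steps.split(",") if s.strip()]
    return pipelines


def app_indices(config_text: str) -> Set[int]:
    """Collect the leading index of every application line."""
    indices: Set[int] = set()
    for raw in config_text.splitlines():
        head = raw.strip().split(":", 1)[0]
        if head.isdigit():
            indices.add(int(head))
    return indices


def missing_steps(config_text: str) -> Dict[str, List[int]]:
    """Map each pipeline to the step indices that no application line defines."""
    known = app_indices(config_text)
    result: Dict[str, List[int]] = {}
    for name, steps in parse_pipelines(config_text).items():
        unknown = [s for s in steps if s not in known]
        if unknown:
            result[name] = unknown
    return result


sample = """\
#First, add a new application
102:novoalign:/tomato/version/app/novoalign/2.08.01/::COMMENT:-r ALL -k
#Second, add a new pipeline
@myalignALL:102
@broken:102,999
"""

print(parse_pipelines(sample))  # {'myalignALL': [102], 'broken': [102, 999]}
print(missing_steps(sample))    # {'broken': [999]}
```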

So “1.txt” defines the version-1 pipelines, “3.txt” defines version-3 pipelines, etc.

The “v1” entry means that the current pipeline version is 1. If you do not specify a version in your “cmd.txt”, this is the default version that is used.
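
The version lookup described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not Tomato's actual logic: the function names are hypothetical, the "#v" line is assumed to hold exactly one integer, and the default is assumed to be readable from an entry named "vN" in the conf directory listing.

```python
import re
from typing import Optional, Sequence


def requested_version(cmd_text: str) -> Optional[int]:
    """Return N from a '#v N' line in cmd.txt, or None if there is none."""
    for line in cmd_text.splitlines():
        match = re.fullmatch(r"#v\s+(\d+)", line.strip())
        if match:
            return int(match.group(1))
    return None


def default_version(conf_entries: Sequence[str]) -> Optional[int]:
    """Return N from an entry named 'vN' in the conf directory (assumption)."""
    for entry in conf_entries:
        match = re.fullmatch(r"v(\d+)", entry)
        if match:
            return int(match.group(1))
    return None


def config_file_for(cmd_text: str, conf_entries: Sequence[str]) -> str:
    """Pick 'VERSION_NUMBER.txt', preferring the '#v' line over the default."""
    version = requested_version(cmd_text)
    if version is None:
        version = default_version(conf_entries)
    if version is None:
        raise ValueError("no '#v' line in cmd.txt and no 'vN' entry found")
    name = f"{version}.txt"
    if name not in conf_entries:
        raise FileNotFoundError(name)
    return name


entries = ["1.txt", "5.txt", "v1"]
print(config_file_for("#v 5\n@hello -g hg19 -i *.gz", entries))  # 5.txt
print(config_file_for("@hello -g hg19 -i *.gz", entries))        # 1.txt
```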

The settings of DefaultAppParameters

The last field in an application line is the DefaultAppParameters. All parameters that involve I/O (e.g. "-I=INPUT -o=OUTPUT" in Picard) and library files (e.g. "-R hg19.fasta" in GATK) are managed by Tomato. Therefore, you can NOT specify these parameters.

In addition, you can NOT use any of the following three characters anywhere in your parameters: <, > and |. In other words, you can NOT redirect input/output or use pipes. The only exception is a quoted parameter such as [--filter_name "AB<0.5 || QD>100.0"], which is used in "VariantFiltration" and other applications. This restriction prevents you from changing the names of input and output files in a pipeline, because every application in a pipeline uses the previous application's output as its input. Violating it will raise an exception and fail your job.
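
A quick pre-check of a parameter string against the character rule might look like the following Python sketch. It is not how Tomato itself validates parameters. The function name `forbidden_chars` is hypothetical, only double quotes are treated as quoting (an assumption based on the example above), and escaped quotes are not handled. The I/O and library parameters are not checked here, since they differ from application to application.

```python
from typing import List

FORBIDDEN = set("<>|")


def forbidden_chars(params: str) -> List[str]:
    """Return every <, > or | that appears outside double quotes."""
    found: List[str] = []
    in_quotes = False
    for ch in params:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on each double quote
        elif ch in FORBIDDEN and not in_quotes:
            found.append(ch)
    return found


print(forbidden_chars('--filter_name "AB<0.5 || QD>100.0"'))  # []
print(forbidden_chars("-r ALL -k > out.sam"))                 # ['>']
print(forbidden_chars("-r ALL -k | gzip"))                    # ['|']
```

An empty list means the string passes this one check. Anything else lists the characters that would have to be removed or quoted.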