

                                                  ** TomatoFarmer: September 2013 **

TomatoFarmer controls an exome analysis from start to finish. It creates alignment jobs for each of the samples in your directory, waits for all jobs to finish, and then launches metrics and variant calling jobs. Jobs will be resubmitted up to a set number of times to combat spurious CHPC errors. Job directories are left behind so you can save your log files.
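
The align, wait, then analyze flow described above can be sketched roughly as follows. This is only an illustration and not TomatoFarmer's actual code: run_pipeline, align, metrics and call_variants are hypothetical names, and max_jobs simply mirrors the default of the -j option.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(samples, align, metrics, call_variants, max_jobs=5):
    """Align every sample, wait for all alignments, then run the downstream steps.

    align, metrics and call_variants are hypothetical callables supplied by
    the caller; they stand in for the real cluster jobs.
    """
    # Run at most max_jobs alignments at a time; list() blocks until all finish.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        aligned = list(pool.map(align, samples))
    # Metrics and variant calling start only after every alignment is done.
    return metrics(aligned), call_variants(aligned)
```

Leaving the with block is what enforces the "waits for all jobs to finish" step before anything downstream runs.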

Required Arguments:

-d Job directory. This directory must be a subdirectory of /tomato/version/job. It can be several directory levels below /tomato/version/job. Example: '-d /tomato/version/job/krustofsky/demo'.
-e Email address. TomatoFarmer emails you once the job completes/fails. You can also opt to get all tomato emails as individual jobs start/end (see option -x). Example: '-e'.
-y Analysis pipeline. The analysis pipeline or step to run. Current options are:

  1. exome_bwa - Full exome analysis (bwa).
  2. exome_novoalign - Full exome analysis (novoalign).
  3. exome_align_bwa - Alignment/recalibration only (bwa).
  4. exome_align_novoalign - Alignment/recalibration only (novoalign).
  5. exome_metrics - Sample QC metrics only.
  6. exome_variant_raw - Variant detection and filtering (raw settings).
  7. path to configuration file. This allows you to use older versions of the pipeline.

If you want to run older versions of the pipeline, you must provide a configuration file; otherwise the most recent version of each command is used. Example: '-y exome_bwa'.
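
The -d rule (the job directory must sit somewhere below /tomato/version/job) can be checked before launching. The sketch below is one possible check, not the one TomatoFarmer performs; is_valid_job_dir is a hypothetical helper and the root is passed as a parameter so it can be adjusted to your installation.

```python
import posixpath
from pathlib import PurePosixPath


def is_valid_job_dir(job_dir, root="/tomato/version/job"):
    """Return True if job_dir lies strictly below root, at any depth."""
    # normpath collapses '..' and trailing slashes so they cannot fool the check.
    job = PurePosixPath(posixpath.normpath(job_dir))
    base = PurePosixPath(posixpath.normpath(root))
    # parents never contains the path itself, so root alone is rejected.
    return base in job.parents
```

For example, is_valid_job_dir('/tomato/version/job/krustofsky/demo') is accepted, while '/tomato/version/job' itself is not.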

Optional Arguments:

-t Target regions. Setting this argument will restrict coverage metrics and variant detection to targeted regions. This speeds up the variant detection process and reduces noise. Options are:

  1. agilent
  2. nimblegen
  3. truseq
  4. path to custom target bed file.

If nothing is specified for this argument, the full genome will be queried for variants and ccds exomes will be used for capture metrics. Example: '-t truseq'.

-w Wall time. Use this option followed by a new wall time, in hours, if you want less wall time than the default 240 hours. Useful when there is upcoming CHPC downtime. Example: '-w 40'.
-s Study name. Set this if you want your VCF files to have a prefix other than 'STUDY'. Example: '-s DEMO'.
-c Split chromosomes. Set this option if you want to run variant calling on each chromosome separately. This is good for large projects (>15 exomes). The per-chromosome results will be merged once they are all finished. Example: '-c'.
-x Unsuppress tomato emails. Receive both tomato and TomatoFarmer emails. Example: '-x'.

Admin Arguments:

-f Manually set failure level. If this option is set, you will be prompted to override the default number of allowed failures for each analysis step before the analysis begins. For example, if exome_align is set to 3, the fourth failure across all spawned threads shuts everything down. Consider changing the default settings if you're running lots of samples. Note that outright tomato failures (job never actually starts) don't count against the cap. Each value must be between 1 and 20.
-l Logging level. Level of logging you want to see. Options: INFO, WARNING, ERROR. ERROR displays only error messages, WARNING displays warning and error messages, and INFO shows all three levels. Default: INFO.
-b Heartbeat frequency. How often you want a thread heartbeat message, in minutes. Default: 30 mins.
-j Number of jobs at a time. Don't abuse this, big brother is watching you. Default: 5 jobs.
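
The failure cap described under -f can be pictured as a counter shared by all threads. The sketch below is an assumption about how such a cap could work, not TomatoFarmer's implementation; FailureBudget and record_failure are hypothetical names.

```python
import threading


class FailureBudget:
    """Shared failure counter for one analysis step (hypothetical sketch)."""

    def __init__(self, allowed):
        # The help text requires the allowed failure count to be 1..20.
        if not 1 <= allowed <= 20:
            raise ValueError("allowed failures must be between 1 and 20")
        self.allowed = allowed
        self._failures = 0
        self._lock = threading.Lock()

    def record_failure(self, job_started=True):
        """Record a failure; return True once the cap has been exceeded."""
        with self._lock:
            # Jobs that never actually started do not count against the cap.
            if job_started:
                self._failures += 1
            return self._failures > self.allowed
```

With allowed set to 3, the first three calls to record_failure() return False and the fourth returns True, which is the point at which everything would be shut down.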

Example: java -Xmx4G -jar pathTo/USeq/Apps/TomatoFarmer -d /tomato/version/job/demo/ -e -y exome_bwa -s DEMO -c -t agilent

**************************************************************************************