
Pysano tutorials

Here are a few tutorials to introduce you to the basics of using Pysano. Each tutorial includes a zip file, which you can download and unzip in your job directory.

Tutorial 1: Hello, world!

In this tutorial you will create and run a simple pysano job.

1. Download the tutorial file pysano_tutorial1.zip to your home directory on one of the HCI Linux servers (e.g. hci-moab, hci-alta, or hci-uinta). The easiest way to do this is using the "wget" command. Copy the link to the tutorial, and paste it onto the command line after the wget command:

$ wget http://healthcare.utah.edu/.../pysano_tutorial1.zip

Downloading the tutorial file should take just a second or two. Once the download is done, unzip the file with the command "unzip pysano_tutorial1.zip". This creates the directory Tutorial1, which contains the script cmd.txt.

2. Edit the cmd.txt file so that your email address appears on the first line. After editing, cmd.txt should look like this:

#e your_email_address
echo "Hello, world!"
sleep 10
date
sleep 10
pwd
sleep 10
hostname

There are several text editors available on Linux, but the simplest is nano. To open the script in nano, run "nano cmd.txt" from the Tutorial1 directory.
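If you would rather not open an editor, the first line can also be rewritten from the command line. The function below is a minimal sketch: set_job_email is a hypothetical helper name, not a pysano command, and it assumes the email address contains no "|" or "&" characters.

```shell
# set_job_email FILE EMAIL
# Hypothetical helper (not part of pysano): rewrites line 1 of FILE as "#e EMAIL".
# Assumes EMAIL contains no "|" or "&", which sed would treat specially.
set_job_email() {
    file=$1
    email=$2
    # Write to a temporary file first so the original survives if sed fails.
    sed "1s|.*|#e $email|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

From the Tutorial1 directory you could then run the following, with your own address in place of the placeholder:

$ set_job_email cmd.txt user@example.org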

3. Execute the "pstart" command to notify pysano about your job. You need to tell pstart which directory contains your pysano job files. If your current directory is the Tutorial1 directory, you can simply execute:

$ pstart .

("." is Linux shorthand for the current directory.) You could also give pstart the full or relative path of your job directory.

Once notified about your job, pysano will send you an email confirming that your job has been accepted. The email will include the URL of a web page where you can monitor the progress of your job. Pysano will then select a computer cluster for your job, transfer your files to the cluster, and execute the job.

Once your job has been processed and the result files returned to you, you will receive another email saying that the job is complete. At that point your job directory should contain the following:

  • cmd.txt - the original pysano script
  • pbs.sh - a script created by pysano, which shows the commands that were run on the cluster
  • stderr.txt - any error messages from your job (this file should be empty)
  • stdout.txt - the output from your job

Here is a typical result for this job in the stdout.txt file:

Hello, world!
Thu Feb 19 12:55:50 MST 2015
/scratch/local/bmilash_3438_46
em062
success
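
The Tutorial 1 file list above can be checked with a short shell function once the completion email arrives. This is only a sketch: check_job_dir is a hypothetical helper, not a pysano command, and the empty stderr.txt check applies to this tutorial only, since other jobs may legitimately write to standard error.

```shell
# check_job_dir DIR
# Hypothetical helper (not part of pysano): reports any of the four expected
# Tutorial 1 files that are missing from DIR, and warns if stderr.txt has content.
# Returns 0 when everything looks as described in the tutorial, 1 otherwise.
check_job_dir() {
    dir=$1
    status=0
    for f in cmd.txt pbs.sh stderr.txt stdout.txt; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    # -s is true when the file exists and is larger than zero bytes.
    if [ -s "$dir/stderr.txt" ]; then
        echo "stderr.txt is not empty"
        status=1
    fi
    return $status
}
```

From the Tutorial1 directory you could run it as:

$ check_job_dir . && cat stdout.txt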

Tutorial 2: Alignment

In this tutorial you'll do some actual bioinformatics: aligning FASTQ data to a genomic index.

1. Download the tutorial file pysano_tutorial2.zip into your home directory on one of the HCI Linux servers (e.g. hci-moab, hci-alta, or hci-uinta), unzip it, and change to the new Tutorial2 directory:

$ cd
$ wget http://healthcare.utah.edu/.../pysano_tutorial2.zip
$ unzip pysano_tutorial2.zip
$ cd Tutorial2

The Tutorial2 directory should contain the following:

  • cmd.txt - the pysano script
  • SRR202898_sample.txt.gz - a small sample (100,000 sequences) of quality-trimmed data from the Sequence Read Archive.

2. Edit the cmd.txt script so that your email address appears on the first line. After editing it should look like this:

#e your_email_address
#c ember
fastqc *.gz --noextract
@align -novoalign [-o SAM -r All 50] -g mm10 -i *.gz -gzip

Notice the "#c ember" statement. This directive instructs pysano to send your job to the ember cluster. Without this statement, pysano would choose the least busy cluster for your job. The available clusters are listed here.
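
Before starting a job, it can be handy to see which directives a script contains, for example to confirm whether a cluster has been pinned with "#c". The function below is a sketch under an assumption: it treats any line starting with "#", one lowercase letter, and a space as a directive, a pattern inferred only from the "#e" and "#c" lines shown in these tutorials. list_directives is a hypothetical name, not a pysano command.

```shell
# list_directives FILE
# Hypothetical helper (not part of pysano): prints the lines of FILE that look
# like pysano directives, i.e. "#", one lowercase letter, then a space.
# Assumption: the pattern is inferred from the "#e" and "#c" examples only.
list_directives() {
    grep -E '^#[a-z] ' "$1"
}
```

From the Tutorial2 directory you could run it as:

$ list_directives cmd.txt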

3. Execute the "pstart" command to notify pysano about your job. You need to tell pstart which directory contains your pysano job files. If your current directory is the Tutorial2 directory, you can simply execute:

$ pstart .

("." is Linux shorthand for the current directory.) You could also give pstart the full or relative path of your job directory.

Once your job has been processed and the result files returned to you, you will receive an email saying that the job is complete. At that point your job directory should contain the following:

  • cmd.txt - the original pysano script
  • pbs.sh - a script created by pysano, which shows the commands that were run on the cluster
  • stderr.txt - the standard error output from the alignment job, which contains alignment statistics reported by the aligner
  • stdout.txt - the standard output from the script
  • SRR202898_sample.sam.gz - a gzipped SAM format alignment file
  • SRR202898_sample.txt.gz - the original FASTQ file
  • SRR202898_sample_fastqc.zip - zipped output from the FastQC program
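
As a quick sanity check on the alignment file listed above, you can count its alignment records without unpacking it. The sketch below relies on the SAM convention that header lines start with "@"; count_sam_records is a hypothetical helper name, not a pysano command. Note that the count is the number of alignment lines, which need not equal the number of input sequences.

```shell
# count_sam_records FILE
# Hypothetical helper (not part of pysano): prints the number of alignment
# records in a gzipped SAM file by skipping header lines, which start with "@".
count_sam_records() {
    # gzip -dc decompresses to standard output and leaves the file untouched.
    gzip -dc "$1" | grep -vc '^@'
}
```

From the Tutorial2 directory you could run it as:

$ count_sam_records SRR202898_sample.sam.gz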