High-Throughput Genomics and Bioinformatic Analysis

Director - Genomics
Brian K. Dalley, PhD

Director - Bioinformatic Analysis
David Nix, PhD

A message to our users: Huntsman Cancer Institute's genomic data repository, GNomEx, is growing exponentially. We have recently purchased a cloud-based genomics data storage and bioinformatic analysis platform called Seven Bridges Genomics. More information about this platform and what it means for our users is outlined in these documents:

View Website

The High-Throughput Genomics and Bioinformatic Analysis Shared Resource provides investigators at HCI and the University of Utah with access to Illumina high-throughput sequencing (HiSeq and MiSeq) and to the Affymetrix and Agilent Technologies microarray platforms. The Resource is operated as a full-service facility that provides expertise in the experimental handling of all aspects of these technologies. Services offered by the Resource include evaluation of nucleic acid sample quality (NanoDrop, Qubit Fluorometer assays, Agilent 2200 TapeStation); construction of Illumina sequencing libraries (genomic DNA, RNA, stranded RNA, small RNA, ChIP DNA, Mate Pair, Exome Enrichment, Targeted Custom Enrichment, whole genome and targeted enrichment bisulfite libraries; 16S amplicon library preparation of metagenomic samples; and single-cell RNA analysis); sequencing of libraries on an Illumina HiSeq or MiSeq; labeling of microarray samples with modified nucleotides; microarray hybridization; microarray scanning; and feature extraction and annotation of data from scanned images.

The technologies offered through the High-Throughput Genomics and Bioinformatic Analysis Shared Resource provide investigators with a diverse set of tools for measuring gene, exon, or miRNA expression; profiling SNPs and DNA copy number; location analysis of DNA-binding proteins; assessing DNA methylation status; exome and whole genome sequencing; and enrichment and sequencing of targeted regions of the genome. Instrumentation to support these applications includes the following: two Illumina HiSeq instruments, one Illumina cBot, one Illumina MiSeq, a Fluidigm C1 Single-Cell Auto Prep System, an Agilent Technologies Microarray Scanner and two Microarray Hybridization Ovens, an Affymetrix Hybridization Oven 640 and a Fluidics Station 450, one Covaris Adaptive Focused Acoustics Model S2, one NanoDrop Spectrophotometer, one Bio-Rad CFX Connect Real Time System, seven Bio-Rad thermal cyclers, one Agilent Bravo Liquid Handling System, one Agilent Technologies 2200 TapeStation, and two Invitrogen Qubit 2.0 Fluorometers.

The Huntsman Cancer Institute and the University of Utah maintain a comprehensive Bioinformatics Shared Resource (BSR) consisting of five highly qualified data analyst/programmer scientists who are available to assist campus researchers in all aspects of genomic analysis.

Bioinformatic Analysis Website

Major services for investigators:

  • Assistance in writing grants and planning experiments
  • Primary analysis of genomic data (e.g., raw intensities -> sequence alignments -> gene lists/enriched regions/variant identification)
  • Secondary analysis of primary data (e.g., cross-data integration and comparison) and identification of biological significance (e.g., IPA/KEGG/GO-Term/GSEA pathway significance, hierarchical/k-means clustering, variant annotation with ANNOVAR and VAAST)
  • Data organization and data distribution (e.g., GNomEx and IGB/IGV/UCSC genome browsers)
  • Uploading datasets to public repositories (e.g., GEO and the SRA)
  • Writing and reviewing manuscripts

Major infrastructure responsibilities:

  • Writing tutorials and teaching lectures in University genomics courses
  • Computational infrastructure management
  • Custom software development (see below)
  • Benchmarking of new tools and technologies
  • Processing, archiving, and distributing all of the demultiplexed, quality-controlled FASTQ data from the HiSeq sequencers

Custom Software Packages Developed by the BSR:

  • USeq: A high-throughput sequencing analysis package for processing ChIP-Seq, RNA-Seq, Bis-Seq, RIP-Seq, and exon/whole genome variant discovery datasets. Java, 131+ command line applications, open source; Nix et al., BMC Bioinformatics 2008.
  • Pysano: A computing management framework for executing predefined data analysis pipelines on high-performance computer clusters. Integrated with GNomEx for automated analysis. Enables users to perform sophisticated and computationally intensive analysis in a simplified framework. Pysano is written in Python and is open source.
  • GNomEx: A major laboratory information management system, analysis project center, and programmatic data distribution service. Java and Flex/Flash, open source; Nix et al., BMC Bioinformatics 2010. This represents a major continuing collaboration between the BSR and the Research Informatics Shared Resources.

Custom Pipelines (integrated into Pysano) Utilizing Open Source Command Line Software:

  • R and Bioconductor: An extensive collection of statistical analysis tools
  • DESeq, DEXSeq, edgeR: Bioconductor packages for identifying differentially expressed genes/regions in RNA-Seq and ChIP-Seq datasets
  • GATK: The Broad Institute’s genome analysis tool kit for variant calling
  • Picard and SAMtools: Next-generation SAM file packages and data quality tools
  • VAAST, pVAAST, PHEVOR: DNA variant disease association classification package created by the Human Genetics Department
  • ANNOVAR: DNA variant annotation package

Hosted Commercial Software:

  • Partek Genomics Suite: GUI-driven gene expression and clustering package
  • Ingenuity Pathway Analysis: GUI-based pathway enrichment analysis tool
  • GeneSifter: Microarray processing package for gene expression analysis

Hardware: The BSR manages several large compute resources with the assistance of the HCI Computing and Technology Group and the U of U Center for High Performance Computing. These include a 19-node/204-CPU compute cluster and 5 large Linux servers (48 CPU, 179 GB). The latter are made available to BSR users for stand-alone command line analysis. The core also manages a variety of web and file serving services. 20 TB of fast analysis disk and 168 TB of storage disk are used by the BSR to host user data and all of the output of the HiSeq machines under a RAID5 with hot spare and tape backup configuration.

Operations Policies: The BSR is available to all U of U researchers at the same cost, with no preference in priority or pricing. Analysis requests are entered, assigned, and monitored using an e-mail-based request tracker. A nominal fee of $40/hr is charged for all services up to 20 hrs/month, and then $60/hr. This fee recovers ~25 percent of the Bioinformatics section budget. Inclusion of BSR staff on extramural grants as a percent FTE covers another ~10 percent of section costs. This is permitted on a limited basis, and allows the principal investigator to bypass the request queue. Wait times before work begins on an analysis request are typically 1-3 days. Requests can take hours to months of effort to complete, depending on the availability of appropriate software tools and the complexity of the analysis.
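
For budgeting purposes, the tiered hourly fee above can be estimated with a short calculation. The Python sketch below is illustrative only: the function name `monthly_charge` and its parameters are hypothetical, and it assumes that only the hours beyond the first 20 in a month are billed at the $60/hr rate (the policy text does not say whether the higher rate applies to the excess hours or to all hours). Please confirm actual charges with the BSR.

```python
def monthly_charge(hours, base_rate=40.0, higher_rate=60.0, threshold=20.0):
    """Estimate one month's BSR analysis charge in dollars.

    Assumption (not confirmed by the policy text): the first `threshold`
    hours are billed at `base_rate`, and only hours beyond the threshold
    are billed at `higher_rate`.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    base_hours = min(hours, threshold)          # hours billed at $40/hr
    extra_hours = max(hours - threshold, 0.0)   # hours billed at $60/hr
    return base_hours * base_rate + extra_hours * higher_rate


if __name__ == "__main__":
    for h in (5, 20, 30):
        print(f"{h} hours -> ${monthly_charge(h):,.2f}")
```

Under this assumption, a 30-hour month would come to 20 x $40 + 10 x $60 = $1,400.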

High-Throughput Genomics and Bioinformatic Analysis Governance

HCI Senior Director Oversight
Bradley Cairns, PhD

Faculty Advisory Committee Chair
Bradley Cairns, PhD

Faculty Advisory Committee
Richard Clark, PhD
Jason Gertz, PhD
Christopher Gregg, PhD
Philip Moos, PhD
Sean Tavtigian, PhD
Katherine Varley, PhD
Joseph Yost, PhD

If use of this resource results in a publication, please acknowledge the Cancer Center Support Grant by using the following text: "The project described was supported by Award Number P30CA042014 from the National Cancer Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health."