*This archived content is for the previous version of the HPCC operating system. Please see the 2018 [Bioinformatics] page for the latest information on currently supported software.*
This tutorial has been modified slightly from one originally provided as part of the ANGUS bioinformatics course. The modifications allow the tutorial to be followed on the HPCC instead of on Amazon EC2.
Checking read quality with FastQC
When you get your sequences back from a sequencing facility, it’s important to check that they are high quality (garbage in, garbage out). In this tutorial, we’ll use a program called FastQC, which checks whether a set of sequence reads in a .fastq file exhibits any unusual qualities (which might indicate either low sequence quality or interesting biological features in your sample).
Getting the data
The data used in this tutorial has already been preselected and downloaded for your convenience. It is located in the HPCC directory:
Simply copy the following files over to your working directory. First, a "good" sequence in fastq format:
Then a "bad" one:
To run FastQC on the HPCC in interactive mode, you will need to establish an X-connection over SSH. On workstations running Mac or Linux operating systems, simply open a terminal and enter:
Windows users will need PuTTY and Xming or Cygwin-X to establish an X-connection over SSH. You can follow these instructions for Xming, or stop by the HPCC and pick up a preloaded thumb drive with the software you need.
Once you are connected to Gateway with an X-session, you will need to log in to one of the dev-nodes before running FastQC:
Now, simply load the module file for FastQC (remember to do this on a dev-node):
There are two ways in which FastQC can be run: in "command line" mode, or as a GUI (graphical user interface). This tutorial addresses the command line version of FastQC. Let's start by analyzing our "good" file:
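A minimal sketch of such a command-line run follows. The input name good_sequence_short.fastq is an assumption inferred from the name of the report directory, and run_fastqc_cli is a hypothetical helper, not part of FastQC. The --extract option is passed explicitly so that the report directory is unpacked even if the installed FastQC version would otherwise leave it zipped.

```shell
# Hypothetical helper: run FastQC on one file in command-line mode.
# Returns 1 if the input is unreadable, 2 if fastqc is not on PATH.
run_fastqc_cli() {
    input_file="${1:-}"
    if [ ! -r "$input_file" ]; then
        echo "cannot read input file: $input_file" >&2
        return 1
    fi
    if ! command -v fastqc >/dev/null 2>&1; then
        echo "fastqc not found on PATH; load the FastQC module first" >&2
        return 2
    fi
    # --extract unpacks the zipped report into <name>_fastqc/
    fastqc --extract "$input_file"
}

# Usage (file name is an assumption, adjust to the file you copied):
#   run_fastqc_cli good_sequence_short.fastq
```

Checking for the input file and the fastqc executable first gives a clearer message than a failed run when the module has not been loaded.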
This will generate a self-contained directory called "good_sequence_short_fastqc" containing an HTML-formatted report that can be opened in a browser. If we change into that directory and display the contents of the file "summary.txt", we can see which tests passed and which failed:
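The simplest way to view the file is cat summary.txt. Beyond that, each line of summary.txt holds three tab-separated fields (status, module name, input file name), so the results can also be tallied with awk. The sketch below uses two hypothetical helpers, summarize_fastqc and list_flagged_modules, which are illustrations only and not part of FastQC.

```shell
# Count how many FastQC modules ended in each status (PASS, WARN, FAIL).
# Defaults to summary.txt in the current directory.
summarize_fastqc() {
    summary_file="${1:-summary.txt}"
    if [ ! -r "$summary_file" ]; then
        echo "cannot read $summary_file" >&2
        return 1
    fi
    awk -F '\t' '{ count[$1]++ } END { for (s in count) print s, count[s] }' \
        "$summary_file" | sort
}

# Print status and module name for every module that did not pass.
list_flagged_modules() {
    awk -F '\t' '$1 != "PASS" { print $1 "\t" $2 }' "${1:-summary.txt}"
}

# Usage, from inside the report directory:
#   cat summary.txt
#   summarize_fastqc
#   list_flagged_modules
```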
If we were to open the file "fastqc_report.html" in a browser, we would see:
The image above shows only a small portion of the FastQC output and is provided for demonstration purposes. Please scroll down through your FastQC results to see other useful charts and tables, or click on the links in the left-hand pane.
Now we can repeat this procedure using our file of "bad" sequences:
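Once both reports exist, their summary.txt files can be lined up module by module to see where the two runs differ. The sketch below takes both paths as arguments, because the name of the "bad" report directory depends on the input file name; compare_fastqc_summaries is a hypothetical helper written for this illustration.

```shell
# Print one line per module: module name, status in the first summary,
# status in the second summary (tab-separated).
compare_fastqc_summaries() {
    if [ ! -r "${1:-}" ] || [ ! -r "${2:-}" ]; then
        echo "usage: compare_fastqc_summaries FIRST_SUMMARY SECOND_SUMMARY" >&2
        return 1
    fi
    awk -F '\t' '
        # First file: remember the status of each module.
        NR == FNR { first[$2] = $1; next }
        # Second file: print both statuses side by side.
        { printf "%s\t%s\t%s\n", $2, (($2 in first) ? first[$2] : "NA"), $1 }
    ' "$1" "$2"
}

# Usage (directory names are assumptions, adjust to your own output):
#   compare_fastqc_summaries good_sequence_short_fastqc/summary.txt \
#       path/to/bad_report/summary.txt
```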
Running FastQC in GUI Mode
If you want to run FastQC in GUI mode, log on to the HPCC using an X-windows session, load the module file, and start FastQC as follows:
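A hedged sketch of a GUI launch is shown below. FastQC starts its graphical interface when it is invoked without any input files, and the GUI needs a working X connection, which the shell exposes through the DISPLAY variable. launch_fastqc_gui is a hypothetical helper that checks both preconditions first; it assumes the FastQC module has already been loaded on a dev-node.

```shell
# Hypothetical helper: start the FastQC GUI only if X forwarding is active.
# Returns 1 if DISPLAY is unset, 2 if fastqc is not on PATH.
launch_fastqc_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with X forwarding enabled" >&2
        return 1
    fi
    if ! command -v fastqc >/dev/null 2>&1; then
        echo "fastqc not found on PATH; load the FastQC module first" >&2
        return 2
    fi
    # With no file arguments, FastQC opens its interactive window.
    fastqc &
}

# Usage:
#   launch_fastqc_gui
```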
The FastQC developers have prepared a video that illustrates how to use the application in GUI mode.