This section provides Bioinformatics users with tutorials on aspects of data analysis and data handling on the HPCC systems that are not covered elsewhere in this wiki. For general systems information (including hardware, the basics of job submission, etc.) please refer to the HPCC User Documentation.
The ANGUS Tutorials section is based upon the "Analysis of Next-Generation Sequencing Data" course developed by Titus Brown. The original ANGUS tutorials covered several forms of Bioinformatics analysis using the Amazon Cloud; this revised and augmented version adapts the instructions for the HPCC system. Topics include BLAST, Bowtie, Samtools, matplotlib visualization, MEME, and several other common informatics tools.
Specific examples related to running Bioinformatics tools on the HPCC, created by iCER staff.
- ABySS - using parallel and serial versions of the ABySS assembler
- Agalma - using the Agalma pipeline
- Aspera Bulk File Transfer - using the Aspera Connect tool to download large quantities of data from sites like NCBI
- Basic Digital Normalization - Titus Brown's data pre-processing tutorial using scripts from khmer and screed, adapted for use on the HPCC
- Biopieces - essential configuration and getting started information
- BLAST with Multiple Processors - how to use multiple processors with regular (non-mpi) BLAST
- BLAST Data Preparation - how to prepare an NCBI data set for BLAST-ing
- Bowtie and IGV - A mapping and visualization workflow
- CIRCOS - setting up configuration parameters for CIRCOS viewer
- Converting BED to BAM format - create a sorted, indexed BAM file from a BED file
- Cortex Assemblers - information on cortex_con and cortex_var
- Converting BED to bigBED Format - useful for very large BED files, to make them easier to load and browse
- Denoising 454 Data with Mothur - denoising pyrosequencing results
- Distruct - A program for the graphical display of population structure
- Extracting Sequences - grab sequences from a BED format file of experimental data using the UCSC Genome Browser.
- FastQC - practical example of using FastQC to check the quality of FASTQ sequences
- GATK - run and version notes
- HOMER Howto - finding enriched motifs.
- InterProScan 5 - tutorial on using IPRSCAN 5
- InterProScan 4 - comparing user sequences to protein signature data sets.
- jModelTest2 - how to use jModelTest2 on the HPCC (important guidelines)
- MAKER Tutorial - basic example on how to run mpi-capable MAKER
- miRDeep2 Tutorial - identifies novel and known microRNA genes
- Mothur - tips for effective multiprocessing
- Mothur for Illumina on the HPCC - from the Mothur Workshop
- Mothur - MPI versus Non-MPI Versions - explains MPI and non-MPI builds on the HPCC
- mpiBLAST Tutorial - example of how to configure and run mpiBLAST for an nt BLAST query
- OrthoMCL Tutorial - how to obtain access, basics of proper use
- PASA - Gene structure and annotation analysis tool
- Perl - verifying installed modules
- Perl Modules - how to install local Perl modules using "cpan"
- Pipeline for Illumina Data - utilizing Bowtie, TopHat, HTSeq, and LOX
- Prokka - information for running Prokka on the HPCC
- QIIME - important configuration information and performance considerations
- RealPhy - Basics of using RealPhy on the HPCC
- SnpEff - the basics: configuration, databases, running
- SPAdes - basics of loading and using on the HPCC
- SRA File Manipulation - using the SRAToolkit
- Swapping Columns - a simple one-line script for swapping two columns in a tabular file (a BED file, for example)
- Tablet Assembly Viewer - information on running and configuring for best performance
- Trimmomatic - information on running Trimmomatic on the HPCC, with examples
- Using Velvet and Oases - information for using effectively on the HPCC
- Using Consed - using the Consed, phrap, and phred sequencing tools.
- Using mpiBLAST - basics of configuring and using mpiBLAST
- Using Blast2GO - running the pipeline command line tool, and pre-processing input files
- Working with SFF Files - extracting featured output from 454 data using MATLAB
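As a rough illustration of the kind of task the Swapping Columns entry above describes, the following is a minimal Python sketch that exchanges two fields of a tab-separated line. It is not the script from that tutorial. The function name `swap_columns`, the 0-based column indices, and the sample record are all assumptions made for this example.

```python
def swap_columns(line, i, j, sep="\t"):
    """Return `line` with the 0-based fields i and j exchanged.

    Hypothetical helper for illustration; not taken from the linked tutorial.
    """
    # Drop the trailing newline so it does not stick to the last field.
    fields = line.rstrip("\n").split(sep)
    fields[i], fields[j] = fields[j], fields[i]
    return sep.join(fields)


# Made-up BED-style record: chromosome, start, end, name.
example = "chr1\t100\t200\tfeatureA"

# Swap the second and third columns (indices 1 and 2).
swapped = swap_columns(example, 1, 2)
print(swapped)
```

To process a whole file, the same function could be applied to each line as it is read, writing the results to a new file rather than overwriting the input.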
Useful tutorials not developed by iCER staff, but good resources nonetheless.