The High Performance Computing Center (HPCC) at MSU manages shared computing resources consisting of clusters and development nodes. A queuing system is therefore in place to ensure fair access to resources. For an explanation of our queuing policies, please see our policy section. HPCC has a single queue system in which users specify the resources they need. Torque 5.0.1 is used to manage resources. Priority access to the cluster can be purchased.
At HPCC, jobs run in a single queue in order of priority, as long as the resources are available to run the job. If the resources are not available, the scheduler attempts to run the next job in the queue that can run without delaying the jobs ahead of it (i.e., backfill mode is enabled). We encourage users to estimate the resources they require accurately (specifically walltime and memory) to optimize the start time of their jobs.
To submit a job to the cluster, we suggest that users create a job script which contains the commands to be run. A sample job script named myjob.qsub might include the following commands:
A complete list of qsub options is given in the next section.
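A minimal sketch of such a script, assuming bash as the shell. The job name, the resource amounts, and the echo line are illustrative placeholders, not HPCC recommendations:

```shell
#!/bin/bash
# Lines beginning with "#PBS" are read by qsub as options; bash treats them as comments.
#PBS -N myjob
#PBS -l nodes=1:ppn=1,walltime=00:10:00,mem=1gb
#PBS -j oe

# TORQUE sets PBS_O_WORKDIR to the directory qsub was run from.
# Fall back to the current directory so the script also runs outside the scheduler.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir"

# Replace this line with the commands the job should run (placeholder).
echo "Job ${PBS_JOBID:-unset} running on $(uname -n) in $workdir"
```

Because the #PBS lines are ordinary comments to the shell, the script can be run directly on a development node to check it before it is submitted.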
Once the job script has been created, submit it with the command qsub myjob.qsub
Options to qsub that affect the properties of your job:
-A
This option tells TORQUE to use the specified account (not username) credential. Unless you are an authorized user of the buy-in account, your job will be deleted. Buy-in account users must use this option to reserve nodes on their buy-in machines.
#PBS -A mybuyin
-a
This option tells PBS to run the job at the given time. The time is given as HHMM (24-hour clock: hours, then minutes); day, month, and year can also be specified in the standard Unix format. See the PBSPro User Manual for more information.
#PBS -a 0615
-e / -o
Set the locations of the output and error files (the Output_Path and Error_Path attributes), as demonstrated below. Only one of the two is needed if you use the -j option.
#PBS -e /home/keenandr/myerrorfile
-I
Declares that the job is to be run interactively. The job will be queued and scheduled as any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running.
-j
The eo value combines STDOUT and STDERR in the file specified in Error_Path; oe combines them in the file specified in Output_Path.
#PBS -j oe
-l
Requests resources for the job; separate multiple resources with commas.
#PBS -l nodes=4:ppn=1,walltime=01:00:00,mem=2gb
-M / -m
-M gives the email address(es) to notify when the job changes state; -m selects which state changes trigger a message (a for abort, b for begin, e for end).
#PBS -M email@example.com
#PBS -m abe
-N
Names the job.
#PBS -N MySuperComputing
-p
Sets the job's priority relative to the user's other jobs; it does not affect priority relative to other users.
#PBS -p 200
-q
Tells the system which queue to use. The HPCC has only one queue, "main". Where the job runs on the main cluster is determined by the resources requested.
#PBS -q main
-r
Specifies whether a job is rerunnable (y/n) if it is interrupted by a system crash or other failure; by default, PBS will automatically rerun the job script from the beginning if the job is queued.
#PBS -r n
-t
Submits an array job with n identical tasks. Each task has the same $PBS_JOBID but a different $PBS_ARRAYID.
#PBS -t 5
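As a sketch of how a task might use its index, the script below picks a different input and output file for each task. It uses the range form of the option (-t 1-5); the file naming scheme and job name are hypothetical, and the default index of 1 exists only so the script can be tried outside the scheduler:

```shell
#!/bin/bash
#PBS -N myarray
#PBS -l nodes=1:ppn=1,walltime=00:10:00
#PBS -t 1-5

# TORQUE sets PBS_ARRAYID to this task's index.
# Default to 1 (assumption) when testing outside the scheduler.
task_id="${PBS_ARRAYID:-1}"

# Hypothetical naming scheme: one input and one output file per task.
input_file="input_${task_id}.dat"
output_file="result_${task_id}.out"

echo "Task ${task_id}: reading ${input_file}, writing ${output_file}"
```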
-V
Passes all current environment variables to the job.
-v
Defines additional environment variables for the job.
#PBS -v arg1=phase3,arg2=coalesceData,numpts=50
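Inside the job, values passed this way appear as ordinary environment variables. The following sketch reads the three variables from the example above; the default values and the numeric check are assumptions added for illustration:

```shell
#!/bin/bash
# Values arrive as environment variables when the job is submitted with
#   #PBS -v arg1=phase3,arg2=coalesceData,numpts=50
# The defaults below are hypothetical and apply only when a variable was not passed.
arg1="${arg1:-phase1}"
arg2="${arg2:-none}"
numpts="${numpts:-10}"

# Fall back to zero points if numpts is not a non-negative integer.
case "$numpts" in
    ''|*[!0-9]*)
        echo "numpts must be a non-negative integer" >&2
        numpts=0
        ;;
esac

echo "stage=${arg1} action=${arg2} points=${numpts}"
```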
-W
Special generic resources, such as software licenses, can be requested using the -W option. This is most commonly used with MATLAB (see Matlab Licenses for more information).
#PBS -W gres:MATLAB
This is a subset of options; consult the PBS manual for more information. You can also type "man qsub" or "man pbs_resources" on the command line.
- HPCC maintains buy-in nodes, which are able to run non-priority (non buy-in) jobs that are less than four hours in walltime. If you submit a job which requests less than four hours of walltime, it is likely that the short job will start fairly quickly.
- HPCC uses "backfill" to maximize the utilization of our cluster. For example, if the highest-priority job requests 100 CPUs but only 40 are available, and the other 60 will become available over the next 37 hours, then the 40 idle CPUs are used to run the next-highest-priority job that needs fewer than 40 CPUs and less than 37:00:00 of walltime.
- Correctly estimating the amount of memory and walltime that is needed to run your code will enable your job to be scheduled as soon as possible.
- There is some overhead for system processes. One should request slightly less memory than system capacity to ensure that a job is not cancelled.
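The backfill example above can be expressed as a small check. This is only a sketch of the arithmetic, not the scheduler's actual logic; the function names are hypothetical, bash is assumed, and the strict comparisons mirror the example (fewer than 40 CPUs, less than 37:00:00):

```shell
# Convert a walltime string of the form HH:MM:SS to seconds.
walltime_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values such as "08" are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeed if a job needing $1 CPUs for walltime $2 fits into a gap of
# $3 idle CPUs that closes after walltime $4.
fits_backfill() {
    local cpus=$1 idle=$3
    local need window
    need=$(walltime_to_seconds "$2")
    window=$(walltime_to_seconds "$4")
    [ "$cpus" -lt "$idle" ] && [ "$need" -lt "$window" ]
}

if fits_backfill 16 04:00:00 40 37:00:00; then
    echo "16 CPUs for 4 hours: can be backfilled"
fi
if ! fits_backfill 16 48:00:00 40 37:00:00; then
    echo "16 CPUs for 48 hours: must wait"
fi
```

The second request fails only because of its walltime, which is why an accurate (rather than generous) walltime estimate can shorten the wait.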
When will my job start?
The length of time a job sits in the queue before running (also known as queue time) varies with the cluster load and the type of job. Such is the nature of shared computing resources. The start time of the job can be estimated by typing
If your job has been in the queue for a while, and you would like an analysis of why the job has not started, you can use the command
In this example, none of the available processors satisfy the requirements of the job.
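On Torque clusters scheduled by Moab, start-time estimates and job diagnostics typically come from the showstart and checkjob commands. The sketch below assumes those Moab client commands are what this cluster provides (the text here does not name them) and uses a hypothetical job ID; the wrapper function is also hypothetical and simply reports when a command is unavailable:

```shell
# Run a scheduler query for a job ID, or explain why it cannot be run.
# Assumes the Moab client commands showstart and checkjob are installed.
query_job() {
    local tool=$1 jobid=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found; run this on a cluster development node"
        return 1
    fi
    "$tool" "$jobid"
}

jobid=1234567   # hypothetical job ID, as printed by qsub at submission

query_job showstart "$jobid" || true   # estimated start time
query_job checkjob "$jobid" || true    # analysis of why the job has not started
```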
Why was my job killed?
If your job over-runs the walltime requested, or over-uses the memory requested, the scheduler will terminate your job and send you an email. Please review the information from the email, or the error file in your working directory for more information on how to fix these issues. If the error messages indicate a hardware error, please contact the staff by filling out this request form.
- Users are allowed to run jobs for up to a week (168 hours) in walltime.
- Users can utilize up to a total of 520 cores at any time.
- Users can request up to 6TB of memory per job. Note that such a large job might take a while to get scheduled.
- It is against our fair use policy to artificially increase your priority in the queue (e.g., by requesting resources that will not be used). Accounts that do so will be suspended.
- Jobs that run under 4 hours are able to run on the largest set of nodes.
- Jobs that request more processors or RAM have priority over smaller jobs.
- Jobs that are queued accrue priority based on how long they have been queued and how much wall-time they request.
- The scheduler will attempt to balance usage among users.
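The numeric limits above lend themselves to a quick check before submitting. The sketch below is illustrative only: the function name is hypothetical, walltime is given in whole hours for simplicity, and the four-hour boundary is treated as inclusive, which is an assumption. The scheduler itself remains the authority on what is accepted.

```shell
MAX_WALLTIME_HOURS=168   # one week of walltime
MAX_CORES=520            # total cores a user may hold at any time
SHORT_JOB_HOURS=4        # short jobs can run on the largest set of nodes

# Print a verdict for a request of $1 cores and $2 whole hours of walltime.
check_request() {
    local cores=$1 hours=$2
    if [ "$cores" -gt "$MAX_CORES" ]; then
        echo "rejected: more than $MAX_CORES cores"
    elif [ "$hours" -gt "$MAX_WALLTIME_HOURS" ]; then
        echo "rejected: more than $MAX_WALLTIME_HOURS hours"
    elif [ "$hours" -le "$SHORT_JOB_HOURS" ]; then
        echo "ok: short job, eligible for the largest set of nodes"
    else
        echo "ok: within limits"
    fi
}

check_request 8 3
check_request 8 72
check_request 8 200
```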
Shorter jobs can run on more nodes
Jobs that request a total running (wall-clock) time of four hours or less can run on idle buy-in and specialized nodes. Because they can access the largest set of potential nodes, they are likely to run more quickly than jobs that have to wait for the fewer general-purpose community nodes.
MSU's buy-in model guarantees that buy-in users will have access to their nodes within four hours. Specialized hardware (large memory nodes, NVIDIA, Phi accelerators) and the nodes dedicated to larger jobs have similar four-hour windows where jobs that do not meet the normal requirements of those nodes can use them. In total, about two thirds of the Intel10, Intel14, and Intel14-XL clusters are able to run four hour jobs of any size when idle.
iCER staff can assist you in restructuring your application to use these windows using system-level checkpointing tools like BLCR or application checkpointing.
Bigger jobs are prioritized
The scheduler attempts to gather resources for large jobs and then fill in smaller jobs around them. The size of the job is determined by the number of CPUs and amount of memory requested.
The scheduler also schedules eight or more cores at a time to allow more resources to be gathered for multi-core jobs.
Resource requests are monitored. Abusive resource requests may violate MSU policy.
As jobs wait in the queue, they accrue priority, which allows them to gather resources more effectively. Each user can have up to 15 jobs gathering additional priority. This is in addition to other priority factors.
The scheduler also provides a priority boost to jobs the longer they have been in the queue relative to the wall-time requested by the job.
The scheduler attempts to balance resource utilization based on past consumption by adjusting the priority of users and groups that have used more than the average over the past few days.