
scontrol command

      Besides the brief listing of jobs that the squeue command provides, you can also see detailed information about an individual job. Run the SLURM command scontrol show job with a job ID:

$ scontrol show job 8929
JobId=8929 JobName=test
   UserId=nobody(804293) GroupId=helpdesk(2103) MCS_label=N/A
   Priority=404 Nice=0 Account=classres QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2018-08-01T14:33:04 EligibleTime=2018-08-01T14:33:04
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-08-03T12:38:48
   Partition=general-short-14,general-short-16,general-short-18,general-long-14,general-long-16,general-long-18,classres-14,classres-16 AllocNode:Sid=dev-intel18:4996
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=80-80 NumCPUs=160 NumTasks=80 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
   TRES=cpu=40,mem=80G,node=40,gres/gpu=40
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=2 MinMemoryNode=2G MinTmpDiskNode=0
   Features=intel14 DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/mnt/home/changc81/GetExample/helloMPI/test
   WorkDir=/mnt/home/changc81/GetExample/helloMPI
   Comment=stdout=/mnt/home/changc81/GetExample/helloMPI/slurm-8929.out
   StdErr=/mnt/home/changc81/GetExample/helloMPI/slurm-8929.out
   StdIn=/dev/null
   StdOut=/mnt/home/changc81/GetExample/helloMPI/slurm-8929.out
   Power=
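
When only a few fields of this listing matter, the output can be filtered. The following is a minimal sketch rather than a SLURM feature: job_field is a hypothetical helper name, and the sample text is copied from the listing above so that the block runs without a scheduler. It assumes the values of interest contain no spaces.

```shell
# job_field NAME: read `scontrol show job` text on stdin and print the value
# of the first NAME=value pair found. (Hypothetical helper, not part of SLURM.)
job_field() {
    local name="$1"
    # Put each whitespace-separated token on its own line, then print the
    # text that follows "NAME=" for the first token starting with it.
    tr -s ' \t' '\n' | awk -v key="$name" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Sample lines copied from the listing above.
sample='JobId=8929 JobName=test
   JobState=PENDING Reason=Resources Dependency=(null)
   NumNodes=80-80 NumCPUs=160 NumTasks=80 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
   Features=intel14 DelayBoot=00:00:00'

printf '%s\n' "$sample" | job_field JobState   # prints PENDING
printf '%s\n' "$sample" | job_field Features   # prints intel14
```

On a cluster, the same helper would read live output, for example scontrol show job 8929 | job_field JobState.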

Check that the information is correct for the job. If the job has not started running and you would like to change any of its specifications, first hold the job with the scontrol hold command:

$ scontrol hold 8929
$ squeue -l -u $USER
Fri Aug  3 12:26:57 2018
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
              8929 general-s     test   nobody  PENDING       0:00      1:00     80 (JobHeldUser)
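
A script can confirm the hold before changing anything. The sketch below assumes squeue is asked for just the state and reason with squeue -h -j 8929 -o '%T %r', which should yield a single line such as "PENDING JobHeldUser". The helper name is_user_held is hypothetical, and the demonstration feeds it a literal line so that it runs without a scheduler.

```shell
# is_user_held: read one "STATE REASON" line on stdin and succeed only if the
# job is pending because of a user hold. (Hypothetical helper.)
is_user_held() {
    local state reason
    read -r state reason || return 1
    [ "$state" = "PENDING" ] && [ "$reason" = "JobHeldUser" ]
}

# Demonstration with a literal line; on a cluster the input would come from
#   squeue -h -j 8929 -o '%T %r'
if printf 'PENDING JobHeldUser\n' | is_user_held; then
    echo "job is held by the user"
fi
```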

As the squeue output shows, the job is pending because of the user's hold (reason JobHeldUser). Now choose the fields in the scontrol show output that you want to change, pass them to the scontrol update command, and put the new value after each "=" sign. For example, the command line

$ scontrol update JobId=8929 NumNodes=2-2 NumTasks=2 Features=intel16

changes the resource request of job 8929 from 80 nodes and 80 tasks on intel14 nodes to 2 nodes and 2 tasks on intel16 nodes. After the update, you can run scontrol show job again to verify the job settings. Once you are done with the updates, release the hold with the scontrol release command:

$ scontrol release 8929
$ squeue -l -u $USER
Fri Aug  3 13:18:10 2018
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
              8929 general-s     test   nobody  RUNNING       0:07      1:00      2 lac-[386-387]
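
The hold, update, and release steps above can be combined into one small script. This is a sketch under assumptions, not a site-provided tool: the function names run and modify_pending_job and the DRY_RUN variable are hypothetical, and the job is addressed as JobId=8929, the specification form that scontrol show itself prints. With DRY_RUN=1 the commands are only printed, so the block can be tried without a scheduler.

```shell
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# modify_pending_job JOBID KEY=VALUE...
# Hold the job, apply the requested changes, then release it.
modify_pending_job() {
    local jobid="$1"
    if [ "$#" -lt 2 ]; then
        echo "usage: modify_pending_job JOBID KEY=VALUE..." >&2
        return 2
    fi
    shift
    run scontrol hold "$jobid" || return 1
    # If the update is rejected, stop here so the job stays held and
    # cannot start with the old request.
    run scontrol update "JobId=$jobid" "$@" || return 1
    run scontrol release "$jobid"
}

# Dry run: only print the three commands.
DRY_RUN=1
modify_pending_job 8929 NumNodes=2-2 NumTasks=2 Features=intel16
```

Leaving the job held when the update fails is a deliberate choice in this sketch: it gives you a chance to correct the request before the scheduler considers the job again.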

The job is now running because scontrol update changed its resource request. Again, you can check the running job with the command:

$ scontrol show job 8929
JobId=8929 JobName=test
   UserId=changc81(804793) GroupId=helpdesk(2103) MCS_label=N/A
   Priority=379 Nice=0 Account=classres QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:08 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2018-08-01T14:33:04 EligibleTime=2018-08-01T14:33:04
   StartTime=2018-08-03T13:18:03 EndTime=2018-08-03T13:18:11 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-08-03T13:18:03
   Partition=general-long-16 AllocNode:Sid=dev-intel18:4996
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=lac-[386-387]
   BatchHost=lac-386
   NumNodes=2 NumCPUs=4 NumTasks=2 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=4G,node=2,billing=4
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=2 MinMemoryNode=2G MinTmpDiskNode=0
   Features=intel16 DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/mnt/home/changc81/GetExample/helloMPI/test
   WorkDir=/mnt/home/changc81/GetExample/helloMPI
   Comment=stdout=/mnt/home/changc81/GetExample/helloMPI/slurm-8929.out
   StdErr=/mnt/home/changc81/GetExample/helloMPI/slurm-8929.out
   StdIn=/dev/null
   StdOut=/mnt/home/changc81/GetExample/helloMPI/slurm-8929.out
   Power=

For complete usage of the scontrol command, please refer to the SLURM web site.

scancel command

      If you would like to remove a job at any moment before it completes, use the scancel command to cancel it. For example, the command line

$ scancel 8929

cancels job 8929. For complete usage of the scancel command, please refer to the SLURM web site.
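
scancel can also select jobs by filter instead of by ID. The sketch below uses the --user and --state options of scancel to cancel every job of one user that is still pending. The function name cancel_my_pending and the DRY_RUN variable are hypothetical. With DRY_RUN=1 the command is only printed, which is a sensible first step because a filter may match more jobs than intended.

```shell
# cancel_my_pending [USER]: cancel all pending jobs of USER
# (defaults to the current user). Hypothetical helper.
cancel_my_pending() {
    local user="${1:-$USER}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the command without executing it.
        echo "scancel --user=$user --state=PENDING"
    else
        scancel --user="$user" --state=PENDING
    fi
}

# Dry run for the user shown in the squeue listings above.
DRY_RUN=1
cancel_my_pending nobody   # prints: scancel --user=nobody --state=PENDING
```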




