Moab provides a variety of commands for monitoring and controlling jobs; some handy ones are listed below. Many of these commands accept the -v option for verbose output, and -v -v gives even more detail.
Please see the man pages for more information.
|checkjob <jobID>||provides a status report for the specified job|
|mjobctl||controls various aspects of jobs and can display diagnostic info about each job|
|canceljob <jobID>||cancels the specified job (deprecated; use mjobctl -c <jobID> instead)|
|mshow||displays diagnostic messages about the system and queues|
|showstart <jobID>||shows the estimated start time of idle jobs at the time the command is run|
|showres||displays the state of all reservations in place within Moab, on a reservation-by-reservation basis|
|showq||displays information about active, eligible, blocked, and/or recently completed jobs|
|showbf||shows current resource availability|
|shownodes||lists available 16-core, 24-core, and 64 GB nodes|
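Because canceljob is deprecated in favor of mjobctl -c, scripts that still cancel jobs could use a small wrapper like the following. This is a minimal sketch, not a site-provided tool: the function name cancel_jobs is hypothetical, and it simply calls mjobctl -c once per job ID, reporting any failure on standard error.

```shell
# Hypothetical wrapper (not part of Moab): cancel one or more jobs with
# "mjobctl -c", the documented replacement for the deprecated canceljob.
cancel_jobs() {
  if [ "$#" -eq 0 ]; then
    echo "usage: cancel_jobs <jobID> [<jobID> ...]" >&2
    return 1
  fi
  # Call mjobctl -c once per job ID so one failure does not stop the rest.
  for id in "$@"; do
    mjobctl -c "$id" || echo "failed to cancel $id" >&2
  done
}

# Example (job IDs are placeholders):
# cancel_jobs 1585836 1585837
```

Looping over the IDs keeps the sketch independent of whether mjobctl accepts several job IDs in one call.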
In showq output, active jobs are those that are running or starting and are consuming resources.
Eligible jobs are queued and eligible to be scheduled; their state is Idle.
Blocked jobs are currently ineligible to be run or queued.
The Moab scheduler can block a job for a variety of reasons. To see why a job is blocked, run:
% checkjob <job-id>
To estimate when your jobs will start to run, use the showstart command with the job id. For example,
[someuser@login1 ~]$ showstart 1585836
job 1585836 requires 32 procs for 3:00:00:00

Estimated Rsv based start in 2:05:53 on Wed May 4 18:03:42
Estimated Rsv based completion in 3:02:05:53 on Sat May 7 18:03:42
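The durations in this showstart output are colon-separated. The requested walltime 3:00:00:00 reads as days:hours:minutes:seconds (the estimated start and completion dates are exactly 3 days apart), and 2:05:53 is hours:minutes:seconds. The minimal sketch below converts such durations to seconds to make that arithmetic explicit. The function name to_seconds is hypothetical, and it assumes fields are read right to left as seconds, minutes, hours, days.

```shell
# Hypothetical helper: convert a Moab-style duration such as 2:05:53 or
# 3:00:00:00 to seconds. Assumed format: fields read right to left are
# seconds, minutes, hours, days.
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{
    split("1 60 3600 86400", mult, " ")   # seconds per unit, smallest first
    total = 0
    for (i = NF; i >= 1; i--)
      total += $i * mult[NF - i + 1]
    print total
  }'
}

start=$(to_seconds 2:05:53)        # time until estimated start
finish=$(to_seconds 3:02:05:53)    # time until estimated completion
walltime=$(to_seconds 3:00:00:00)  # requested walltime (3 days)

echo "start in ${start}s, runs ${walltime}s"
# The completion estimate is just the start estimate plus the walltime.
[ $((finish - start)) -eq "$walltime" ] && echo "completion = start + walltime"
```

Run as-is, this prints "start in 7553s, runs 259200s" followed by "completion = start + walltime".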
Running checkjob -v -v <jobID> produces verbose output that shows why a job isn't running.
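To review several jobs in one pass, a small loop can run checkjob -v -v on each job ID in turn. This is a minimal sketch; the function name why_not_running is hypothetical, and it only calls checkjob once per ID with a separator line between reports.

```shell
# Hypothetical helper: print verbose checkjob diagnostics (checkjob -v -v)
# for each job ID given, separated by a header line per job.
why_not_running() {
  for id in "$@"; do
    printf '===== %s =====\n' "$id"
    checkjob -v -v "$id"
  done
}

# Example (job IDs are placeholders):
# why_not_running 1585836 1585837 | less
```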
To get information about a job after it has run, use the showhist.moab.pl script with the job ID. For example:
[icarpent@login2 ~/output]$ /nopt/moab/tools/moab/showhist.moab.pl 180500
Job Id : 180500
Executable : /nopt/moab/spool/moab.job.iw15dV
User Name : icarpent
Group Name : icarpent-upg
Account Name : CSC000
Queue Name : batch
Node Count : 2
Processor Count : 48
Wallclock Duration: 00:01:11
Submit Time : Tue Oct 15 12:01:36 2013
Start Time : Tue Oct 15 12:02:08 2013
End Time : Tue Oct 15 12:03:19 2013
Exit Code : 0
Allocated Nodelist: n0893:n0894
This shows that job 180500 ran on 2 nodes (n0893 and n0894) with 48 processors, had a wallclock duration of 1 minute 11 seconds, and exited with code 0.
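To pull individual fields out of a showhist.moab.pl report, for example in a post-processing script, a minimal sketch like the following splits each line at its first colon and trims both sides. It assumes the "Field Name : value" layout shown above; the function name hist_field is hypothetical. Splitting the Allocated Nodelist value on colons then gives one node per line.

```shell
# Hypothetical helper: print the value of one field from showhist.moab.pl
# output read on stdin. Assumes lines look like "Field Name : value"; each
# line is split at its FIRST colon so values like 00:01:11 stay intact.
hist_field() {
  awk -v want="$1" '{
    i = index($0, ":")
    if (i == 0) next
    key = substr($0, 1, i - 1)
    val = substr($0, i + 1)
    gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim surrounding blanks
    gsub(/^[ \t]+|[ \t]+$/, "", val)
    if (key == want) print val
  }'
}

# Sample report lines copied from the example above.
report='Job Id : 180500
Node Count : 2
Processor Count : 48
Wallclock Duration: 00:01:11
Allocated Nodelist: n0893:n0894'

printf '%s\n' "$report" | hist_field "Processor Count"     # prints 48
printf '%s\n' "$report" | hist_field "Allocated Nodelist" | tr ':' '\n'
```

Against a live report the same function can read the script's output directly, e.g. `/nopt/moab/tools/moab/showhist.moab.pl 180500 | hist_field "Exit Code"`.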