GRIA can use any Resource Management (RM) system to run jobs via its platform script API. By default, GRIA uses local execution as its RM. Integrating another RM with GRIA is straightforward; this tutorial describes how to integrate RM systems such as PBS/Torque with GRIA.
This tutorial covers:
- How to develop GRIA platform scripts for PBS/Torque resource managers.
- How to use and configure PBS/Torque platform scripts with GRIA.
The example scripts supplied with this tutorial work for a very basic PBS/Torque configuration. However, they
can be very easily modified to support customised PBS/Torque configurations too. The basic
PBS/Torque testbed platform we used to develop and test these scripts had the following
configuration:
- All PBS/Torque services (pbs_server, pbs_sched, pbs_mom) and all GRIA services run on
the same machine;
- Jobs use the default PBS/Torque queue;
- System users, e.g. the GRIA user (tomcat), can submit and run simple jobs.
Start Job: startJob.pl
This example startJob.pl script submits GRIA jobs to PBS/Torque RM systems.
Customising this script requires modifications to the following:
- SECTION A: Initialise Resource Manager global variables, such as the path for PBS
binaries, the PBS server name, etc. In particular make sure that the following
variables are set up correctly:
- RM_PATH=<PBS binary path>
- RM_SERVER=<PBS server name>
- SECTION B: Turn verbose debug flags on/off. This step is optional.
- SECTION C: This section generates a job description file (JDF), which is the
actual file submitted to PBS to run the job.
Edit this part of the code if you want to change any of the default PBS
directives or the way jobs are submitted.
The PBS JDF file has two main parts. The first part lists
all the PBS directives required to run the job.
Edit this part of the code only when you need to
pass PBS directives other than the existing ones, or to handle RM
directives passed with the -r arguments (see SECTION E below).
The default directives used in that part of the script are:
#PBS -N J${SESSION_NAME}
#PBS -o job.out
#PBS -e job.err
#PBS -l cput=3600
${raString} # see SECTION E
...
The second part of the file describes how to invoke the
application wrapper, how to create time-stamp files,
etc. This part of the code should work unchanged for a wide range of PBS
configurations.
- SECTION D: This section contains the actual PBS submit
command. You should only need to edit it for customised PBS
configurations that use multiple queues, multiple PBS servers, etc.
# compose submit command to the default queue
my $command_line="$RM_SUBMIT -q \@$RM_SERVER $JDF";
# execute the submit command and store submission job ID
my $sub = 0xffff & system "$command_line > $JOB_PID";
- SECTION E: This subroutine should convert any job constraints passed as
command-line arguments for the RM into PBS directives.
It should return a text string of valid PBS directives, which is
appended to the PBS directives section of the JDF file,
i.e. ${raString}.
The current implementation of this subroutine returns an empty
string. However, if you intend to pass RM directives dynamically
using the -r command-line arguments, you should parse them
in this subroutine and return them as a PBS directive string, e.g.
...
#PBS -l cput=2300
#PBS -l 2
...
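As an illustration of SECTION C and SECTION E working together, the following is a minimal sketch and not the code shipped with the tutorial. The subroutine names build_ra_string and build_jdf_header are hypothetical. The sketch assumes that each -r constraint arrives as a simple name=value pair and that every constraint maps to a "#PBS -l" resource directive; the actual format of the -r arguments depends on your GRIA setup and is not specified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical SECTION E subroutine.
# ASSUMPTION: each constraint is a "name=value" pair, e.g. "mem=512mb".
sub build_ra_string {
    my @constraints = @_;
    my @directives;
    for my $c (@constraints) {
        # Accept only simple name=value pairs and skip anything else.
        if ($c =~ /^(\w+)=([\w:.+-]+)$/) {
            push @directives, "#PBS -l $1=$2";
        } else {
            warn "Ignoring unrecognised constraint: $c\n";
        }
    }
    # An empty list yields an empty string, like the default implementation.
    return join("\n", @directives);
}

# Hypothetical helper for the first part of the JDF (SECTION C): the default
# directives listed in the tutorial, followed by the SECTION E directives.
sub build_jdf_header {
    my ($session_name, @constraints) = @_;
    my $raString = build_ra_string(@constraints);
    my @lines = (
        "#PBS -N J${session_name}",
        "#PBS -o job.out",
        "#PBS -e job.err",
        "#PBS -l cput=3600",
    );
    push @lines, $raString if length $raString;
    return join("\n", @lines) . "\n";
}

# Example: print the directive part of a JDF for session "42".
print build_jdf_header("42", "mem=512mb", "walltime=00:10:00");
```

Rejecting anything that is not a plain name=value pair keeps malformed input out of the JDF; whether to warn, ignore, or abort in that case is a design choice for your site.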
Check Job: checkJob.pl
This is a Perl script that checks the status of a job and reports it to
GRIA users. For most PBS configurations the default rm_local/checkJob.pl script
can be used without editing.
Kill Job: killJob.pl
This is an example killJob.pl script for terminating PBS jobs. The following parts
of the code need editing:
- SECTION A: Initialise Resource Manager global variables, such as the path for PBS
binaries, the PBS server name, etc. In particular make sure that the following
variables are set up correctly:
- RM_PATH=<PBS binary path>
- RM_SERVER=<PBS server>
- SECTION B: Turn verbose debug flags on/off. This step is optional.
- SECTION C: The first part of this section reads the status of the PBS job. Depending
on your PBS configuration, you may have to edit the code that extracts the job
status; e.g. in the output of the PBS qstat command the status of a job is always
the 6th field.
my $qString = `${RM_QUEUE} | grep $concatPID`;
my @words = &quotewords('\s+', 0, $qString);
my $jStatus = $words[9];
You do not need to change this section unless the output format of qstat
on your system differs from the following:
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
74.siegerrebe pm tomcate 00:30 0 R dque
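The status lookup described in SECTION C can be sketched as follows. This is an illustrative sketch, not the code from killJob.pl: the subroutine job_status is hypothetical, the position of the status field is passed in as a parameter so that it can be adapted to your qstat format, and the job line is selected by an exact match on its first field in place of the grep used in the script. The sample text is the qstat output shown above, in which the status is the 6th whitespace-separated field (index 5).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: return the status field of the given job, or undef
# if the job does not appear in the qstat output.
# $status_index is the zero-based position of the status field.
sub job_status {
    my ($qstat_output, $job_id, $status_index) = @_;
    for my $line (split /\n/, $qstat_output) {
        $line =~ s/^\s+//;                        # drop leading whitespace
        my @words = quotewords('\s+', 0, $line);  # split on whitespace
        next unless @words && $words[0] eq $job_id;
        return $words[$status_index];
    }
    return;
}

# Sample qstat output, copied from the tutorial text.
my $sample = <<'END';
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
74.siegerrebe pm tomcate 00:30 0 R dque
END

my $status = job_status($sample, '74.siegerrebe', 5);
print defined $status ? "status: $status\n" : "job not found\n";
```

If your qstat prints the status in a different column, only the index passed to job_status needs to change.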