4.5.2.3. Condor
The GRIA pre-supplied platform scripts for Condor systems provide the same functionality as the PBS platform scripts. These scripts have been tested on a very basic Condor configuration. The Condor testbed platform we used was set up as follows:
- All Condor and GRIA services run on the same system
- Condor default values are used
- System users, i.e. the GRIA user (tomcat), can submit and run simple Condor jobs
Condor platform scripts can be easily customised in the following sections:
Submit Job: startJob.pl
This is a Perl script that submits GRIA jobs to a Condor pool.
- SECTION A: Initialise Resource Manager global vars, such as path for Condor
binaries, master server name, etc. In particular make sure that the following
variables are set up correctly:
- RM_PATH=<Condor binary path>
- RM_SERVER=<Condor server>
- SECTION B: Turn verbose debug flags on/off. This step is optional.
- SECTION C: Generate a job description file (JDF). This is the file submitted to
Condor to run the job. The Condor JDF file includes all the Condor directives
required to run the job, and the job itself is described as
frame.
Resource manager directives passed as command line arguments should be
processed in SECTION E and appended at the end of the JDF. The default
Condor directives section in this script includes:
    universe = vanilla
    executable = frame
    arguments = $aRG
    shell = /bin/bash
    error = $JOB_ERR
    log = job.log
    output = $JOB_OUT
    should_transfer_files = IF_NEEDED
    when_to_transfer_output = ON_EXIT
    queue
    $raString # see SECTION E
You should edit this section if your condor configuration requires different directives.
The frame script (see footnote 1) is a simple shell script that Condor runs for every job submitted through GRIA. It changes the working directory, invokes the application wrapper and generates the time-stamp files before and after the application wrapper executes, e.g.
    #!/bin/bash
    cd $SESSION_DIR/$WORK_DIR
    touch ../$APP_WRAPPER_STARTED_TS
    ${EXE_WRAPPER} $aRG
    echo \$? > ../.app_wrapper_exit_code
    touch ../$APP_WRAPPER_ENDED_TS
In most cases you should not need to change the frame code.
- SECTION D: This section contains the Condor submit command:
    # compose the submit argument
    my $command_line="$RM_SUBMIT $JDF";
    # execute condor submit, store job ID
    my $sub = 0xffff & system "$command_line > $JOB_PID";
The output of the Condor submit command is usually similar to:
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 25.
You should only change this part of the code if you use a customised Condor submission command.
- SECTION E: This subroutine should parse the command line arguments intended for the RM. For
the Condor system it should return a text string of valid Condor directives
(${raString}) that will be appended to the JDF file, e.g.
    Requirements = Arch == "INTEL" && OpSys == "Linux" && Memory > 20
    Rank = (Memory > 32)*((Memory * 100) + (IsDedicated * 10000) + Mips)
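As an illustration of SECTION E and of reading the submit output, the following is a minimal Perl sketch, not the code shipped with GRIA. The sub names rm_directives_from_args and cluster_id_from_submit_output, the "Name=Expression" argument format and the allow-list are all assumptions made for this example; adapt them to the way your installation passes RM arguments.

```perl
use strict;
use warnings;

# Hypothetical allow-list of Condor submit commands accepted from the command line.
my %ALLOWED = map { lc($_) => 1 } qw(Requirements Rank);

# SECTION E sketch: turn "Name=Expression" arguments into JDF directive lines.
# The "Name=Expression" argument format is an assumption for illustration.
sub rm_directives_from_args {
    my @args = @_;
    my @lines;
    for my $arg (@args) {
        # Split on the first '=' only, so '==' inside the expression survives.
        my ($name, $expr) = split /=/, $arg, 2;
        next unless defined $expr;
        s/^\s+|\s+$//g for ($name, $expr);    # trim surrounding whitespace
        next unless $ALLOWED{lc $name};       # silently drop unknown directives
        push @lines, "$name = $expr";
    }
    return join("\n", @lines);
}

# Extract the cluster number from text such as
# "1 job(s) submitted to cluster 25."; returns undef if it is not found.
sub cluster_id_from_submit_output {
    my ($text) = @_;
    return $text =~ /submitted to cluster (\d+)/ ? $1 : undef;
}

my $raString = rm_directives_from_args(
    'Requirements=Arch == "INTEL" && OpSys == "Linux" && Memory > 20',
    'Priority=5',    # not in the allow-list, so it is dropped
);
print "$raString\n";

my $out = "Submitting job(s).\nLogging submit event(s).\n"
        . "1 job(s) submitted to cluster 25.\n";
print "cluster: ", cluster_id_from_submit_output($out), "\n";
```

Restricting the accepted directive names is a design choice of this sketch: it keeps arbitrary command line text from being written into the JDF.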
Check Job: getJobStatus
This is a Perl script that reports the status of a Condor job. Customisation of the code should take place in the following sections:
- SECTION A: Initialise Resource Manager global vars, such as path for Condor
binaries, master server name, etc. In particular make sure that the following
variables are set up correctly:
- RM_PATH=<Condor binary path>
- RM_SERVER=<Condor server>
- SECTION B: Turn verbose debug flags on/off. This step is optional.
- SECTION C: This section reads the condor_q command output which typically should
be similar to:
    -- Submitter: siegerrebe.it-innovation.soton.ac.uk : <xxx.xxx.xxx.xxx:42239> : siegerrebe.it-innovation.soton.ac.uk
     ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
     58.0    tomcat    7/11 14:46  0+00:00:00 R  0   0.0  frame -i ../staged
In this example the job status is reported in the 6th field (the ST column):
    my $qString = `${RM_QUEUE} $PID | grep $PID`;
    my @words = &quotewords('\s+', 0, $qString);
    my $jStatus = $words[6];
You should only change this part of the code if you intend to use a customised format of the condor_q command.
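The status lookup can be sketched in a self-contained way as follows. This is an illustration under assumptions, not the GRIA code: the sub name job_status_from_line and the %STATUS_NAME table are invented here, and the sample line is typed in rather than read from condor_q. The script itself reads $words[6], which appears to rely on quotewords returning an empty first element when the line starts with whitespace; the sketch uses split ' ' instead, which skips leading whitespace so that the ST column is always at index 5.

```perl
use strict;
use warnings;

# Common condor_q ST codes and a descriptive name for each.
my %STATUS_NAME = (
    I => 'idle',
    R => 'running',
    H => 'held',
    C => 'completed',
    X => 'removed',
);

# Extract the ST column from one condor_q job line.
# Returns undef if the line does not look like a job line.
sub job_status_from_line {
    my ($line) = @_;
    # split ' ' ignores leading whitespace, so ST is the 6th field (index 5).
    my @words = split ' ', $line;
    return unless @words >= 6 && $words[0] =~ /^\d+\.\d+$/;
    return $words[5];
}

# Sample job line in the layout shown above (leading whitespace included).
my $sample = ' 58.0    tomcat    7/11 14:46  0+00:00:00 R  0   0.0  frame -i ../staged';
my $st = job_status_from_line($sample);
print "status code: $st (", ($STATUS_NAME{$st} // 'unknown'), ")\n";
```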
Kill Job: killJob
This is a Perl script that terminates Condor jobs. The following parts of the code may need editing:
- SECTION A: Initialise Resource Manager global vars, such as path for Condor
binaries, master server name, etc. In particular make sure that the following
variables are set up correctly:
- RM_PATH=<Condor binary path>
- RM_SERVER=<Condor server>
- SECTION B: Turn verbose debug flags on/off. This step is optional.
- SECTION C: This section reads the condor_q command output in order to
determine the state of the Condor job. The command output should typically
be similar to:
    -- Submitter: siegerrebe.it-innovation.soton.ac.uk : <xxx.xxx.xxx.xxx:42239> : siegerrebe.it-innovation.soton.ac.uk
     ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
     58.0    tomcat    7/11 14:46  0+00:00:00 R  0   0.0  frame -i ../staged
In this example the job status is reported in the 6th field:
    my $qString = `${RM_QUEUE} $PID | grep $PID`;
    my @words = &quotewords('\s+', 0, $qString);
    my $jStatus = $words[6];
You should only change this part of the code if you intend to use a customised format of the condor_q command.
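How the queue state might drive the kill decision can be sketched as below. This is a hedged illustration only: the sub names kill_action and remove_command are hypothetical, the path /opt/condor/bin is a placeholder for RM_PATH, and the sketch assumes that removal is done with condor_rm. The command is built but never executed.

```perl
use strict;
use warnings;

# Decide what to do given the ST code read from condor_q.
# An undefined status means the job no longer appears in the queue.
sub kill_action {
    my ($status) = @_;
    return 'none' if !defined $status;       # job already left the queue
    return 'none' if $status =~ /^[CX]$/;    # completed or already removed
    return 'remove';                         # idle, running, held, ...
}

# Build (but do not run) the removal command for a job ID.
# $rm_path stands in for RM_PATH; condor_rm is assumed to be installed there.
sub remove_command {
    my ($rm_path, $job_id) = @_;
    die "unexpected job ID: $job_id\n" unless $job_id =~ /^\d+(\.\d+)?$/;
    return "$rm_path/condor_rm $job_id";
}

for my $st ('R', 'C', undef) {
    my $label = defined $st ? $st : '(not in queue)';
    print "$label -> ", kill_action($st), "\n";
}
print remove_command('/opt/condor/bin', '58.0'), "\n";
```

Validating the job ID before composing the command avoids passing unexpected text to the shell.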
1 Submitting a simple shell script to a resource manager instead of the real application itself can sometimes cause problems, e.g. in advanced configurations that run an application in parallel mode. In such cases it is advisable to move the necessary functionality either into the application wrapper or up into the resource manager section, e.g. the prologue and epilogue parts in PBS.
