
4.5.2.3. Condor

How to use and configure the supplied Condor platform scripts

The platform scripts that GRIA supplies for Condor systems provide the same functionality as the PBS platform scripts. They have been tested only on a very basic Condor configuration. The basic Condor testbed platform was set up as follows:

  • All Condor and GRIA services run on the same system
  • Condor default values are used
  • System users, including the GRIA user (tomcat), can submit and run simple Condor jobs

The Condor platform scripts can be customised in the sections described below.

Submit Job: startJob.pl

This is a Perl script that submits GRIA jobs to a Condor pool.

  • SECTION A: Initialise Resource Manager global vars, such as path for Condor binaries, master server name, etc. In particular make sure that the following variables are set up correctly:
    • RM_PATH=<Condor binary path>
    • RM_SERVER=<Condor server>
  • SECTION B: Turn verbose debug flags on/off. This step is optional.
  • SECTION C: Generate a job description file (JDF), which is the file submitted to Condor to run the job. The Condor JDF includes all the Condor directives required to run the job, and the job itself is specified as frame. Resource manager directives passed as command line arguments should be processed in SECTION E and are appended to the end of the JDF. The default Condor directives section in this script includes:
    universe        = vanilla
    executable      = frame
    arguments       = $aRG
    shell           = /bin/bash
    error           = $JOB_ERR
    log             = job.log
    output          = $JOB_OUT
    should_transfer_files = IF_NEEDED
    when_to_transfer_output = ON_EXIT
    queue
    $raString    # see SECTION E
    You should edit this section if your Condor configuration requires different directives.
    frame [1] is a simple shell script that Condor runs for every job submitted by GRIA. The frame script changes the working directory, invokes the application wrapper and generates time-stamp files before and after the application wrapper executes, e.g.
    #!/bin/bash
    cd $SESSION_DIR/$WORK_DIR
    touch ../$APP_WRAPPER_STARTED_TS
    ${EXE_WRAPPER} $aRG
    echo \$? > ../.app_wrapper_exit_code
    touch ../$APP_WRAPPER_ENDED_TS
    In most cases you should not need to change the frame code.
  • SECTION D: This section contains the Condor submit command:
    # compose the submit argument 
    my $command_line="$RM_SUBMIT $JDF";
    
    # execute condor submit, store job ID
    my $sub = 0xffff & system "$command_line > $JOB_PID";
    The output of the Condor submit command is usually similar to:
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 25.
    You should only change this part of the code if you use a customised condor submission command.
  • SECTION E: This subroutine should parse the command line arguments intended for the resource manager (RM). For Condor it should return a text string of valid Condor directives, which is appended to the JDF as ${raString}, e.g.
    Requirements = Arch =="INTEL" && OpSys == "Linux" && Memory > 20
    Rank = (Memory > 32)*((Memory * 100) + (IsDedicated * 10000) + Mips)
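
SECTION D stores the raw output of the submit command in the file named by $JOB_PID, so the cluster number still has to be extracted from that text before the job can be queried. The following is a minimal sketch of one way to do that, assuming the default message format shown above. The function name extract_cluster_id is hypothetical and is not part of the supplied GRIA scripts.

```shell
#!/bin/bash
# Hypothetical helper, not part of the GRIA scripts: print the cluster number
# found in Condor submit output read from standard input.
# Assumes the default "... submitted to cluster N." message format.
extract_cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Sample submit output, copied from the example above.
sample_output='Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 25.'

cluster_id=$(printf '%s\n' "$sample_output" | extract_cluster_id)
echo "cluster id: ${cluster_id}"
```

If your site uses a customised submission command with a different message, the pattern in the sed expression would need to change accordingly.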

Check Job: getJobStatus

This is a Perl script that reports the status of a Condor job. Customisation of the code should take place in the following sections:

  • SECTION A: Initialise Resource Manager global vars, such as path for Condor binaries, master server name, etc. In particular make sure that the following variables are set up correctly:
    • RM_PATH=<Condor binary path>
    • RM_SERVER=<Condor server>
  • SECTION B: Turn verbose debug flags on/off. This step is optional.
  • SECTION C: This section reads the condor_q command output, which should typically be similar to:
    -- Submitter: siegerrebe.it-innovation.soton.ac.uk : <xxx.xxx.xxx.xxx:42239> : siegerrebe.it-innovation.soton.ac.uk
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
      58.0   tomcat          7/11 14:46   0+00:00:00 R  0   0.0  frame -i ../staged
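    In this example the job status (R) is reported in the 6th whitespace-separated field. The code reads it from $words[6] rather than $words[5], because the leading whitespace of the line yields an empty first element: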
    my $qString = `${RM_QUEUE} $PID | grep $PID`;
    my @words = &quotewords('\s+', 0, $qString);
    my $jStatus = $words[6];
    You should only change this part of the code if you intend to use a customised format of the condor_q command.
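
To check which field holds the status before editing SECTION C, it can help to test the field position against captured condor_q output from the command line. The sketch below assumes the default column layout shown above, where the ST column is the sixth whitespace-separated field. The function name job_status is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the ST column for the given job ID from
# condor_q style output on standard input.
# Assumes the default column layout (ID OWNER SUBMITTED RUN_TIME ST ...).
job_status() {
    # Appending "" forces a string comparison, so "58.0" does not match "58".
    awk -v id="$1" '($1 "") == (id "") { print $6 }'
}

# Sample queue listing, copied from the example above.
sample_queue=' ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  58.0   tomcat          7/11 14:46   0+00:00:00 R  0   0.0  frame -i ../staged'

status=$(printf '%s\n' "$sample_queue" | job_status 58.0)
echo "status: ${status}"
```

awk ignores leading whitespace when it splits fields, so the status is field 6 here, whereas the Perl code above uses index 6 of an array whose first element is empty.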

Kill Job: killJob

This is a Perl script that terminates Condor jobs. The following parts of the code may need editing:

  • SECTION A: Initialise Resource Manager global vars, such as path for Condor binaries, master server name, etc. In particular make sure that the following variables are set up correctly:
    • RM_PATH=<Condor binary path>
    • RM_SERVER=<Condor server>
  • SECTION B: Turn verbose debug flags on/off. This step is optional.
  • SECTION C: This section reads the condor_q output in order to determine the state of the Condor job. The command output should typically be similar to:
    -- Submitter: siegerrebe.it-innovation.soton.ac.uk : <xxx.xxx.xxx.xxx:42239> : siegerrebe.it-innovation.soton.ac.uk
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD 
      58.0   tomcat          7/11 14:46   0+00:00:00 R  0   0.0  frame -i ../staged
    In this example the job status is reported in the 6th field:
    my $qString = `${RM_QUEUE} $PID | grep $PID`;
    my @words = &quotewords('\s+', 0, $qString);
    my $jStatus = $words[6];
    You should only change this part of the code if you intend to use a customised format of the condor_q command.
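
Once the state is known, the script has to decide whether the job still needs to be removed from the queue. The sketch below illustrates that decision under the assumption that only idle (I), running (R) and held (H) jobs need removing with condor_rm. Which states the supplied killJob script actually acts on is not documented here, so treat the mapping and the function name needs_removal as hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when a job in the given condor_q
# state is assumed to still be in the queue and to need condor_rm.
needs_removal() {
    case "$1" in
        I|R|H) return 0 ;;   # idle, running, held
        *)     return 1 ;;   # completed, removed, or not found (empty status)
    esac
}

# Dry run over a few states: only report what would be done.
for state in R C ""; do
    if needs_removal "$state"; then
        echo "state '${state}': would run condor_rm"
    else
        echo "state '${state}': nothing to do"
    fi
done
```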

 

[1] Submitting a simple shell script to a resource manager instead of the real application can sometimes cause problems, for example in advanced configurations that run an application in parallel mode. In such cases it is advisable to move the necessary functionality either into the application wrapper or up into the resource manager, e.g. into the prologue and epilogue parts in PBS.