4.5.1.2. Start Job

The startJob.pl platform script
This document refers to the startJob.pl platform script, not the application wrapper script of the same name.

Introduction

The startJob.pl platform script is responsible for setting up the platform-dependent environment within the job workspace directory, generating a platform-dependent job description file, and submitting that file for execution to the underlying resource manager. The resource manager then attempts to run the job description file, which invokes the application via its application wrapper.

The start job script is invoked by the Job Service middleware, usually on the same system that runs the GRIA Services.

Job Life Cycle

When a job is submitted via the startJob script, it will follow a specific life cycle.

Figure 2 - The job life-cycle.

Figure 2 shows the life cycle of a job within GRIA (with time increasing along the x-axis). The numbered labels refer to timestamps that are required by the Services. The table below contains details of the labels with their meanings and the name of a file (to be stored in the job's session directory) that will be touched at the appropriate time in order to record the date and time that the event occurred.

Label in Figure 2   Meaning                          File used to store timestamp
1                   Job submission time              .job_submitted
2                   Application wrapper start time   .app_wrapper_started
3                   Application start time           .app_started
4                   Application end time             .app_ended
5                   Application wrapper end time     .app_wrapper_ended

The startJob script should create the file listed for label 1, while the application wrapper should create the rest.
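
As a minimal sketch of the timestamp mechanism described above, the snippet below touches a marker file and reads the recorded time back from the file's modification time. It is written in Python purely for illustration (the real platform script is startJob.pl), and the helper names touch_timestamp and read_timestamp are hypothetical, not part of GRIA.

```python
from pathlib import Path


def touch_timestamp(workspace, name):
    """Create (or update) an empty marker file; its mtime records the event time."""
    marker = Path(workspace) / name
    marker.touch()
    return marker


def read_timestamp(workspace, name):
    """Return the event time in seconds since the epoch, or None if not yet recorded."""
    marker = Path(workspace) / name
    return marker.stat().st_mtime if marker.exists() else None
```

For example, touch_timestamp("/mnt/data/ws-123", ".job_submitted") would record label 1 of the table for that workspace.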

Script API

The start job script should comply with the following command line:

startJob
  -d <absolute path to workspace directory>
  -e <full path to application wrapper script>
  [-r [job constraints]...]
  [-- [application arguments]...]
  • Flag -d specifies the absolute path to the job workspace directory, e.g. /mnt/data/ws-123.
  • Flag -e specifies the full path to the application wrapper script.
  • Flag -r specifies a list of directives/constraints for the resource manager, expressed as name=value pairs. The script should understand these directives and translate them accordingly for the job description file.
  • The -- separator introduces the list of arguments to be passed when invoking the underlying application.
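
The command line above could be parsed along the following lines. This is an illustrative Python sketch rather than the actual Perl implementation; the function name parse_start_job_args and the choice to return constraints as a dictionary are assumptions made for the example.

```python
import argparse


def parse_start_job_args(argv):
    """Split a startJob command line into workspace, wrapper, constraints, app args."""
    # Everything after the first "--" belongs to the application and is left untouched.
    if "--" in argv:
        split = argv.index("--")
        own_args, app_args = argv[:split], argv[split + 1:]
    else:
        own_args, app_args = argv, []

    parser = argparse.ArgumentParser(prog="startJob")
    parser.add_argument("-d", dest="workspace", required=True)
    parser.add_argument("-e", dest="wrapper", required=True)
    parser.add_argument("-r", dest="constraints", nargs="*", default=[])
    opts = parser.parse_args(own_args)

    # Constraints are expected as name=value pairs.
    constraints = {}
    for pair in opts.constraints:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            parser.error(f"constraint must be name=value, got {pair!r}")
        constraints[name] = value
    return opts.workspace, opts.wrapper, constraints, app_args
```

Splitting on "--" before handing the remainder to argparse keeps application arguments such as -x from being mistaken for startJob flags.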

Script functionality

The functionality of the start job script should include the following:

  • Parse and identify script arguments.
  • Identify the job workspace directory structure, check that specified staging directories exist as well as the specified input data.
  • Change directory to the job workspace. From this point on, script log files can be generated and stored in the job workspace directory.
  • Create the .job_submitted timestamp file, corresponding to label 1 in Figure 2, and store the startJob PID in it. This file also serves as a lock file indicating that the job is in the SUBMIT state. The lock must be released on exit by emptying the contents of this file.
  • Analyse and compose the argument string for invoking the application wrapper, e.g. make sure that the application wrapper and any property files exist.
  • Generate RM job description file e.g. job-$$.pbs for PBS. This is a resource manager dependent file that stores resource manager directives, instructs execution nodes to run the job, etc. Usually this file should include the following:
    • Create a resource management directives section, e.g. parse -r arguments, etc.
    • Change directory to working directory.
    • Touch the file .app_wrapper_started in job workspace which corresponds to point 2 in Figure 2.
    • Run the application wrapper using the composed argument string.
    • Store the application wrapper exit code in the file .app_wrapper_exit_code in the job workspace.
    • Store the application wrapper exit code in .app_wrapper_ended in the job workspace as well; writing this file records the application wrapper end time, i.e. point 5 in Figure 2.
  • Submit job description file to RM and store the job ID number into .jobPID file in the job workspace directory. Job status scripts will read this file to find out the status of the job with that ID.
  • Release the submit job lock, i.e. empty the contents of .job_submitted.
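
The steps above can be sketched roughly as follows for a PBS-style resource manager. This is a hedged Python illustration, not the real startJob.pl: the function names, the submit_cmd parameter, and the layout of the generated file are assumptions, and the only resource manager behaviour relied on is that the submit command (qsub for PBS) prints the job ID on standard output.

```python
import os
import shlex
import subprocess
from pathlib import Path


def render_job_script(workspace, wrapper, app_args, directives=()):
    """Build the text of a shell-based job description file."""
    lines = ["#!/bin/sh"]
    lines += [f"#PBS {d}" for d in directives]           # resource manager directives
    command = " ".join(shlex.quote(p) for p in [wrapper, *app_args])
    lines += [
        f"cd {shlex.quote(str(workspace))} || exit 1",   # change to the working directory
        "touch .app_wrapper_started",                    # point 2 in Figure 2
        command,                                         # run the application wrapper
        "rc=$?",
        'echo "$rc" > .app_wrapper_exit_code',
        'echo "$rc" > .app_wrapper_ended',               # point 5 in Figure 2
    ]
    return "\n".join(lines) + "\n"


def submit_job(workspace, wrapper, app_args, directives=(), submit_cmd=("qsub",)):
    """Write the job description file, submit it, and record the job ID."""
    ws = Path(workspace)
    lock = ws / ".job_submitted"
    lock.write_text(f"{os.getpid()}\n")                  # point 1, doubles as submit lock
    try:
        script = ws / f"job-{os.getpid()}.pbs"
        script.write_text(render_job_script(ws, wrapper, app_args, directives))
        result = subprocess.run([*submit_cmd, str(script)], cwd=ws,
                                capture_output=True, text=True, check=True)
        job_id = result.stdout.strip()
        (ws / ".jobPID").write_text(job_id + "\n")       # read later by job status scripts
        return job_id
    finally:
        # Release the lock by emptying the file; note this also updates its mtime.
        lock.write_text("")
```

Releasing the lock in a finally clause means the file is emptied whether or not submission succeeds, which matches the "on exit" requirement in the list.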

Return values

Return values are passed back to the Job Service middleware. A return value of 0 indicates that the script has successfully submitted the job. In any other case the script should return a non-zero value.
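
One way to honour this contract is to funnel every failure into a single non-zero exit code. The sketch below is an illustrative Python example only; run_and_report is a hypothetical helper, and the choice of 1 as the failure code is an assumption, since the text only requires a non-zero value.

```python
import sys


def run_and_report(submit, log=sys.stderr):
    """Run a submission callable and map the outcome to a process exit code."""
    try:
        submit()
    except Exception as exc:
        # Any failure is reported and turned into a non-zero return value.
        print(f"startJob: submission failed: {exc}", file=log)
        return 1
    return 0
```

A script entry point could then end with sys.exit(run_and_report(do_submit)), where do_submit stands for whatever callable performs the submission.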

Job Constraints

Job constraints are passed to platform scripts either by the Job Service using the -r argument, or directly by the client user via a constraints XML file, which the Job Service stores in the job session directory as resources.xml. The startJob platform script should parse these constraints and translate them into resource manager directives. Typical constraints cover WallClockTime, CPUSpeed, PhysicalMemory, DiskSpace, etc.
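
To illustrate the translation step, the Python sketch below maps two of the constraint names mentioned above onto PBS resource requests (walltime and mem). The units are assumptions made for the example (WallClockTime in seconds, PhysicalMemory in megabytes); the source does not state them, and the function name to_pbs_directives is hypothetical. Constraints without a mapping in this sketch are returned separately rather than guessed at.

```python
def to_pbs_directives(constraints):
    """Translate name=value constraints into PBS -l directives.

    Returns (directives, unsupported_names).
    """
    directives, unsupported = [], []
    for name, value in constraints.items():
        if name == "WallClockTime":
            # Assumed to be given in seconds; PBS expects HH:MM:SS.
            hours, rest = divmod(int(value), 3600)
            minutes, seconds = divmod(rest, 60)
            directives.append(f"-l walltime={hours:02d}:{minutes:02d}:{seconds:02d}")
        elif name == "PhysicalMemory":
            # Assumed to be given in megabytes.
            directives.append(f"-l mem={int(value)}mb")
        else:
            unsupported.append(name)
    return directives, unsupported


# Example: 3660 seconds becomes 01:01:00; CPUSpeed has no mapping in this sketch.
print(to_pbs_directives({"WallClockTime": "3660", "PhysicalMemory": "512", "CPUSpeed": "2000"}))
# → (['-l walltime=01:01:00', '-l mem=512mb'], ['CPUSpeed'])
```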

See the job constraints page for further information.