4.1. Overview

Overview of the GRIA Job Service

The GRIA Job Service is used to manage jobs. Clients can use the service to create new jobs, upload input data, start the job, monitor progress, and download results.
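The order of these client operations can be pictured with a small in-memory model. This is only an illustrative sketch: the `Job` class, the `JobState` names, and every method name below are hypothetical and are not part of the GRIA client API.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical state names, chosen for illustration only.
    CREATED = "created"
    STARTED = "started"
    FINISHED = "finished"


class Job:
    """Toy model of the create / upload / start / monitor / download sequence."""

    def __init__(self):
        self.state = JobState.CREATED
        self.inputs = {}
        self.outputs = {}

    def upload_input(self, name, data):
        # Inputs are supplied before the job is started.
        if self.state is not JobState.CREATED:
            raise RuntimeError("inputs can only be uploaded before the job starts")
        self.inputs[name] = data

    def start(self):
        if self.state is not JobState.CREATED:
            raise RuntimeError("job has already been started")
        self.state = JobState.STARTED

    def mark_finished(self, outputs):
        # Stands in for the application completing on the execution side.
        if self.state is not JobState.STARTED:
            raise RuntimeError("only a started job can finish")
        self.outputs = dict(outputs)
        self.state = JobState.FINISHED

    def download_output(self, name):
        # Results are only available once the job has finished.
        if self.state is not JobState.FINISHED:
            raise RuntimeError("results are not available yet")
        return self.outputs[name]
```

In this model a client would call `upload_input`, then `start`, poll `state` until it reports `JobState.FINISHED`, and finally call `download_output`.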

Each input and output of a job is a data stager managed by the local Data Service (the one deployed in the same .war file as the Job Service). You must therefore configure the Data Service before the Job Service can be used. Users can run jobs that take input from, or send output to, other Data Services by using the normal data transfer features of the Data Service.

Job Service Architecture

The GRIA Job Service architecture is flexible enough to run jobs on a variety of underlying computing platforms, from single computers to clusters of workstations or even supercomputers. To achieve this flexibility, the GRIA Job Service accesses resources indirectly through its RM connector scripts, which decouple the Job Service from resource managers and applications. The following sections give an overall picture of the components of the GRIA Job Service and then describe some common deployment scenarios.
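The decoupling provided by the connector layer can be sketched as a single abstract interface with interchangeable implementations. The class and method names below (`ResourceManagerConnector`, `submit`, `status`, `JobDispatcher`) are assumptions made for illustration; the actual GRIA plugin interface is not shown in this overview and may look quite different.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ResourceManagerConnector(ABC):
    """Hypothetical common interface that hides resource manager details."""

    @abstractmethod
    def submit(self, job_dir: str, command: List[str]) -> str:
        """Submit a job and return an identifier for it."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return a coarse state such as 'queued' or 'finished'."""


class InMemoryConnector(ResourceManagerConnector):
    """Stand-in connector that only records submissions (runs nothing)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, job_dir: str, command: List[str]) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "queued"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


class JobDispatcher:
    """Calls whichever connector is named, without knowing its internals."""

    def __init__(self, connectors: Dict[str, ResourceManagerConnector]) -> None:
        self._connectors = dict(connectors)

    def submit(self, connector_name: str, job_dir: str, command: List[str]) -> str:
        return self._connectors[connector_name].submit(job_dir, command)

    def status(self, connector_name: str, job_id: str) -> str:
        return self._connectors[connector_name].status(job_id)
```

The point of the design is that the dispatcher depends only on the abstract interface, so a connector for another resource manager can be added without changing the calling code.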

Components of the GRIA Job Service

Figure 1. Overview of the GRIA Job Service architecture

The GRIA Job Service is separated into several distinct components (the colours refer to Figure 1):

  1. The Resource Manager Connectors - The GRIA Job Service can submit jobs to different resource managers such as TorquePBS and Condor, or even run them on the local machine using the LocalExecution plugin. It is able to do this thanks to the Resource Manager Connector layer - a plugin architecture written in Python that abstracts away resource manager-specific details and presents a single interface for submitting and monitoring jobs. Service providers can configure the Job Service to use the existing Resource Manager Plugins or they can write their own to interface with custom configurations.

    Version 5.2 and newer of the GRIA Job Service can have any number of RM Connector plugins loaded at the same time. The plugin to use for an individual job is selected in a series of steps, explained in The Selection Process chapter.

  2. The Application Wrapper Scripts - Each application deployed on the GRIA Job Service needs a small set of wrapper scripts installed alongside it. These scripts are responsible for providing the application with the correct files from the shared filesystem, and for making sure the outputs from the application are written or copied back to the correct location. Optional wrapper scripts can also be written to cancel a job gracefully and to report progress and usage information specific to that application (e.g. frames rendered by a graphics package).

  3. The Shared Filesystem - When the user creates a job, the GRIA Job Service creates a directory for it on the shared filesystem. The administrator should ensure that this directory can be read from and written to by both the Job Service running on the server, and the application wrapper scripts running on the execution nodes. The structure of this scratch directory is as follows:

    • logsys - Log file for the system administrator. Contains information about the RM connector plugins and resource constraints.
    • loguser - Log file for the user. Contains the stdout and stderr from the job executable.
    • work/ - Working directory in which the job executes.
    • work/inputs/ - Directory containing the named inputs for the job.
    • work/outputs/ - Directory to which the application wrapper scripts should write the job's outputs.
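The scratch directory layout listed above, and the way a wrapper reads from work/inputs/ and writes to work/outputs/, can be sketched as follows. This is a minimal illustration rather than GRIA code: `create_scratch_dir` and `toy_wrapper` are hypothetical helpers, and the "application" here merely upper-cases each input file.

```python
from pathlib import Path


def create_scratch_dir(root: Path, job_id: str) -> Path:
    """Create the per-job layout: logsys, loguser, work/inputs/, work/outputs/."""
    job_dir = root / job_id
    (job_dir / "work" / "inputs").mkdir(parents=True, exist_ok=True)
    (job_dir / "work" / "outputs").mkdir(exist_ok=True)
    (job_dir / "logsys").touch()   # log file for the system administrator
    (job_dir / "loguser").touch()  # log file for the user (job stdout/stderr)
    return job_dir


def toy_wrapper(job_dir: Path) -> None:
    """Stand-in for a wrapper script: read named inputs, write named outputs."""
    inputs = job_dir / "work" / "inputs"
    outputs = job_dir / "work" / "outputs"
    for source in sorted(inputs.iterdir()):
        if source.is_file():
            text = source.read_text(encoding="utf-8")
            # The "application" in this sketch just upper-cases the text.
            (outputs / source.name).write_text(text.upper(), encoding="utf-8")
```

A real wrapper would invoke the deployed application instead of transforming the text itself, but the direction of data flow is the same.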

When configuring the Job Service, the administrator must be careful to ensure that the different files and executables can be accessed by the correct components in the system.

  • The application executables should be accessible by the compute nodes only, and they should be read-only. The applications can be installed either locally on each compute node or on disk space shared among all nodes.
  • The application wrapper scripts should be accessible by both the compute nodes and the Job Service. Like the application executables, they should be read-only, and they can either be installed locally or placed on a shared filesystem.
  • The RM connector scripts should be accessible by the GRIA Job Service middleware.
  • The job's scratch directory should be accessible by both the compute nodes and the Job Service. The Job Service must have write permission on the entire directory, but applications themselves only need to access the work/ subfolder. The application wrapper scripts could set up a chroot jail inside this directory to ensure nothing else on the filesystem can be accessed. Note that the scratch area cannot be copied between compute nodes; instead it must be exported as a shared disk space.
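An administrator could verify the scratch directory requirement with a short check like the one below. It is a sketch under stated assumptions: `check_job_service_access` is a hypothetical helper, and it only tests access for the account running the script, so it would need to be run as the Job Service user (and a similar check on work/ run from a compute node).

```python
import os
from pathlib import Path
from typing import List


def check_job_service_access(job_dir: Path) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    if not job_dir.is_dir():
        return [f"{job_dir}: not a directory"]

    problems = []
    # The Job Service needs read and write access to the entire directory.
    for path in [job_dir, *sorted(job_dir.rglob("*"))]:
        mode = os.R_OK | os.W_OK
        if path.is_dir():
            mode |= os.X_OK  # directories must also be traversable
        if not os.access(path, mode):
            problems.append(f"{path}: missing read/write access")

    # Applications only need the work/ subfolder, so it must exist.
    work = job_dir / "work"
    if not work.is_dir():
        problems.append(f"{work}: missing working directory")
    return problems
```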

Local Execution Deployment

This is the typical minimal configuration: the Job Service and the jobs it executes run on the same machine.

Figure 2. Local Execution Deployment

Figure 2 shows how GRIA can be configured to run applications locally on the server machine running Tomcat. Because it is simple and needs minimal configuration, this deployment is commonly used for demonstrations and testing. It is the default configuration for the GRIA Job Service if the administrator has not set up the TorquePBS or Condor plugins. Note that it is not advisable to leave the GRIA Job Service configured like this in a production environment.
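Conceptually, local execution amounts to running the job's command on the server itself, inside the job's work/ directory, with stdout and stderr captured in loguser. The sketch below illustrates that idea using Python's standard `subprocess` module; `run_locally` is a hypothetical function and is not the LocalExecution plugin itself.

```python
import subprocess
from pathlib import Path
from typing import List


def run_locally(job_dir: Path, command: List[str]) -> int:
    """Run a command in the job's work/ directory and return its exit code."""
    work_dir = job_dir / "work"
    # Append stdout and stderr from the job executable to the user's log file.
    with open(job_dir / "loguser", "ab") as log:
        completed = subprocess.run(
            command,
            cwd=work_dir,
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,  # report the exit code instead of raising on failure
        )
    return completed.returncode
```

Because the command blocks the caller until it finishes and shares the server's resources, a sketch like this also hints at why the same-machine setup suits testing better than production.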