4.4.2. startJob Wrapper Script
Language
Like all other application wrapper scripts, startJob can be written in any scripting language supported by the host OS.
- On Linux, the first line of the script (e.g. #!/usr/bin/python) is used to determine which interpreter to use. The filename extension can be anything (e.g. startJob.py, startJob.sh).
- On Windows, the filename extension is used to determine which interpreter to use. Currently only Python (.py) and Perl (.pl) are recognised.
Application Wrapper Functionality
The startJob application wrapper script is a mandatory script that deals with any application specificity, allowing the Job Service to treat all applications in the same way, and so decoupling the Job Service from the details of the application.
The main functions of the application wrapper script are:
- handling input and output data files;
- setting up an environment (i.e. environment variables) that is suitable to run the application;
- enforcing any security precautions needed to protect against loopholes in the application;
- running the application itself.
The application wrapper is designed to run on the execution platform, having been submitted by the RM connector scripts for starting a job. Before submitting the wrapper script on the execution platform, the Job Service will have set up a workspace (directory) for the job, created a working sub-directory for the job to run in (e.g. work), and copied the input data into the workspace (e.g. into work/inputs). The following listing shows a workspace directory structure with staged input files and an empty outputs directory.
ff808081-1017450e-0110-174532dd-0001-1
`-- work
    |-- inputs
    |   |-- namedinput
    |   |-- arrayinput-0
    |   `-- arrayinput-1
    `-- outputs
After changing to the workspace directory (not the working sub-directory), the wrapper will be submitted using the following command line:
app-wrapper <application arguments>
The functionality of the wrapper script should include the following:
1. Parse wrapper arguments, including security checks for illegal input designed to inject malicious commands into the command-line used to launch the application.
2. Move input data files into the working sub-directory, including unpacking any that are compressed archives containing multiple inputs.
3. Create a consistent environment in the working directory, by setting up environment variables and rewriting input data to match the local environment where necessary.
4. Build the command line and run the underlying application.
5. Copy output files from the working directory into the output directory, including packing multiple outputs into compressed archive files where necessary.
6. Exit by returning the exit code of the application.
For simple applications, security can be maintained by checking input parameters during step 1 and, if necessary, data files during step 3. If the application is too complicated for this to be reliable, it may also be necessary to set up a sandboxed working environment and run the code inside it during step 4.
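The steps listed above can be sketched as a single function. This is a minimal illustration rather than the GRIA implementation: the function name run_wrapper, its parameters, the SAFE_ARG pattern, the LC_ALL setting and the rejection status 2 are all assumptions made for the example, and the directory names follow the work/inputs and work/outputs layout shown earlier.

```python
import os
import re
import shutil
import subprocess
import sys

# Hypothetical whitelist: arguments may only contain these characters.
SAFE_ARG = re.compile(r"[A-Za-z0-9_.=,:/+-]+")


def run_wrapper(app_args, command, expected_outputs, work_dir="work"):
    """Walk through the wrapper steps and return the exit status."""
    inputs_dir = os.path.join(work_dir, "inputs")
    outputs_dir = os.path.join(work_dir, "outputs")

    # Step 1: reject any argument that could smuggle in shell syntax.
    for arg in app_args:
        if not SAFE_ARG.fullmatch(arg):
            print("Rejected argument: %r" % arg, file=sys.stderr)
            return 2

    # Step 2: stage input files into the working sub-directory.
    for name in sorted(os.listdir(inputs_dir)):
        shutil.copy(os.path.join(inputs_dir, name),
                    os.path.join(work_dir, name))

    # Step 3: build a predictable environment (LC_ALL is only an example).
    env = dict(os.environ, LC_ALL="C")

    # Step 4: run the application from an argument list, never via a shell.
    result = subprocess.run(list(command) + list(app_args),
                            cwd=work_dir, env=env)

    # Step 5: copy the expected results into the outputs directory.
    if result.returncode == 0:
        for name in expected_outputs:
            shutil.copy(os.path.join(work_dir, name),
                        os.path.join(outputs_dir, name))

    # Step 6: hand the application's exit code back to the caller.
    return result.returncode
```

A wrapper script built this way would end with something like sys.exit(run_wrapper(sys.argv[1:], ["mogrify"], ["image.jpg"])), so that the application's exit code becomes the wrapper's own.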
Some of these steps are considered in more detail below.
Input and Output Data Handling
Note: As of version 5.2 of the GRIA Job Service, applications can specify names for their inputs in the application metadata file. As a backwards compatibility measure, if the application metadata file still uses the old GRIA 5.1 format, inputs will be named numerically in the order they appear in the metadata (e.g. input-0, input-1, etc.).
When unpacking input data, the application wrapper should attend to the following:
- Create any substructure needed in the job's working sub-directory of the workspace.
- Copy or unzip input files from the inputs sub-directory into the job's working space.
- Check that all input needed to run the job is present.
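One possible way to cover these three points is sketched below. The function name unpack_inputs and its parameters are hypothetical, and the sketch assumes that any compressed inputs are zip archives; other archive formats would need their own handling.

```python
import os
import shutil
import zipfile


def unpack_inputs(inputs_dir, work_dir, required):
    """Copy or unzip every input into work_dir, then check completeness."""
    # Create the working area if it does not exist yet.
    os.makedirs(work_dir, exist_ok=True)

    for name in sorted(os.listdir(inputs_dir)):
        source = os.path.join(inputs_dir, name)
        if not os.path.isfile(source):
            continue
        if zipfile.is_zipfile(source):
            # The zipfile module strips absolute paths and ".." components
            # from member names, so members are extracted below work_dir.
            with zipfile.ZipFile(source) as archive:
                archive.extractall(work_dir)
        else:
            shutil.copy(source, os.path.join(work_dir, name))

    # Fail early if anything the application needs is missing.
    missing = [name for name in required
               if not os.path.exists(os.path.join(work_dir, name))]
    if missing:
        raise FileNotFoundError("missing inputs: " + ", ".join(missing))
```

Raising an exception here lets the wrapper stop and exit with a non-zero status before the application is started with incomplete data.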
The Job Service knows in advance which output files must be returned to the outputs directory. The application wrapper must create these files by:
- Copying or zipping data to create the required output files in the outputs sub-directory
- Renaming these files to the names specified in the metadata (or the output-x naming scheme for legacy applications)
The Job Service will detect that the application wrapper script has finished and handle the transfer of output files accordingly.
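The output side can be sketched in the same spirit. Here collect_outputs and its outputs mapping are hypothetical: the mapping is assumed to take each output name from the metadata (or an output-x name for legacy applications) to the list of result files that make it up, and multiple files are assumed to be packed as a zip archive.

```python
import os
import shutil
import zipfile


def collect_outputs(work_dir, outputs_dir, outputs):
    """Create one file per declared output in outputs_dir.

    outputs maps an output name to a list of paths relative to work_dir.
    """
    os.makedirs(outputs_dir, exist_ok=True)
    for output_name, files in outputs.items():
        target = os.path.join(outputs_dir, output_name)
        if len(files) == 1:
            # A single result file is copied under its output name.
            shutil.copy(os.path.join(work_dir, files[0]), target)
        else:
            # Several result files are packed into one archive.
            with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
                for relative_path in files:
                    archive.write(os.path.join(work_dir, relative_path),
                                  arcname=relative_path)
```

For example, collect_outputs("work", "work/outputs", {"output-0": ["result.dat"], "output-1": ["a.log", "b.log"]}) would produce a plain copy named output-0 and an archive named output-1.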
Consistent Context Reconstruction
Why do we need context reconstruction? The input data for our application has been created on another system with a different directory structure, environment and possibly even operating system. We have to set up an equivalent (not necessarily identical) environment on our execution platform, and make sure any input data references to the remote user's environment are mapped onto the one we have created, or they will be invalid when the application is started.
When and where should context reconstruction be performed? One should handle it as close as possible to the running application, and certainly on the execution platform where the job will actually be run, as this is where the environment is needed. This is why the Job Service doesn't attempt to create the context itself: there is no point doing it at the service host if the job will be executed on a compute node in a Condor cluster. Instead, we leave it to the application wrapper to handle everything in an application-specific way on the execution platform itself.
A typical approach to context reconstruction might involve passing an array of named parameters to the Job Service, including environment settings as well as application flags. These will be passed to the wrapper through its argument list. In addition, one can provide settings in an extra input file, intended for the wrapper rather than the job itself, and used to set up the environment prior to running the application code.
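As an illustration of the named-parameter approach, the sketch below separates environment settings from application flags. The name=value argument form, the "env." prefix and the function name split_parameters are conventions invented for this example; the Job Service does not prescribe them.

```python
def split_parameters(args):
    """Split name=value arguments into environment settings and flags.

    Assumed convention: "env.NAME=value" sets an environment variable,
    anything else becomes an application flag "-name value".
    """
    env = {}
    flags = []
    for arg in args:
        name, separator, value = arg.partition("=")
        if not separator or not name:
            raise ValueError("expected name=value, got %r" % arg)
        if name.startswith("env."):
            env[name[len("env."):]] = value
        else:
            flags.extend(["-" + name, value])
    return env, flags
```

The returned dictionary can be merged into a copy of os.environ before the application is launched, and the flag list appended to the application's command line.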
The hardest job for the wrapper is to parse and rewrite application input data where necessary to ensure it is consistent with the environment established on the execution platform. If this is not needed, it is usually quite easy to 'wrap' an application to run inside the Job Service. Where it is necessary, the wrapper may become a significant body of code in its own right.
For example, consider the following line of input intended for the rendering application AIR, used with the Job Service to provide a grid-enabled video rendering service:
Option 'searchpath' 'shader' ['&:e:\AnimalLogic\MaxMan\shaders:C:\Sample\shaders']
The problem here is that the application uses plug-ins to perform part of the rendering calculation, and the search path for these can be specified in the user input. This particular input file has been generated automatically using a graphical environment for video post-production, which has filled in the relevant path based on where the shader libraries were installed on the user's local machine.
The wrapper has to identify which shaders are needed, and substitute the path to them on the local system:
Option 'searchpath' 'shader' ['&:/export/apps/AnimalLogic/MaxMan/shaders:/export/apps/air/Sample/shaders']
In some cases, it may be possible to infer the meaning of client-side environment references by pattern matching against a list of meaningful terms used by the application. In others (probably in this case), it is necessary for the user to send the install path quoted for specific groups of plug-ins as service arguments or environment settings, so the wrapper can find them and map them onto the equivalent installed groups of components on the execution platform.
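The second approach, where the user supplies the client-side install paths, might look like the following sketch. The function rewrite_search_paths and the path_map argument are hypothetical; path_map is assumed to have been built by the wrapper from the service arguments, mapping each client path to the matching directory on the execution platform.

```python
def rewrite_search_paths(line, path_map):
    """Replace client-side paths in a searchpath option with local ones."""
    if not line.startswith("Option 'searchpath'"):
        return line
    # Substitute longer client paths first so that a shorter path which is
    # a prefix of a longer one cannot clobber it.
    for client_path in sorted(path_map, key=len, reverse=True):
        line = line.replace(client_path, path_map[client_path])
    # Any remaining backslash suggests a Windows path with no mapping.
    if "\\" in line:
        raise ValueError("unmapped client path in: " + line)
    return line
```

Refusing to continue when an unmapped path remains is a deliberate choice in this sketch: it is usually better for the job to fail with a clear message than for the application to start with a search path that does not exist locally.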
In extreme cases, it may be necessary to establish multiple services to run the same application in different ways, allowing a different, specific environment to be set up for each. For example, it probably wouldn't make sense to have a single service to run a computational fluid dynamics (CFD) code capable of simulating coolant flows through automotive engines AND the propagation of drugs in aerosol suspension in human lungs. It would be asking too much of a wrapper developer to differentiate and correctly handle such extreme cases, and instead one should set up two services each with its own wrapper specialised to one of these scenarios.
Security Containment
Why Wrappers have to Bother with Security
The Job Service regards application wrapper scripts as trustworthy, because the service operator can inspect them and make sure they don't do anything strange or foolhardy. However, the applications may be third party, closed source executables that cannot be inspected, and were not designed as network-accessible services in the first place.
Wrapper scripts can protect the service from malicious users in three ways:
- checking any user input used to create the command line for running the application, to exclude command injection attacks using parameters like 'method=gauss; cd /; rm */*';
- checking input data known to be used in an unsafe way by the application, e.g. to construct system calls for executing plug-ins or moving files around;
- confining the application to a sandbox, by first preparing the sandbox and then launching the application in it.
If the application is very simple, or designed to withstand malicious users, or if you have only a small number of users you know well (and trust not to mislay their credentials) then it may be OK to include only the first of these measures.
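The first measure can be sketched as a whitelist check, which accepts only known parameter names with values of an expected shape. The parameter names and patterns in ALLOWED are hypothetical and would be replaced by those of the real application.

```python
import re

# Hypothetical whitelist: each known parameter and the values it may take.
ALLOWED = {
    "method": re.compile(r"[a-z]+"),
    "radius": re.compile(r"[0-9]+(\.[0-9]+)?"),
}


def check_parameter(arg):
    """Return (name, value) for an acceptable parameter, else raise."""
    name, separator, value = arg.partition("=")
    pattern = ALLOWED.get(name)
    # fullmatch requires the whole value to fit the pattern, so trailing
    # text such as "; cd /; rm */*" causes the parameter to be rejected.
    if not separator or pattern is None or not pattern.fullmatch(value):
        raise ValueError("rejected parameter: %r" % arg)
    return name, value
```

With this check, "method=gauss" is accepted while "method=gauss; cd /; rm */*" raises ValueError. Launching the application from an argument list (for example with subprocess.run and no shell) adds a second layer of protection, because the values are then never interpreted by a shell.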
Legacy applications are quite likely to do things in unsafe ways. Renaming files or testing if they exist are sometimes done via system calls. This can be a potential security hole if the application developer wasn't expecting filenames to be sent by a remote user who may have malicious intent. If the application isn't too complex, or if you can check with the developer on what might happen, then it should be OK if you also check the user-supplied input, filenames and other data that may be sent to unsafe system calls.
In the worst case, one has to assume the application will be unsafe, and attempt to contain any damage caused by malicious (or possibly careless) input by restricting what the application can do and where it can do it. There are several possible ways to achieve such restrictions.
Chroot
On Linux systems, chroot can be used to restrict a sub-process to an arbitrary sub-directory, e.g. a job's working directory. The chroot mechanism was designed for use by operating system developers to allow them to create a pseudo-root within which to test their code. While the chroot container doesn't prevent access to low-level devices, it will prevent most legacy applications accessing files outside the specified sub-directory. Chroot is widely used to contain web servers and other network applications to minimise the scope for damage if they are compromised.
To use chroot, it is necessary to create a complete operating system environment inside the job's working directory (which the application will see as '/'). One has to copy application binaries, resolve any references to system and application libraries, create devices such as /dev/null, and so on. Creating a self-sufficient chroot 'jail' in which the application can run may not be easy, and it would need to be repeated for each individual job. However, it can provide a good level of safety, as the 'jail' is enforced by the operating system itself.
Restricted Shells
Many shells, including bash, provide a restriction mechanism, usually invoked by running the shell with the -r switch. Common features of restricted shells include preventing a program from changing directories, from running commands whose names contain a path (so that only commands found on the search path can be executed), from redirecting output on the command line, and from changing the search path.
Minimal privilege accounts
Another approach is to create a low-privilege account for each job. The wrapper script would then have to assign such an account, change the working directory so it is owned by this account, and run the application in that working directory under the same account. Provided the same account is not used for anything else (including running other jobs), the application can be prevented from accessing anything outside the working directory, even if it can be induced to run some unforeseen system call by sending some malicious input.
The two drawbacks with this approach are:
- ideally one should create a pool of accounts and provide a way for the wrapper to assign them to jobs rather than creating new accounts, but this isn't supported at present;
- the wrapper would need sufficient privilege to set the account under which a sub-process is run, which may make the wrapper more dangerous if it can be compromised.
The second drawback may not be too bad, given that the wrapper at least can be designed to check all inputs and avoid doing anything unpleasant. At present, the Job Service runs with a normal unprivileged user identity, so it may be better to use other methods to contain individual jobs.
Other Methods
The above list is by no means exhaustive. For example, if the chroot 'jail' is not sufficient, one can create an entire virtual machine on which to run a potentially unsafe application. Software such as VMware can be used to implement this approach, but users who want to go to these lengths are on their own, at least in this version of the software.
Error Handling
If an error is encountered in the application, the wrapper must report the fact. If this is not done, the Job Service will assume everything is OK, and the user's client application will probably attempt to continue, which may not be appropriate if, for example, some output from the job is missing.
The application startJob wrapper should exit with an exit status of zero if the job has completed successfully, or with a non-zero status if the job has failed. This value will be stored in .exit_code by the RM Connector script. Generic clients may stop executing a workflow, for example, if this result is not zero.
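A minimal sketch of this convention is shown below. The function name run_application is hypothetical, and the choices of 127 for an application that cannot be started and 128 plus the signal number for one killed by a signal simply follow common shell practice; the Job Service only requires that failure produce a non-zero status.

```python
import subprocess
import sys


def run_application(command):
    """Run the application and return a status suitable for sys.exit()."""
    try:
        result = subprocess.run(command)
    except OSError as error:
        # The application could not be started at all (e.g. not installed).
        print("Failed to start %s: %s" % (command[0], error), file=sys.stderr)
        return 127
    code = result.returncode
    if code < 0:
        # A negative value means the process was killed by a signal; map it
        # to the shell convention of 128 + signal number.
        code = 128 - code
    if code != 0:
        print("Application failed, exit status %d" % code, file=sys.stderr)
    return code
```

The wrapper would then finish with sys.exit(run_application([...])), after skipping any output handling when the returned status is non-zero.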
An Example Wrapper Script
This example is based on the ImageMagick applications which were installed as part of the GRIA installation. To get started, we will create a simple wrapper that runs this application.
- Create a startJob wrapper script:
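#!/usr/bin/env python
import shutil
import subprocess
import sys

print("Swirl wrapper started")

# Stage the input image under the name used for processing.
print("Copying input to work directory...")
shutil.copyfile("inputs/sourceImage", "image.jpg")

# mogrify transforms image.jpg in place.
print("Transforming image...")
p = subprocess.Popen(["mogrify", "-swirl", "60", "image.jpg"])
ret = p.wait()
if ret != 0:
    # Pass the application's failure status on to the Job Service.
    print("Failed to transform image, error=%s" % ret)
    sys.exit(ret)

# Publish the result under its output name.
print("Copying result to output stager...")
shutil.copyfile("image.jpg", "outputs/outputImage")
print("Swirl job completed successfully")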
This will perform the following steps:
- Copy the input image into the work directory.
- Run the mogrify command to transform the image.
- Copy the result to the output stager.
- Edit the startJob.py script to run your command instead of mogrify (the command you tested above).
- Make the script executable:
$ chmod a+x startJob.py