3. The Data Service

Overview and Configuration of the Data Service

Overview

The GRIA Data Service is used to manage "data stagers". A data stager is a container for a single file (or zip file). It has a unique identifier and an access control system for determining who can read and write the data. Clients can use the service to create new stagers, upload and download data, transfer data between stagers, and control others' access to the data.

Configuration

Two items of configuration must be given before the Data Service can be used:

The location of the root data directory
The service stores all uploaded data inside this directory. If the Data Service will be used with a Job Service whose jobs execute on a cluster, the cluster's machines must be able to read from and write to this directory.
A list of trusted management services
Normally, you can just click Add to accept the default management service, which is the SLA management service from the GRIA Service Provider Management package. Note that if the GRIA Basic Application Services package is deployed on a different machine from the GRIA Service Provider Management package, some additional access control setup is required; this is described in the Links with Other Services section of the Service Provider Management user guide. As an alternative to configuring the service to be managed, you can make it unmanaged (or "free") by clicking the Make service free button.

Enabling REST data transfer (optional)

Data is normally transferred using the SOAP-with-attachments protocol. However, this has a couple of limitations:

  • Many programs do not support this protocol.
  • SOAP requires the signature to be sent before the data, but calculating the signature requires processing all the data first. The data is therefore processed twice, which on a fast network can roughly double the transfer time.

To solve these problems, the GRIA Data Service can be configured to allow downloading, uploading and deletion using the standard HTTP methods GET, PUT and DELETE respectively.

In this case, the access control decision is made using the HTTPS transport-layer security credentials rather than the SOAP message-layer (WS-Security) credentials. Therefore, your server must be configured to request client authentication at the transport layer.

Also, GRIA services check roots of trust (i.e. trusted certificate authorities) on a per-rule basis, not by having a static set of trusted CAs. Therefore, you should disable certificate trust validation in your container.


Apache

The easiest way to configure this is to front your GRIA services with the Apache web-server. Use this option to request client certificates but leave trust validation to GRIA:
SSLVerifyClient optional_no_ca

N.B. When you enable this option, you must also comment out the trusted certificate authority option, i.e. SSLCertificateChainFile.

Then, go to your service administration page and click on Endpoints configuration. Change the port from the Tomcat port (usually 8443) to the Apache port (usually 443).


Testing


You should now be able to upload and download data using any HTTPS-capable application, such as "curl".

The URL to use for transfers is of the form <webapp>/data-stager/<stager-ID>. For example, to upload file-to-upload to data stager ff808181-152215bf-0115-221982b5-0002:
curl --cert me.pem -T file-to-upload \
https://example.com/gria-basic-app-services/data-stager/ff808181-152215bf-0115-221982b5-0002
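Since GET and DELETE are supported alongside PUT, the same URL form can in principle be used for downloading and deletion. The following is a minimal sketch, not part of GRIA itself: the function names (stager_url, stager_download, stager_delete) and the file name downloaded-file are hypothetical, and it assumes REST transfer has been enabled as described above and that me.pem holds your private key and certificate.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; these names are not part of GRIA.

# Build the transfer URL in the form <webapp>/data-stager/<stager-ID>.
# $1 = webapp base URL (a trailing slash is tolerated), $2 = stager ID
stager_url() {
    printf '%s/data-stager/%s\n' "${1%/}" "$2"
}

# Download a stager's data with HTTP GET.
# $1 = PEM file, $2 = stager URL, $3 = local output file
stager_download() {
    curl --fail --cert "$1" -o "$3" "$2"
}

# Issue an HTTP DELETE against the stager URL.
# $1 = PEM file, $2 = stager URL
stager_delete() {
    curl --fail --cert "$1" -X DELETE "$2"
}

# Example usage (assumed host and stager ID, taken from the upload example):
#   url=$(stager_url https://example.com/gria-basic-app-services \
#         ff808181-152215bf-0115-221982b5-0002)
#   stager_download me.pem "$url" downloaded-file
```

The --fail option makes curl return a non-zero exit status on an HTTP error response, which is convenient when these calls are scripted.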

To get a PEM file containing your private key and certificate from a PKCS#12 file (e.g. one exported from your keystore using KeyToolGUI):

openssl pkcs12 -in me.p12 -out me.pem -clcerts
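Because curl needs both the private key and the certificate in the file passed to --cert, it can be worth checking the converted file before attempting a transfer. The following is a rough sketch of such a check; the function name pem_has_key_and_cert is hypothetical, and it only looks for the standard PEM header lines rather than validating the contents.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of GRIA or OpenSSL.

# Succeeds only if the file contains at least one certificate block and
# one private key block (plain, RSA, or encrypted), judged by PEM headers.
# $1 = path to the PEM file
pem_has_key_and_cert() {
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -e 'PRIVATE KEY-----' "$1"
}

# Example usage:
#   pem_has_key_and_cert me.pem || echo "me.pem is missing a key or a certificate"
```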

See this FAQ for more detail. The URL can also be found in the metadata section of the data stager's EPR, using the getRestURL method.