Service Provider Management User Guide

This guide describes how to use the Service Provider Management Package for SLA based service management and billing based on a simple management protocol for both GRIA and 3rd party application services.

1. Overview

An introduction to the Trade Account Service and SLA Management Service.

The service provider management package allows a service provider to support service level agreements and if necessary bill for service usage, via a simple management protocol that can be used with GRIA basic application services and with other (non-GRIA) application services, if required. It consists of:

A Trade Account Service
This supports the creation and management of trade accounts, each representing a trust relationship between the service provider and a customer (the account's budget holder), who is responsible for all service access on their account. The budget holder can then allow others to access services, and can monitor usage on the account. Services that need to bill for their usage do so by recording charges at the customer's trade account.
An SLA Management Service
This provides the service manager with the ability to define their available resources (e.g. CPUs, applications, etc.), assign portions of their resources to customers by way of service level agreements (SLAs), and bill for resource usage based on the pricing terms in these agreements.
The service provider management package

2. Installation

How to install the package.

The Service Provider Management package is provided as a zip file. Unzip the file and you will find the following items:

  • docs (folder)
  • gria-service-provider-mgt.war
  • README.html

Note that upgrading the service provider management package directly from version 5.0 to 5.1 is not supported. If you need to upgrade a 5.0 installation, first upgrade to 5.0.1 and then upgrade to 5.1. Details of how to install the war file (including how to upgrade) are provided below.

Install the war file according to the Service Installation Manual. Once the initial configuration has been completed, the Trade Account Service and SLA Management Services will both need configuring. Instructions for this can be found in the following sections of this manual.

3. Trade Account Service

Describing the service, its configuration and usage.

3.1. Overview

Overview of the Trade Account Service.

The purpose of the GRIA Trade Account Service is to support the creation and management of trade accounts. Each trade account represents a trust relationship between the service provider and a user (the account holder), who is responsible for all service access on their account.

The account holder can allow others to charge (i.e. bill) services to their account, and can monitor billed usage to check that it is sensible. The service provider trusts the account holder to manage access responsibly, whether they are an employee leading a collaborative project, or an external user paying for the services provided. The account thus represents the "root of trust" between a service provider and remote users.

The service provider also assigns a credit limit to each account (thus limiting their financial risk exposure), and records any payments or charges on the account. The Account Service keeps track of the account holder's liability, and provides a mechanism to check credit (i.e. whether a bill would take the amount owed above the credit limit). Note that while a credit check would fail if the corresponding amount would exceed the available credit, a bill will still be processed, so the account still records the amount owed to the service provider (see below). The idea is that another service should check credit before committing to provide a service for a user, can go ahead if the credit check is okay, and can always bill afterwards for services delivered.
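The difference between a credit check and a bill can be illustrated with a minimal Python sketch. The `TradeAccount` class and its method names are hypothetical and are not part of the GRIA API; the sketch only models the behaviour described above, assuming that a charge which brings the amount owed exactly up to the credit limit still passes the check.

```python
from decimal import Decimal


class TradeAccount:
    """Hypothetical in-memory model of a trade account (not the GRIA API)."""

    def __init__(self, credit_limit):
        self.credit_limit = Decimal(credit_limit)
        self.amount_owed = Decimal("0")

    def check_credit(self, amount):
        # The check fails if the bill would take the amount owed
        # above the credit limit.
        return self.amount_owed + Decimal(amount) <= self.credit_limit

    def bill(self, amount):
        # A bill is always recorded, even beyond the credit limit, so the
        # account still shows what is owed to the service provider.
        self.amount_owed += Decimal(amount)

    def record_payment(self, amount):
        # A payment reduces the liability and so increases available credit.
        self.amount_owed -= Decimal(amount)


account = TradeAccount(credit_limit="100.00")
account.bill("80.00")
can_afford = account.check_credit("30.00")  # would owe 110.00, over the limit
account.bill("30.00")                       # recorded anyway
```

A service would normally call the equivalent of `check_credit` before committing to work, and bill afterwards for whatever was actually delivered.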

It is important to emphasise that the trade account service is not a replacement for a banking service. It records money that a customer owes to the service provider, rather than money they actually have. It does not guarantee (the way a bank has to) that these records will be 100% complete and accurate, which is not feasible without reliable networks, as operated (for a fee) by clearing banks. A service provider therefore cannot get money from the accounting service, or treat the balance of an account as cash that can be used to pay others, but can use the account service to generate invoices to send to customers. When the customer pays a bill (via a cheque in the post for instance), the payment should be recorded on the account so as to keep a correct record of the amount owed by the client.

3.2. Configuration

Configuration of the Trade Account Service

There are only two configuration settings for the Account Service:

Default currency
This is the currency code that new accounts will use. By default it is "EUR" for Euro.
Default precision
This is the number of decimal places that new accounts will record.

New accounts will use both of these values. It is not expected that the service provider will change these values after their initial configuration.

Currency

The currency symbol defines the full meaning (and therefore value) of monetary entries in an account. When a service (such as the SLA Service) bills an account, the bill will be rejected if the currency used in the bill is not the same as the currency recorded in the account. There is no automatic conversion between currencies.

Precision

When an account is billed, the monetary value in the bill is rounded (using the half-even rounding method) to the number of decimal places defined in the precision setting. It is up to the manager of the account service to decide how many decimal places to store. If there were many very small charges to an account, then setting too few decimal places could mean that the charges were effectively ignored.
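As an illustration of the currency and precision rules, the following Python sketch uses the standard `decimal` module. The function names are hypothetical and the Account Service's real implementation may differ; only the half-even rounding and the rejection of mismatched currencies are taken from the description above.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_charge(amount, precision):
    """Round a charge to `precision` decimal places using half-even rounding."""
    quantum = Decimal(1).scaleb(-precision)  # precision 2 gives Decimal("0.01")
    return Decimal(amount).quantize(quantum, rounding=ROUND_HALF_EVEN)


def accept_bill(account_currency, bill_currency, amount, precision):
    """Reject bills in the wrong currency, otherwise return the rounded charge."""
    if bill_currency != account_currency:
        # There is no automatic conversion between currencies.
        raise ValueError("bill currency does not match account currency")
    return round_charge(amount, precision)


tie_down = round_charge("0.125", 2)  # tie goes to the even digit: 0.12
tie_up = round_charge("0.135", 2)    # tie goes to the even digit: 0.14
tiny = round_charge("0.004", 2)      # rounds to 0.00
```

The last line shows the risk mentioned above: with two decimal places, a stream of charges of 0.004 each would all be recorded as zero.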

3.3. Management

Management of the Trade Account Service

Account States

Management of the Account Service is mainly a question of dealing with the different states of accounts. An account may be in one of six states, as shown by the boxes in the diagram below. The valid transitions between the six states are shown by labeled arrows:

Account state diagram
pending-credit-checks
When a client requests an account, one is automatically created and put into this state. The client is unable to use the account in this state - i.e. other services cannot bill or check the credit limit of the client. The account will appear in the "Accounts awaiting credit checks" section of the Trade Account Service administration page of the web application and a notification will also be put into the Atom feed. The Account Service manager should carry out due diligence checks (e.g. check their credit in the real world), and then use the web page for the new account to either deny the account request, or approve the account by setting a credit limit to a value representing the limit of trust they wish to place in the account applicant.
denied
The denied state exists so as to inform a client that their application for an account was unsuccessful. The Account Service manager may (after a suitable period of time) destroy the account from this state.
open
An open account may be billed and may have payments recorded on it. The Account Service manager may choose to suspend an open account if they want to temporarily prevent the account from being used. The Account Service manager or client may also choose to request that the account be closed.
suspended
In this state, the account cannot be billed. The Account Service manager may unsuspend the account (to re-open it), and the manager or the client can request that the account be closed.
account-usage-finished
In this state, bills and payments may still be recorded on the account. The Account Service manager should ensure that all bills from other services have been recorded and then send a final bill to the client. On receiving final payment, the payment should be recorded and the account balance will be zero. Once the balance is zero, the Account Service manager may move the account into the closed state.
closed
In the closed state, payments and bills cannot be recorded. The account data is left on the system until the manager decides to destroy the account, at which point all record of the account's existence is removed from the Account Service.
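The state descriptions above can be summarised as a small transition table. This Python sketch is illustrative only: the action names are hypothetical labels, and it assumes that a close request moves an account to account-usage-finished, which is what the text suggests but the diagram would confirm. Destroying an account (possible from the denied and closed states) removes it altogether, so it is not modelled as a state.

```python
# (current state, action) -> next state; action names are hypothetical labels.
TRANSITIONS = {
    ("pending-credit-checks", "approve"): "open",
    ("pending-credit-checks", "deny"): "denied",
    ("open", "suspend"): "suspended",
    ("open", "request-close"): "account-usage-finished",
    ("suspended", "unsuspend"): "open",
    ("suspended", "request-close"): "account-usage-finished",
    ("account-usage-finished", "close"): "closed",
}

# States in which the text says bills may be recorded.
BILLABLE_STATES = {"open", "account-usage-finished"}

# States from which the manager may destroy the account.
DESTROYABLE_STATES = {"denied", "closed"}


def next_state(state, action):
    """Return the state reached by `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} an account in state {state!r}")


state = "pending-credit-checks"
for action in ["approve", "suspend", "unsuspend", "request-close", "close"]:
    state = next_state(state, action)
```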

Recording Payments

As mentioned above, the Account Service is not supposed to be a bank dealing with real money. Its purpose is to record what a client owes to the service provider. Transactions involving real money would take place using the normal channels (invoices and cheques in the post, credit card transactions on the phone, etc.). If a client makes a payment through one of these channels then the payment should be recorded in the account, so that the system knows that the client's overall financial liability has been reduced, and the available credit increased.

Recording a payment

To record a payment, go to the account's page in the web interface and scroll down to the "Record a payment" section. Enter the details of the payment and click on "Record payment".

Note: the "Recorded by" drop-down is generated from the list of people with the "service-admin" role on this account. To see who has the "service-admin" role, scroll down to the "Access control rules" at the bottom of the page. By default there will be a rule stating that to have the service-admin role it is sufficient to be a "Member of group: account-service-admins". By clicking on "account-service-admins" you will be taken to a new page showing the rules dictating who is in the account-service-admins group.

By default, the account-service-admins group consists of just one rule stating that the service itself is a member. The Account Service manager must add their certificate to this rule set using the "SubjectDN is..." type of rule in order to be able to record payments on any account.

4. SLA Management Service

Describing the service, its configuration and usage.

4.1. Introduction

Introduction to the SLA Management Service

Overview

This provides the service manager with the ability to define their available resources (e.g. CPUs, applications, etc.), assign portions of their resources to users by way of service level agreements (SLAs) and bill for resource usage.

The SLA Service is highlighted in the diagram below:

Highlighting the SLA Management Service in the GRIA Architecture

Once someone (the engineer or project manager) at the client organisation has (1) obtained an account and (2) obtained an SLA, the user/engineer at the client organisation may (3) use a functional service that is managed by the SLA Service, which in turn will bill the account for any usage.

The SLA Service manages activities (e.g. data-stagers, jobs, databases) at the functional service, and to a lesser extent, the account service manages the SLA Service. In the following discussion, the functional service is the data service but the same interactions occur with any functional service.

Managing the SLA Service

The SLA Service manager will write and publish "SLA templates" which are offers of service. Using the SLA templates the manager is able to offer many different SLAs with different levels of service at different prices (e.g. "gold", "silver" and "bronze" packages). These represent the service provider's "offers" on terms for using their services. The GRIA access control system can then be used to control which clients are able to see each SLA template. This makes it possible to offer special rates to particular customers, for instance, by creating an SLA template with lower pricing terms, but making it accessible only to those customers.

Using a Managed Functional Service

Before a client is able to use a managed functional service they must have agreed an SLA with the relevant SLA Service. To agree an SLA, the client first retrieves the SLA templates that the service provider is offering. The client chooses a template, fills in any fields where they must make a choice and "proposes" it to the service provider. If the service provider agrees with the proposal and has enough spare capacity to honour the agreement, the service provider will agree the SLA and the contract is made. This agreement decision is made automatically by the SLA Service, based on a description of the available resource capacity, taking account of previously agreed SLAs.

The SLA between the client and the service provider contains information describing the charges that will be made for various actions and the constraints placed on the client's usage of services.

The client is also able to query the SLA Service directory to retrieve information on their resource usage.

When the client uses the functional service, they quote the ID of their SLA in their request. Through the SLA ID, the functional service is able to be managed by the SLA Service (see below).

How the SLA Service Shares Out Resources

Part of the purpose of an SLA is to give the consumer confidence that they will receive a certain standard of service. For grid services this "standard of service" is often something as basic as a quota of disc space or quantity of CPU time (however, more consumer friendly metrics may be used: see private constraints).

The GRIA SLA Service has a "constraint manager" that deals with making sure that the agreements defined in the SLAs are met. It is possible to replace the constraint manager by a different one with a more advanced algorithm, but the one supplied with GRIA takes a simple conservative approach.

The administrator of the service will define the service's "capacity" and then publish one or more "SLA templates". Both the capacity definition and the SLA templates use constraints to define quantities of resources, and the constraint manager must balance one set of constraints against the other. For instance, a capacity constraint may define that there is 10GB of disc space available. If the administrator publishes an SLA template offering up to 1GB of disc space, then the constraint manager will ensure that at most 10 of these SLAs are agreed, guaranteeing that the terms in the SLA can be met. The situation is more involved when more complex constraints involving CPU time or contention ratios are used (see further examples of constraints), but the principle is the same.
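The disc space example can be sketched in a few lines of Python. This is a simplified illustration of the conservative approach and not the GRIA constraint manager itself: the function names and the `disc-gb` key are hypothetical, and a metric with no capacity constraint is assumed to be unlimited.

```python
def can_agree(capacity, allocated, requested):
    """True if `requested` fits in the unallocated capacity for every metric."""
    return all(
        allocated.get(metric, 0) + amount <= capacity.get(metric, float("inf"))
        for metric, amount in requested.items()
    )


def agree(allocated, requested):
    """Reserve the requested quantities for a newly agreed SLA."""
    for metric, amount in requested.items():
        allocated[metric] = allocated.get(metric, 0) + amount


capacity = {"disc-gb": 10}  # 10GB of disc space available
template = {"disc-gb": 1}   # each SLA offers up to 1GB
allocated = {}
agreed = 0
for _ in range(15):         # fifteen proposals arrive
    if can_agree(capacity, allocated, template):
        agree(allocated, template)
        agreed += 1
```

Only the first ten proposals are agreed; the remaining five are refused because the full 10GB has already been promised.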

Management of a Functional Service by the SLA Service

For a functional service to be managed by the SLA Service it must:

  1. query the SLA Service when it is about to perform an action that may potentially involve a lot of usage,
  2. notify the SLA Service of resource usage in a timely manner, and
  3. obey the SLA Service's instructions to destroy activities.

So, for instance, when a client requests a new data-stager, the data service queries the SLA Service to check that the client is allowed another data-stager in their SLA. If the SLA Service were to say that the client was not allowed, then the functional service must obey and not continue. When the client uploads a file to a data-stager, the data service notifies the SLA Service of the disc space and data transfer usage. If the SLA Service discovers that a client has exceeded a constraint in their SLA, then it takes steps (where possible) to bring the usage back within the constraint by instructing functional services to kill a minimum number of the client's activities.

Functional services query the SLA Service about new usage requests from the client and report usage of resources using "metrics" such as disc space, data transfer, CPU, database transactions, etc. The SLA Service has no concept of what each metric actually represents; it just records their usage and compares the usage against the constraints in the SLAs. As a result, the SLA Service is able to manage new functional services that use highly customised usage metrics with no alteration to the SLA Service itself.

Management of the SLA Service by the Account Service

There is a looser form of management between the SLA Service and the Account Service:

  1. When a client proposes a new SLA, the SLA Service checks with the Account Service that the client's credit is good.
  2. At the end of each SLA billing period, the SLA Service records an aggregated bill (for all the period's usage) on the client's trade account.
  3. If, when billing an account, the SLA Service discovers that the account is suspended or closed, then the SLA is itself suspended.

Configuring, Using and Managing the SLA Service

The following list summarizes the steps taken in configuring, using and managing the SLA Service:

4.2. Metrics

What metrics are and how they are used

The GRIA SLA Management Service is designed to be very flexible. It retrieves usage information from functional services (e.g. job services), records the usage and optionally constrains and/or bills for the usage. Different functional services will want to report usage of different measurable quantities. So for instance, a job service will report usage of CPU but a data service will report usage of disc space. These measurable quantities, henceforth known as "metrics", are represented by URIs. The SLA service does not understand the meaning of these URIs; it just records their usage and acts according to how it has been configured.

Metrics are used in the <constraint> and <pricingTerm> elements of the SLAs and also in the service's capacity configuration.

The use of metrics is recorded in terms of "instantaneous" measurements and "cumulative" usage. The cumulative usage is the integration of the instantaneous measurements over time. For some metrics, data-transfer for example, the instantaneous measurement is best thought of as a rate (bytes per second) and the cumulative usage has no time dimension (bytes). For other metrics, such as CPU, the instantaneous measurement is just the quantity in use at the time (e.g. 3 CPUs) and it is the cumulative usage that has the time dimension, e.g. 180 CPU.seconds. The SLA service can convert between the two, e.g.:

  • If a job runs on 1 CPU for 5 minutes then the SLA service will be notified that the instantaneous measurement of CPU usage went to 1 at the start and then to 0 five minutes later. The SLA service can infer that 300 CPU.seconds of CPU time have been used (1*5*60 = 300 CPU.seconds).
  • If a service reported that it had used 120 units of a resource in a 1 minute period, the SLA service would infer that the average instantaneous measurement (rate of usage) had been 2 units/s.
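The two conversions in the examples above can be sketched in Python as follows. The function names and the sample format, a list of (time in seconds, instantaneous value) pairs, are assumptions made for illustration and do not describe how the SLA service stores its data.

```python
def cumulative_usage(samples, end_time):
    """Integrate step-wise instantaneous measurements over time.

    `samples` is a time-ordered list of (time_seconds, value) pairs; each
    value is assumed to hold until the next sample (or until `end_time`).
    """
    total = 0.0
    boundaries = samples[1:] + [(end_time, None)]
    for (start, value), (stop, _) in zip(samples, boundaries):
        total += value * (stop - start)
    return total


def average_rate(cumulative, duration_seconds):
    """Infer the average instantaneous measurement from cumulative usage."""
    return cumulative / duration_seconds


# One CPU in use from t=0 until t=300 seconds (five minutes).
cpu_seconds = cumulative_usage([(0, 1), (300, 0)], end_time=300)

# 120 units used in a one minute period.
rate = average_rate(120, 60)
```

Here `cpu_seconds` is 300 CPU.seconds and `rate` is 2 units/s, matching the two examples above.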

All metrics have both instantaneous measurements and cumulative usage which may be recorded or inferred from each other. For some metrics one or other concept will not be useful, but the SLA manager has no idea of what it is counting, restricting or billing for in each metric, and so can cope with either type of measurement and can always infer the other from it.

The GRIA job service uses the following metrics:

http://www.gria.org/sla/metric/activity/current-activities
This is set to one when a job is created and to zero when it is destroyed.
http://www.gria.org/sla/metric/activity/job
This is set to one when a job is created and to zero when it is destroyed.
http://www.gria.org/sla/metric/resource/cpu
When a job starts, this is set to the number of CPUs the job is using (normally 1). When the job finishes, it is set to zero again.

The GRIA data service uses the following metrics:

http://www.gria.org/sla/metric/activity/current-activities
This is set to one when a data-stager is created and to zero when it is destroyed.
http://www.gria.org/sla/metric/activity/data-stager
This is set to one when a data-stager is created and to zero when it is destroyed.
http://www.gria.org/sla/metric/resource/disc
When data is stored in a data-stager, the disc space usage is recorded by setting this to the file size.
http://www.gria.org/sla/metric/resource/data-transfer
When data is transferred to or from a data-stager, the transfer is recorded by setting the cumulative usage of this metric to the file size.

The documentation of other services should specify the metrics that they generate.

4.3. Configuration

How to configure the SLA Management Service

4.3.1. Links with Other Services

How to link the SLA Management Service with the Trade Account Service and Functional Services

Trusted Account Service

The SLA service will record each user's usage of the functional services (e.g. data service, job service). This usage may optionally be converted into a monetary charge according to the pricing terms in the SLA. Therefore, the SLA service may be required to bill users' accounts. The first configuration to make on the SLA service page is the "Trusted management service". This is a list of account services that the client may use with the SLA service. Normally this will just be the TradeAccountService at the same service provider. By default, the TradeAccountService is filled in ready to be used - just press the "Add" button to use it. Alternatively, the "Make service free" button sets the SLA service up so that clients can have SLAs that are never billed - effectively making the SLAs free.

Managed Services

The SLA service is of no use if it does not manage any functional services. There are five conditions that must be met in order for a functional service to be managed:

  • At the functional service:
    1. The SLA service must be in the functional service's list of "Trusted Management Services".
    2. The SLA service's CA certificate must be a trusted CA in the functional service's key-store.
    3. The SLA service must be in the "management-services" group of the functional service.
  • At the SLA service:
    1. The functional service's CA certificate must be a trusted CA in the SLA service's key-store.
    2. The functional service must be in the "sla-managed-services" group of the SLA service.

If the functional services that require managing are hosted on the same machine as the SLA service (and therefore share the same certificate) then only the first condition needs to be met; the remaining four (conditions 2-5, counting through both lists above) are not necessary. Otherwise, see below for more detail.

Update Access Control

The access control rules for the application services must be updated to allow the managing service (e.g. SLA service manager) to monitor resource usage and invoke management actions on the application service in response. Furthermore, the access control rules for the management package must be updated to allow the managed application services to check that the creation of activities and resources, and any further usage, is acceptable.

To modify access control rules, first open the main GRIA administration page for the package concerned.

GRIA Application Services

Open the web page for either the Job Service or Data Service in your browser and follow the instructions given under the "Trusted Management Service" heading. Following the instructions for one service will also have the effect of correctly configuring the other service as the Job and Data services share key-store and group definitions.

When adding a new rule in the "management-services" group table, choose the "SubjectDN is..." rule and upload the Subject and Issuer certificate for the server hosting the GRIA management service package when prompted.

GRIA Management Services

Open the SLA Service page in your browser and follow the steps in the instructions under "Managed Services", that is:

  1. Add the functional service's CA certificate to the management service's key-store as a trusted CA.
  2. Add a new rule to the "sla-managed-services" group:
    • The new rule is to allow the functional service (e.g. data service) to communicate with the SLA service.
    • Click on the "sla-managed-services" link in the SLA web page.
    • Add a new rule specific to the functional service. This will be a "SubjectDN is..." rule. You will be asked to upload the Subject and Issuer certificate for the server hosting the GRIA basic applications package.

4.3.2. Metric Editor

How to use the metric editor

Initially the SLA service does not know about any metrics. As the service discovers metrics, either through capacity or SLA template files being uploaded or by receiving usage reports from functional services, they are added to the list of metrics in the service's administration page. When a metric is discovered by the SLA service in one of these ways it is stored, and its definition (in terms of the human-readable fields) will not be changed by subsequent events. Sometimes the initial metric definition needs to be changed, and for this purpose the "Edit" button is provided. The form also provides a "Delete" button for metrics that are no longer required (though a metric may of course be rediscovered by the service), and a "New metric" button for defining a metric from scratch.

Screenshot of the metric editor

Fields in a Metric

As discussed in the section describing GRIA's metrics, metrics are recorded and described using instantaneous and cumulative measurements. If an instantaneous measurement of the metric is being made then the instantaneous description and instantaneous units would be used to describe the measurement. Similarly the cumulative description and units would be used to describe a measurement of the metric made over time. Two other fields deserve explanation:

Type
Either a "Resource" or "Activity". A resource is a metric that is defining real resource usage such as disc space or CPU. An activity is a special metric that defines the type of the activity as a whole.
Unit type
"Decimal" or "Binary". This defines how client software should abbreviate large values of this metric. For example, disc space is defined as a binary unit, so a value of 1546B may be displayed as 1.5KiB, dividing by 1024 rather than by 1000 as it would for a decimal unit.
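To illustrate the effect of the unit type, here is a minimal Python sketch of how client software might abbreviate a value. The function name, the prefix lists, the one-decimal-place formatting and the `"BINARY"`/`"DECIMAL"` strings are assumptions for the example and are not taken from the GRIA client.

```python
def abbreviate(value, unit, unit_type):
    """Abbreviate `value` using binary (1024) or decimal (1000) prefixes."""
    if unit_type == "BINARY":
        base, prefixes = 1024, ["", "Ki", "Mi", "Gi", "Ti"]
    else:
        base, prefixes = 1000, ["", "k", "M", "G", "T"]
    scaled = float(value)
    index = 0
    # Divide by the base until the value is small enough to display.
    while scaled >= base and index < len(prefixes) - 1:
        scaled /= base
        index += 1
    if index == 0:
        return f"{value}{unit}"
    return f"{scaled:.1f}{prefixes[index]}{unit}"


disc = abbreviate(1546, "B", "BINARY")  # "1.5KiB"
small = abbreviate(512, "B", "BINARY")  # "512B"
```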

Using the Editor

The metric editor is quite complicated because of the large number of fields that are provided to describe the metric. Most of the time, most of the fields can be left disabled and the system will automatically generate sensible defaults. For instance, in the screenshot above, the description has been set to "CPU" (overriding the default of "cpu" inferred from the URI) but the plural description field has not been enabled as the default of "CPUs" is correct.

When defining or editing a metric, it is best to start at the top of the form and work downwards as the default values for later fields come from the earlier fields. Each input field in the form is coloured to help understanding. The defaults shown on the right hand side are generated by taking a field and adding further text. So for instance, in the screenshot, the default for the "Instantaneous description" is displayed as "number of CPUs". This shows that the "CPU" text has come from the "Description" field (coloured orange). The preview section at the bottom of the editor gives two complete examples of how a constraint using this metric would be described.

Registering Metrics

It can sometimes be useful to define several metrics at once (for instance in a new SLA Service installation). A metric upload form is provided for this purpose. The form reads in an XML file containing metric definitions and registers each metric with the system, making them available for use in, for instance, the SLA template editor. If a metric is already known to the system then its description will be overwritten by the description in the XML file only if it has not previously been described (i.e. only the URI is defined). A sample file is linked to in the form.

Please note that it is not necessary to register metrics in this way for the service to work - it is just a convenience. Metrics will be registered automatically as they are reported from new application services.

4.3.3. Capacity Management

How to define the resources available at the service provider

The service provider may optionally define the size/quantity of the resources they wish to provide: their "capacity". New SLAs will then be agreed up to the point where the entire capacity has been allocated. This is most obviously necessary for disc space, where a service provider has a finite amount of disc space and will probably want to guarantee each of its customers a certain amount. If the service provider does not define the capacity then there will be no limit on the number of SLAs allocated to customers. This will in turn have an impact on the service provider's ability to fulfill resource commitments defined in their SLAs.

There are two ways to define the capacity:

  1. Upload an XML file defining the capacity.
  2. Use the capacity editor in the web page.

Uploading a Capacity Definition

A sample XML capacity definition is provided with the application. It can be downloaded from here or from the administration web page. The sample file defines the capacity to be 10 CPUs and 2GiB of disc space. This may be changed and extended according to the specific requirements of the service provider. The <constraint> elements in the file follow the same schema as those in the SLA template (described here), but must all be instantaneous indefinite constraints.

It is important to note that uploading a capacity definition will completely overwrite any existing definition.

Using the Capacity Editor

The capacity definition may be defined from scratch or edited using the built in capacity editor. Each constraint can be edited or deleted. When the service is first installed, no metrics are defined and so no capacity constraints can be defined either.

4.3.4. SLA Templates

How to configure SLA templates

Overview

The SLA template controls which services the client may use, the quantity of resources on those services they may use, and how much and how often they will be charged. A service provider may define and publish many different SLA templates. Templates may be tailored to individual customers and access to the templates restricted accordingly, or several different general templates may be defined with a variety of levels of service and prices to give the customer some choice.

The SLA templates may be defined and managed in the main service administration page as shown below.

Screenshot of the SLA templates table

An SLA template can be in one of two states:

draft
In this state the template is not visible to anyone but the service administrator. It can be edited and published.
published
In this state the template's visibility is defined by its access control rules. It cannot be edited any more.

In either state, a template may have its access permissions set, may be copied (creating a new draft template), exported and deleted.

Parts of an SLA Template

SLA templates may be defined either using the built in editor or using an XML file. An annotated sample SLA template is provided, parts of which are reproduced and discussed below. This template may be edited and then added to the SLA service using the "Add New SLA Template from file" facility. Once a template is uploaded, it can be viewed and edited by clicking on the template ID in the main SLA service web page.

Label and Description

The SLA template starts with a <label> and <description> tag that are used in both the client and server interfaces to identify the SLA template to users.

Billing Period

The SLA service aggregates usage of the functional services and periodically calculates a bill and charges the account that the SLA was proposed under. The billing period defines how often the account is charged.

    <billingPeriod>
        <years>0</years>
        <months>0</months>
        <days>1</days>
        <hours>0</hours>
        <minutes>0</minutes>
        <seconds>0</seconds>
    </billingPeriod>

The billing period is defined in terms of <years>, <months>, <days>, <hours>, <minutes> and <seconds> as shown above. The first bill for usage is issued one billing period after the SLA is agreed, and each subsequent bill one period after the previous bill (i.e. every 1 day in this example).
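As a rough illustration of how such a billing period translates into billing times, the following Python sketch parses the element shown above and lists the first few bill times. It is not the SLA service's own code: the function names are hypothetical, and calendar years and months are deliberately left out because their length varies.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

BILLING_PERIOD_XML = """
<billingPeriod>
    <years>0</years>
    <months>0</months>
    <days>1</days>
    <hours>0</hours>
    <minutes>0</minutes>
    <seconds>0</seconds>
</billingPeriod>
"""


def parse_billing_period(xml_text):
    """Convert a <billingPeriod> element into a timedelta."""
    root = ET.fromstring(xml_text.strip())
    fields = {child.tag: int(child.text) for child in root}
    if fields.get("years", 0) or fields.get("months", 0):
        # Simplification of this sketch: calendar arithmetic is not handled.
        raise NotImplementedError("years and months are not handled here")
    return timedelta(
        days=fields.get("days", 0),
        hours=fields.get("hours", 0),
        minutes=fields.get("minutes", 0),
        seconds=fields.get("seconds", 0),
    )


def billing_times(agreed_at, period, count):
    """Return the first `count` billing times after the SLA is agreed."""
    return [agreed_at + period * n for n in range(1, count + 1)]


period = parse_billing_period(BILLING_PERIOD_XML)
bills = billing_times(datetime(2009, 1, 1, 12, 0), period, count=2)
```

With the one-day period above, an SLA agreed at noon on 1 January would be billed at noon on 2 January and again at noon on 3 January.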

Signing Fee, Subscription Fee and Currency

The signing fee is a charge made to the account when the SLA is agreed. The subscription fee is charged at the end of every billing period in addition to charges made for the resources used. The currency definition applies to all prices in the SLA.

    <signingFee>10.00</signingFee>
    <subscriptionFee>10.00</subscriptionFee>
    <currency>EUR</currency>
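
To make the combined effect of the two fees concrete, here is a small Python sketch that totals what an account would be charged over several billing periods. The function name `total_charged` and the per-period usage charges are invented for the example; only the two 10.00 EUR fees come from the sample above.

```python
from decimal import Decimal

def total_charged(signing_fee, subscription_fee, usage_charges):
    """Total charged to the account: one signing fee, plus the subscription
    fee and the usage charge for each completed billing period.

    `usage_charges` holds one (hypothetical) usage charge per billing period.
    """
    per_period = sum(subscription_fee + usage for usage in usage_charges)
    return signing_fee + per_period

signing = Decimal("10.00")       # <signingFee>
subscription = Decimal("10.00")  # <subscriptionFee>

# Assumed usage charges for three billing periods, in EUR.
usage = [Decimal("0.05"), Decimal("0.00"), Decimal("1.20")]

total = total_charged(signing, subscription, usage)
print(total, "EUR")
```

Note that the subscription fee is charged even in a period with no usage.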

Validity Period

An SLA template is valid for a restricted period of time, outside of which a proposal from the client to use the SLA template will be rejected (e.g. "Special offer! This month only!"). The validity period is defined by a <startTime> and <endTime> element:

    <startTime>
      <year>2000</year>
      <month>1</month>
      <dayOfMonth>1</dayOfMonth>
    </startTime>

    <endTime>
      <year>2010</year>
      <month>1</month>
      <dayOfMonth>1</dayOfMonth>
    </endTime>
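
The validity check can be pictured with the short Python sketch below. The function name `template_is_valid` is hypothetical, and treating both the start and end dates as inclusive is an assumption made for the example rather than documented behaviour.

```python
from datetime import date

def template_is_valid(proposed_on, start, end):
    """Return True if a proposal made on `proposed_on` falls in the validity period.

    Assumption: both endpoints are treated as inclusive.
    """
    return start <= proposed_on <= end

start_time = date(2000, 1, 1)  # <startTime> from the sample
end_time = date(2010, 1, 1)    # <endTime> from the sample

print(template_is_valid(date(2005, 6, 1), start_time, end_time))  # inside the period
print(template_is_valid(date(2010, 1, 2), start_time, end_time))  # after the end
```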

Permitted Services

A service provider may want to provide only certain functional services to certain customers. For this reason, the SLA template may contain a list of permitted services. For convenience, however, if the list of permitted services is empty then all functional services in the "sla-managed-services" group may be used with the SLA.

    <permittedServices>
      <permittedService>
        <url>http://example.com:443/gria-basic-app-services/services/JobService</url>
      </permittedService>
    </permittedServices>
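
The rule for an empty list can be summarised in a few lines of Python. This is only a sketch of the rule as described above: the function name `service_is_permitted` and the second service URL are hypothetical, and the set `managed_group` stands in for the membership of the "sla-managed-services" group.

```python
def service_is_permitted(service_url, permitted_urls, managed_group):
    """Decide whether a functional service may be used with an SLA.

    An empty permitted list means "any service in the managed group";
    otherwise the service must appear in the permitted list.
    """
    if not permitted_urls:
        return service_url in managed_group
    return service_url in permitted_urls

job_service = "http://example.com:443/gria-basic-app-services/services/JobService"
# Hypothetical second service, used only to show the difference.
other_service = "http://example.com:443/gria-basic-app-services/services/OtherService"

managed_group = {job_service, other_service}

# Template that lists only the job service.
print(service_is_permitted(other_service, [job_service], managed_group))
# Template with an empty list: every managed service is allowed.
print(service_is_permitted(other_service, [], managed_group))
```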

Metrics

Metrics are used in the constraint and pricing term definitions (below). The concept of metrics is discussed earlier in the manual and further information is available in the metric editor page of this configuration section. Shown here is one example of a metric represented in XML. Further examples may be seen in the supplied SLA template file.

    <metric type='RESOURCE'>
      <uri>http://www.gria.org/sla/metric/resource/cpu</uri>
      <description>
        <description>CPU</description>
        <plural>CPUs</plural>
        <instantaneous>number of CPUs</instantaneous>
        <cumulative>CPU time</cumulative>
      </description>
      <units type='DECIMAL'>
        <instantaneous>CPU</instantaneous>
        <cumulative>CPU.s</cumulative>
      </units>
    </metric>

Constraints

Each constraint element in the SLA template constrains the instantaneous value or cumulative usage of a metric. There are two sorts of constraint: indefinite and periodic. An indefinite constraint applies over the whole time of the SLA, from the start to now. A periodic constraint constrains the usage of a metric within a repeating time period. The latter may be used to permit a certain amount of usage per day or month, for instance.

    <constraint type='INSTANTANEOUS'>
      <metric>
        <uri>http://www.gria.org/sla/metric/resource/cpu</uri>
        ...etc...
      </metric>
      <bound>LE</bound>
      <private>false</private>
      <limit>1.0</limit>
      <contention>1.0</contention>
      <repeating>false</repeating>
    </constraint>

This constraint constrains the instantaneous measurement of CPU to be less than or equal to 1.0 at any one time. Or to put it another way the number of CPUs ≤ 1.

<bound>
This may take the value "LT" or "LE". These stand for "less than" and "less than or equal to" respectively (the same as the Fortran operators of the same name). If the above example constraint were changed to restrict the number of CPUs to < 1.0 (using "LT") then no jobs at all could be run, since running even a single job on one CPU would reach the limit.
<private>
This element may take the value of "true" or "false". If it is set to "true" then the constraint will be hidden from the user. See the discussion of private constraints below.
<limit>
Limit is the value we are constraining to.
<contention>
This element is only applicable to INSTANTANEOUS constraints and defines how many users are subject to the same constraint on the same resource (i.e. share the resource), with the limit giving the maximum available to any or all of them at any instant. For most purposes this "contention ratio" may be set to 1.0 but it is very useful to set higher values for some types of usage. For example, a service provider may have a cluster of 10 CPUs along with a queuing system. The constraint in the SLA could limit the instantaneous CPU number to ≤ 10 CPUs with a contention of 20. Twenty of these SLAs could be sold, and each client would know that there were 19 other users contending for the cluster and that they could expect to get just 0.5 CPU on average over a long enough period.
<repeating>
This element takes the value of "true" or "false". If it is false then the constraint is an "indefinite constraint" and applies for all time. For "true" see below.

    <constraint type='CUMULATIVE'>
      <metric>
        <uri>http://www.gria.org/sla/metric/resource/cpu</uri>
        ...etc...
      </metric>
      <bound>LE</bound>
      <private>false</private>
      <limit>1000.0</limit>
      <contention>1.0</contention>
      <repeating>true</repeating>
      <duration>
        <years>0</years>
        <months>0</months>
        <days>1</days>
        <hours>0</hours>
        <minutes>0</minutes>
        <seconds>0</seconds>
      </duration>
    </constraint>

This constraint constrains the CUMULATIVE usage of CPU to just 1000 CPU.seconds every day. In other words, the CPU time is ≤ 1000.0 CPU.s.

<repeating>
When this is "true" the constraint is a "periodic constraint" and a duration for the repeating period is also required.
<duration>
The duration specifies the length of the repeating period in terms of <years>, <months>, <days>, <hours>, <minutes> and <seconds> as shown above. The first period begins as soon as the SLA is agreed.
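
The following Python sketch shows one way to picture how a periodic constraint could be evaluated: find the start of the repeating period that contains the current time, add up the usage recorded in that period, and compare it with the limit using the "LE" or "LT" bound. All function names, timestamps and usage records are invented for the example; this is not the SLA service's own code, and it ignores periods expressed in months or years.

```python
from datetime import datetime, timedelta

def period_start(agreed_at, duration, now):
    """Start of the repeating period containing `now` (first period starts at agreement)."""
    elapsed_periods = (now - agreed_at) // duration
    return agreed_at + elapsed_periods * duration

def usage_in_current_period(records, agreed_at, duration, now):
    """Sum the usage recorded between the start of the current period and `now`."""
    start = period_start(agreed_at, duration, now)
    return sum(amount for when, amount in records if start <= when <= now)

def within_limit(bound, value, limit):
    """Apply the <bound> operator: "LE" is <=, "LT" is <."""
    if bound == "LE":
        return value <= limit
    if bound == "LT":
        return value < limit
    raise ValueError(f"unknown bound: {bound}")

agreed = datetime(2009, 3, 1, 9, 0)   # assumed agreement time
now = datetime(2009, 3, 3, 12, 0)     # assumed current time
one_day = timedelta(days=1)           # <duration> from the sample

# Hypothetical usage records: (time, CPU.s used).
records = [
    (datetime(2009, 3, 2, 10, 0), 900.0),  # previous period, not counted
    (datetime(2009, 3, 3, 10, 0), 400.0),
    (datetime(2009, 3, 3, 11, 0), 500.0),
]

used = usage_in_current_period(records, agreed, one_day, now)
print(used, within_limit("LE", used, 1000.0))
```

In this example only the two records from the current day count towards the 1000 CPU.s limit, because usage from the previous period is not carried over.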

Private Constraints

In some situations a client will want to deal with a service provider in "business terms", that is what the client wants to get done, rather than "technical terms", or what resources are required to do the job. For instance, a client who requires frames of a video to be rendered may prefer to have an SLA that specified how many frames of high-definition video they were permitted to render per day, rather than an SLA that described how many CPUs and how much memory etc was on offer. To support this scenario in GRIA, private constraints may be defined.

In this example the service provider might define three constraints and a pricing term:

  1. A public constraint saying "number of frames <= 100 per day".
  2. A private constraint saying "CPU time <= 50 hours per day".
  3. A private constraint saying "Disc space <= 100GB".
  4. A pricing term saying "€0.50 per frame".

The service provider has had to make a judgement about what basic resources (CPU, disc) are required to fulfill the offer of 100 frames of video per day. All three constraints then go into the SLA template and the template is published. When the client views the SLA template, though, the private constraints are removed, so all the client sees is the information they are interested in, namely how many frames they will be able to render (and how much it will cost). The SLA manager will still pay attention to all the constraints and will ensure that the service provider has sufficient capacity to meet the CPU and disc space constraints.
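
The effect of the <private> flag on what the client sees can be sketched in Python as a simple filter. The dictionary layout and the function name `client_view` are made up for this illustration; the three constraints are the ones from the video rendering example.

```python
# The three constraints from the rendering example, in a hypothetical layout.
constraints = [
    {"metric": "frames", "limit": "<= 100 per day", "private": False},
    {"metric": "CPU time", "limit": "<= 50 hours per day", "private": True},
    {"metric": "disc space", "limit": "<= 100 GB", "private": True},
]

def client_view(all_constraints):
    """Return only the constraints a client would see (private ones removed)."""
    return [c for c in all_constraints if not c["private"]]

visible = client_view(constraints)
for c in visible:
    print(c["metric"], c["limit"])

# The service side still works with the full list when checking capacity.
print(len(constraints), "constraints enforced,", len(visible), "shown to the client")
```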

Further Examples of Constraints

It is easy to get confused about what different constraints mean and what the service is guaranteeing (or not), especially for metrics like CPU. Let us imagine that we have a service with a capacity constraint set to 10 CPUs. Given this, the service must share out the CPUs between competing users, each with CPU constraints in their SLAs.

If the constraint in the SLA says that you can have ≤1 CPU then upon agreeing the SLA the service will record that 1 CPU is permanently reserved and will let you have up to 1 CPU all of the time. Whenever you are not using it, the CPU would be left idle. The service could sell only 10 of these SLAs (to match its 10 CPUs).

If the constraint in the SLA says you can have ≤5 CPUs with a contention of 10 then the service will record that 0.5 CPUs are used up (5 CPUs shared between 10 is effectively 0.5 CPU each on average) and will let you have up to 5 CPUs any time you request a job. The service could sell 20 of these SLAs. If the service was sold up to its capacity then on average you will actually be able to use your 5 CPUs 1/10th of the time (10 being the contention ratio). Jobs for all users would enter the resource manager's queue and execute when the CPUs were available.

If the constraint in the SLA said you can have 1 day's worth of CPU seconds every month then the service will record that approximately 1/30th of a CPU has been allocated and could sell about 30 of these SLAs for each CPU (about 300 across its 10 CPUs). If you had a job that lasted a day and added it to the queue at the start of the month then you would get it executed by the end of the month, but if you put it in the queue near the end of the month it might not get run by the end of the month if the queue was full.
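
The arithmetic behind these three examples can be checked with the short Python sketch below. It is a worked calculation only, not the service's allocation algorithm: the function names are invented, a month is taken as 30 days, and exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction

def instantaneous_share(limit, contention):
    """Capacity recorded for an INSTANTANEOUS constraint: limit / contention."""
    return Fraction(limit) / Fraction(contention)

def cumulative_share(usage_seconds, period_seconds):
    """Average capacity implied by a CUMULATIVE allowance over a repeating period."""
    return Fraction(usage_seconds, period_seconds)

def sellable(capacity, share):
    """Number of SLAs with the given share that fit into `capacity`."""
    return int(Fraction(capacity) / share)

DAY = 24 * 60 * 60  # seconds in a day

dedicated = instantaneous_share(1, 1)      # <= 1 CPU, contention 1
shared = instantaneous_share(5, 10)        # <= 5 CPUs, contention 10
monthly = cumulative_share(DAY, 30 * DAY)  # 1 day of CPU time per 30-day month

print(dedicated, sellable(10, dedicated))  # share and count on 10 CPUs
print(shared, sellable(10, shared))        # share and count on 10 CPUs
print(monthly, sellable(1, monthly))       # share and count on a single CPU
```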

Other examples of constraints are given in the XML annotated example.

Pricing Terms

At the end of each billing period, the usage of an SLA is examined and a bill is calculated. Charges may be made for the cumulative usage of a metric (integral of measurement over time) or for the increase in the measurement of a metric over the billing period.

    <pricingTerm type='CUMULATIVE'>
      <description>standard rate</description>
      <lowerBound>300</lowerBound>
      <upperBound>-1</upperBound>
      <price>0.01</price>
      <currency>EUR</currency>
      <metric>
        <uri>http://www.gria.org/sla/metric/resource/cpu</uri>
        ...etc...
      </metric>
    </pricingTerm>

This pricing term charges 0.01 EUR for every second of CPU time used above 300 CPU.seconds. That is, 305 CPU.seconds costs 0.05 EUR.
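
The calculation for this pricing term can be reproduced with a few lines of Python. This is a sketch of the arithmetic rather than the service's billing code: the function name `cumulative_charge` is hypothetical, and it relies on the template convention that an upperBound of -1 means "no upper limit".

```python
from decimal import Decimal

def cumulative_charge(usage, lower, upper, price):
    """Charge for the part of `usage` that falls between `lower` and `upper`.

    An `upper` of -1 is treated as +infinity.
    """
    usage, lower, upper, price = (Decimal(v) for v in (usage, lower, upper, price))
    top = usage if upper == -1 else min(usage, upper)
    billable = max(top - lower, Decimal(0))  # nothing is billed below the lower bound
    return billable * price

# Values from the sample pricing term; usage is in CPU.s.
charge = cumulative_charge("305", "300", "-1", "0.01")
print(charge, "EUR")

# Usage below the lower bound attracts no charge under this term.
print(cumulative_charge("200", "300", "-1", "0.01"), "EUR")
```

Several pricing terms with adjoining intervals could be combined in the same way to give tiered prices, with each term charging only for the usage inside its own interval.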

<lowerBound> and <upperBound>
The lower and upper bound specify the usage interval that the pricing term applies to. The interval is closed at the lower end and open at the upper end. If the upperBound is set to -1 then it is taken as meaning +infinity.
<price> and <currency>
The price and currency define the charge that will be multiplied by the usage in the interval.
<metric>
The metric that this pricing term applies to.

    <pricingTerm type='INSTANTANEOUS_INCREASE'>
      <description>charge</description>
      <lowerBound>0</lowerBound>
      <upperBound>-1</upperBound>
      <price>0.10</price>
      <currency>EUR</currency>
      <metric>
        <uri>http://www.gria.org/sla/metric/resource/job</uri>
        ...etc...
      </metric>
    </pricingTerm>

This pricing term is of type INSTANTANEOUS_INCREASE and will cause a charge of 0.10 EUR for every job that is created.

The INSTANTANEOUS_INCREASE type of pricingTerm adds up all the increases in the instantaneous measurement of a metric over the billing period. It does not look at the difference between the start and the end of the billing period. As the number on the job metric increases by one every time a job is created, this can be used to charge for the number of jobs created during the billing period.
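
The difference between "adding up all the increases" and "end value minus start value" is easiest to see with numbers. The Python sketch below uses an invented series of job-count measurements in which the count falls at one point (assumed here to be a job finishing); the function name `total_increase` is hypothetical.

```python
from decimal import Decimal

def total_increase(measurements):
    """Sum every upward step between consecutive measurements, ignoring decreases."""
    return sum(
        max(later - earlier, 0)
        for earlier, later in zip(measurements, measurements[1:])
    )

# Hypothetical job-count samples taken during one billing period.
samples = [0, 1, 2, 1, 2, 3]

created = total_increase(samples)        # upward steps only
net_change = samples[-1] - samples[0]    # what a start/end comparison would give
charge = created * Decimal("0.10")       # <price> from the sample term

print(created, net_change, charge)
```

Here four jobs were created even though the count only rose by three overall, so the charge is based on four jobs.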

More pricing term examples may be found in the annotated example.

Defining and Editing SLA Templates

Clicking on the "New SLA template..." button or the "Edit" button of a draft SLA template takes you to the SLA template editor page. Here, using the simple interface, all aspects of an SLA template, including its access control rules, may be defined or changed. If you have a fresh installation of the service and no metrics are defined, it will not be possible to define any constraints or pricing terms.

Publishing SLA Templates

When a Service Manager wants to publish an SLA Template they must click the "Publish" button next to the template. This action changes the state of the SLA Template from "draft" to "published" and gives access to the template to any person with the "client" role.

Setting Access Permissions on SLA Templates

When an SLA Template is initially uploaded to the SLA Service it is in the "draft" state. This means only the Service manager (who has the "owner" role) has access to it. Once the SLA template is published, it enters the "published" state and users with the owner and client roles can access it.

By default, anyone is accepted as having the client role, so anyone will be able to see a published SLA template. To restrict access to an SLA template the definition of who a "client" is must be changed. This is done by clicking on the template ID in the main SLA service web page to get to the template details page. Scrolling to the bottom reveals the access control rules for that template:

Access control rules for an SLA template

The first rule in the list says that to have the client role it is sufficient to have any subject distinguished name in your certificate and for your certificate to be signed by anyone. To remove this rule, click on the "Delete" button by the rule. A new rule must then be added to redefine who a "client" is. If the SLA template was intended to be seen by just one company, then the new rule would probably be one saying that to be a client, it is sufficient to have your certificate signed by the company's CA. This may be achieved by selecting "client", "sufficient" and "Certificate is signed by..." in the form, pressing "Add" and then uploading the relevant CA certificate.

For more information on how to configure access rules, please see the PBAC 2 manual.

Exporting an SLA Template

Clicking on the "Export" button in the SLA template table will create an XML definition of the template for downloading. Unfortunately, this XML definition is in a slightly different format to the one described above. It may still be used for uploading to another service but is not very easy to edit.

4.4. Monitoring

How to monitor activity in the SLA Management Service

The SLA service collects a lot of information about the usage of the functional services. A great deal of this information is available from the SLA service web application. To view the information you must have JavaScript enabled in your web browser.

Periodically, the SLA service polls the functional services to collect information on what resources have been used. It records the usage against the relevant activity, records the usage against the SLA the activity belongs to and records the usage against the service as a whole. Data about the usage of all the metrics can be displayed for activities, SLAs and the service.

By default, the SLA service polls the functional service for usage reports every two minutes. This means that information on usage does not appear in the SLA service web application immediately.

If a client complains that they were unable to get a new SLA

When a client proposes an SLA, they may receive an error saying "Insufficient resources to create this SLA". This is caused by the service running out of capacity. To see a comparison between the total capacity and the amount of resource allocated (sold as promises in SLAs), the "Service Usage" page is provided. The service usage page may be found by clicking on the "Service Usage" link in the SLA Service admin page of the GRIA web application.

The service usage page shows information aggregated across all SLAs. All known metrics are displayed and as new metrics become known they are automatically added. Where relevant, bar charts summarising the data are displayed:

This chart shows that the total capacity of the service is 10 CPUs, a total of 1 CPU has been allocated and no CPUs are currently in use. By clicking on the "Raw allocation data" and "Raw usage data" links, tables showing the raw data may be displayed.

Currently the data tables show the data for all time. This will be improved in a future release.

If a client complains that they were unable to run a job or upload a file

When a client tries to use some resources they may get an error telling them that to do so would have breached a constraint in their SLA. The usage in an SLA may be examined with respect to the constraints by first clicking on the relevant SLA ID in the "Active SLAs" list of the main SLA service page. Once at the "SLA Details" page, the "View usage of these constrained metrics" link must be followed.

Each constraint will be shown along with the usage of the constraint's metric. If the constraint is a periodic constraint then the usage in the constraint's current period is shown. If the constraint is an indefinite constraint then just the usage in the last 24 hours is shown. As with the SLA service usage page, the raw data may be revealed by clicking on the "Raw usage data" links.

If a client has a query regarding costs incurred

A second breakdown of SLA usage is provided where the use of metrics is compared with the SLA's pricing terms. From the "SLA Details" page for the relevant SLA, clicking on "View usage of these priced metrics" takes you to this information. Each priced metric is displayed along with its pricing term(s) and the usage in the billing period. The raw data may be revealed by clicking on the "Raw usage data" links.

This page only shows the current billing period. It will be possible to select the billing period to display in a future release.

Further Information

It is possible to see the information that the SLA service has gathered about any activity in an SLA. On the "SLA Details" page there is a link to the "activities page" for the SLA. All data-stagers, jobs etc are listed on this page along with some further information about each activity. Each activity has a summary page showing all the usage recorded on the activity.

4.5. Managing

Management actions in the SLA Management Service

Many management actions, such as adding new SLA templates or new functional services, will involve changes to the configuration. There are a few other cases where the service manager would have to intervene.

Suspending, Resuming and Closing SLAs

An SLA has four states: active, suspended, closing and closed. The transitions between the states are shown in the diagram below:

State diagram for the SLA

The service manager may sometimes want to change the state of an SLA. To understand the state model, the states are described here:

Active
This is the normal state, entered into when an SLA is created. The active state is also entered if the service manager clicks on the "Resume" button in the web application by a suspended SLA. When the SLA is active, the SLA service will process requests from functional services to start new actions against the SLA's usage constraint terms.
Suspended
This state is entered if the SLA service finds it is unable to bill the account service for the SLA. This would occur at the end of a billing period if the SLA's account was suspended or closed. The state may also be entered into if the service manager clicks on the "Suspend" button by the SLA in the web application. When an SLA is in the suspended state, no new actions may be performed that would result in resource usage.
Closing
The closing state is entered when either the client chooses "Close SLA" on an SLA in the client, or when the service manager clicks on the "Close" button by the SLA in the web application. When an SLA enters the closing state, all running activities are signaled to destroy (so, for instance, all data in data stagers would be lost). The SLA remains in this state until all activities have finished and the final bill has been sent. No new activities may be started in this state. If really necessary, the "Force close" action may be used. Using this button puts the SLA into the "Closed" state and stops the SLA service from managing all the SLA's remaining activities but does not try to destroy the activities. The account is also not billed (unbilled charges can be seen on an SLA's page).
Closed
The closed state is entered when the final bill has successfully been sent to the account service and there are no more running activities in the SLA. No new activities may be started in this state.
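
The state descriptions above can be summarised as a small transition table. The Python sketch below is an interpretation of the text only: the event names are invented, and it includes just the transitions the descriptions mention explicitly (for example, it assumes "Close" is used on an active SLA and "Force close" on a closing one, which the text does not state outright).

```python
from enum import Enum

class SlaState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    CLOSING = "closing"
    CLOSED = "closed"

# (current state, hypothetical event name) -> next state
TRANSITIONS = {
    (SlaState.ACTIVE, "suspend"): SlaState.SUSPENDED,         # "Suspend" button
    (SlaState.ACTIVE, "billing_failed"): SlaState.SUSPENDED,  # account could not be billed
    (SlaState.SUSPENDED, "resume"): SlaState.ACTIVE,          # "Resume" button
    (SlaState.ACTIVE, "close"): SlaState.CLOSING,             # client or manager closes
    (SlaState.CLOSING, "final_bill_sent"): SlaState.CLOSED,   # activities done, bill sent
    (SlaState.CLOSING, "force_close"): SlaState.CLOSED,       # "Force close", no billing
}

def next_state(state, event):
    """Look up the state that follows `event`, rejecting anything not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value}") from None

def may_start_activity(state):
    """New resource-consuming activities are only accepted while the SLA is active."""
    return state is SlaState.ACTIVE

state = SlaState.ACTIVE
for event in ("suspend", "resume", "close", "final_bill_sent"):
    state = next_state(state, event)
    print(event, "->", state.value)
```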

Stopping Activities and Services From Being Managed

There are some unusual situations when you may need to intervene to tell the SLA service to stop managing an activity. As noted above, an SLA will remain in the closing state until it determines that all activities in the SLA have finished. Occasionally, the message indicating that an activity has finished may be lost (for instance, if the server running the functional service crashes). If this happened, then the SLA would never reach the closed state. For this emergency use only, a button is provided by each activity in the activity list for an SLA labelled "Stop Managing". Pressing this button will force the SLA service to behave as though it had received a message saying that the activity had finished. Pressing the "Stop Managing" button does not communicate with the functional service owning the activity - it does not destroy the resource. It just means the SLA Service will no longer take any responsibility for managing the functional service activity against the SLA terms.

In addition, the "Managed Services" page (see the end of the main SLA service administration page) shows which functional services are currently being managed. Services on this list come and go automatically: they appear when a functional service tells the SLA service to manage a new activity and are removed when there are no more activities left to manage on a service. While a service is on the list, the SLA service periodically polls it to gather usage information. Clicking on the functional service in this list displays all activities that the SLA Service is managing at a functional service (e.g. all jobs and data).

If a functional service fails then the SLA Service will continue trying to gather usage information. As a last resort it is possible to tell the SLA Service to "Stop managing" a functional service. This forces the SLA Service to finish all the activities it is managing at the service (and therefore to stop polling it).