4.2. Metrics
The GRIA SLA Management Service is designed to be very flexible. It retrieves usage information from functional services (e.g. job services), records the usage, and optionally constrains and/or bills for it. Different functional services report usage of different measurable quantities: for instance, a job service reports usage of CPU, whereas a data service reports usage of disc space. These measurable quantities, henceforth known as "metrics", are represented by URIs. The SLA service does not understand the meaning of these URIs; it just records their usage and acts according to how it has been configured.
Metrics are used in the <constraint> and <pricingTerm> elements of the SLAs and also in the service's capacity configuration.
The use of metrics is recorded in terms of "instantaneous" measurements and "cumulative" usage. The cumulative usage is the integration of the instantaneous measurements over time. For some metrics, data-transfer for example, the instantaneous measurement is best thought of as a rate (bytes per second) and the cumulative usage has no time dimension (bytes). For other metrics, such as CPU, the instantaneous measurement is just the quantity in use at the time (e.g. 3 CPUs) and it is the cumulative usage that has the time dimension, e.g. 180 CPU.seconds. The SLA service can convert between the two, e.g.:
- If a job runs on 1 CPU for 5 minutes then the SLA service will be notified that the instantaneous measurement of CPU usage went to 1 at the start and then to 0 five minutes later. The SLA service can infer that 300 CPU.seconds of CPU time have been used (1*5*60 = 300 CPU.seconds).
- If a service reported that it had used 120 units of a resource in a 1 minute period, the SLA service would infer that the average instantaneous measurement (rate of usage) had been 2 units/s.
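The two conversions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the SLA service's actual code: the function names `cumulative_usage` and `average_rate`, and the representation of reports as `(timestamp_seconds, level)` pairs, are assumptions made for the example.

```python
def cumulative_usage(events, end_time):
    """Integrate a stepwise instantaneous measurement over time.

    `events` is a time-ordered list of (timestamp_seconds, level) pairs
    (a hypothetical representation); each level holds until the next
    event, and the last level holds until `end_time`.
    """
    total = 0.0
    following = events[1:] + [(end_time, None)]
    for (start, level), (stop, _) in zip(events, following):
        total += level * (stop - start)
    return total


def average_rate(cumulative, period_seconds):
    """Infer the average instantaneous measurement from cumulative usage."""
    return cumulative / period_seconds


# A job runs on 1 CPU for 5 minutes: the level goes to 1, then back to 0.
cpu_events = [(0, 1), (300, 0)]
print(cumulative_usage(cpu_events, end_time=300))  # 300.0 CPU.seconds

# 120 units reported for a 1 minute period.
print(average_rate(120, 60))  # 2.0 units/s
```

The first function treats the instantaneous measurement as a step function, which matches the way the job example reports a level change at the start and at the end of the job.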
All metrics have both an instantaneous measurement and a cumulative usage, each of which may be recorded directly or inferred from the other. For some metrics one of the two concepts will not be useful, but the SLA service does not know what it is counting, restricting or billing for in any metric, so it accepts either type of measurement and can always infer the other from it.
The GRIA job service uses the following metrics:
- http://www.gria.org/sla/metric/activity/current-activities
- This is set to one when a job is created and to zero when it is destroyed.
- http://www.gria.org/sla/metric/activity/job
- This is set to one when a job is created and to zero when it is destroyed.
- http://www.gria.org/sla/metric/resource/cpu
- When a job starts, this is set to the number of CPUs the job is using (normally 1). When the job finishes, it is set to zero again.
The GRIA data service uses the following metrics:
- http://www.gria.org/sla/metric/activity/current-activities
- This is set to one when a data-stager is created and to zero when it is destroyed.
- http://www.gria.org/sla/metric/activity/data-stager
- This is set to one when a data-stager is created and to zero when it is destroyed.
- http://www.gria.org/sla/metric/resource/disc
- When data is stored in a data-stager, the disc space usage is recorded by setting this to the file size.
- http://www.gria.org/sla/metric/resource/data-transfer
- When data is transferred to or from a data-stager, the transfer is recorded by setting the cumulative usage of this metric to the file size.
The documentation of other services should specify the metrics that they generate.
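As a rough illustration of how the reports listed above might accumulate, the sketch below keeps per-metric totals keyed by the metric URI, which it treats as an opaque string. The `UsageRecorder` class and its method names are hypothetical and are not part of the GRIA API; timestamps are plain seconds, and only a single job is modelled.

```python
from collections import defaultdict

# Metric URIs taken from the lists above.
JOB = "http://www.gria.org/sla/metric/activity/job"
CPU = "http://www.gria.org/sla/metric/resource/cpu"
DATA_TRANSFER = "http://www.gria.org/sla/metric/resource/data-transfer"


class UsageRecorder:
    """Hypothetical recorder: metric URIs are opaque dictionary keys."""

    def __init__(self):
        self._level = defaultdict(float)       # current instantaneous measurement
        self._since = {}                       # time the level was last accrued
        self._cumulative = defaultdict(float)  # integrated usage so far

    def _accrue(self, metric, now):
        # Add level * elapsed time to the running total for this metric.
        last = self._since.get(metric, now)
        self._cumulative[metric] += self._level[metric] * (now - last)
        self._since[metric] = now

    def set_instantaneous(self, metric, value, now):
        self._accrue(metric, now)
        self._level[metric] = value

    def add_cumulative(self, metric, amount):
        self._cumulative[metric] += amount

    def cumulative(self, metric, now):
        self._accrue(metric, now)
        return self._cumulative[metric]


rec = UsageRecorder()
rec.set_instantaneous(JOB, 1, now=0)     # job created
rec.set_instantaneous(CPU, 1, now=10)    # job starts on 1 CPU
rec.set_instantaneous(CPU, 0, now=310)   # job finishes
rec.set_instantaneous(JOB, 0, now=320)   # job destroyed
rec.add_cumulative(DATA_TRANSFER, 2048)  # a 2048-byte file is transferred

print(rec.cumulative(CPU, now=400))            # 300.0
print(rec.cumulative(JOB, now=400))            # 320.0
print(rec.cumulative(DATA_TRANSFER, now=400))  # 2048.0
```

Because the recorder never inspects the URIs, a new functional service could report a metric of its own without any change to this bookkeeping.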
