4.4. Monitoring

How to monitor activity in the SLA Management Service

The SLA service collects detailed information about the usage of the functional services. Much of this information is available from the SLA service web application. To view it, you must have JavaScript enabled in your web browser.

Periodically, the SLA service polls the functional services to collect information on which resources have been used. It records each usage report at three levels: against the relevant activity, against the SLA the activity belongs to, and against the service as a whole. Usage data for every metric can therefore be displayed for activities, SLAs and the service.
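The three-level recording described above can be illustrated with a minimal sketch. The `UsageLedger` class, its field names, the identifiers and the "cpu-seconds" metric are all hypothetical and chosen for illustration; this is not the SLA service's actual data model or API.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical in-memory ledger, for illustration only."""

    def __init__(self):
        # Totals keyed by (activity, metric), (SLA, metric) and metric.
        self.by_activity = defaultdict(float)
        self.by_sla = defaultdict(float)
        self.by_service = defaultdict(float)

    def record(self, activity_id, sla_id, metric, amount):
        # One usage report updates all three levels at once.
        self.by_activity[(activity_id, metric)] += amount
        self.by_sla[(sla_id, metric)] += amount
        self.by_service[metric] += amount


ledger = UsageLedger()
ledger.record("job-1", "sla-A", "cpu-seconds", 30.0)
ledger.record("job-2", "sla-A", "cpu-seconds", 12.5)
ledger.record("job-3", "sla-B", "cpu-seconds", 7.5)
```

Under these assumptions, the total for "sla-A" is the sum of its two jobs, and the service-wide total is the sum across both SLAs.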

By default, the SLA service polls the functional services for usage reports every two minutes. As a result, usage does not appear in the SLA service web application immediately; it becomes visible only after the next poll.

If a client complains that they were unable to get a new SLA

When a client proposes an SLA, they may receive an error saying "Insufficient resources to create this SLA". This error means that the service has run out of capacity. To compare the total capacity with the amount of resource already allocated (sold as promises in SLAs), use the "Service Usage" page. To reach it, click the "Service Usage" link in the SLA Service admin page of the GRIA web application.
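The comparison between capacity and allocation can be sketched as a simple admission check. This is an assumption about the general shape of the check, not the SLA service's actual implementation; the function names and the figures used are hypothetical.

```python
def can_allocate(capacity, allocated, requested):
    """Return True if the request fits in the unallocated capacity."""
    return allocated + requested <= capacity


def propose_sla(capacity, allocated, requested):
    """Return the new allocated total, or raise if capacity is exceeded."""
    if not can_allocate(capacity, allocated, requested):
        # Same message the client sees when a proposal is rejected.
        raise RuntimeError("Insufficient resources to create this SLA")
    return allocated + requested


# Illustrative figures: 10 units of capacity, 1 already allocated.
new_total = propose_sla(capacity=10, allocated=1, requested=4)
```

In this sketch a proposal is judged against what has been allocated (promised), not against what is currently in use, which is why a service can refuse new SLAs while appearing idle.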

The service usage page shows information aggregated across all SLAs. All known metrics are displayed, and new metrics are added automatically as they become known. Where relevant, bar charts summarising the data are displayed.

In the example chart, the total capacity of the service is 10 CPUs, a total of 1 CPU has been allocated and no CPUs are currently in use. Clicking the "Raw allocation data" and "Raw usage data" links displays tables showing the raw data.

Currently the data tables show the data for all time. This will be improved in a future release.

If a client complains that they were unable to run a job or upload a file

When a client tries to use some resources, they may get an error saying that doing so would breach a constraint in their SLA. To examine an SLA's usage against its constraints, click the relevant SLA ID in the "Active SLAs" list on the main SLA service page. Then, on the "SLA Details" page, follow the "View usage of these constrained metrics" link.

Each constraint is shown along with the usage of the constraint's metric. For a periodic constraint, the usage in the constraint's current period is shown. For an indefinite constraint, only the usage in the last 24 hours is shown. As with the service usage page, the raw data may be revealed by clicking on the "Raw usage data" links.
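The choice of display window for the two kinds of constraint can be sketched as follows. How the service aligns periods is not stated here, so the sketch assumes that periods repeat back to back from a known start time; the function names, dates and usage records are hypothetical.

```python
from datetime import datetime, timedelta


def usage_window(now, period_start=None, period_length=None):
    """Return the (start, end) window whose usage is displayed.

    Periodic constraint: pass the start of the first period and the
    period length. Indefinite constraint: leave both as None.
    """
    if period_length is None:
        # Indefinite constraint: the last 24 hours.
        return now - timedelta(hours=24), now
    # Periodic constraint: find the period that contains `now`.
    completed = (now - period_start) // period_length  # whole periods elapsed
    start = period_start + completed * period_length
    return start, start + period_length


def usage_in_window(records, start, end):
    """Sum amounts of (timestamp, amount) records in [start, end)."""
    return sum(amount for ts, amount in records if start <= ts < end)


now = datetime(2024, 1, 10, 12, 0)
records = [
    (datetime(2024, 1, 7, 10, 0), 5.0),
    (datetime(2024, 1, 9, 9, 0), 2.0),
    (datetime(2024, 1, 10, 8, 0), 3.0),
]

# Weekly periodic constraint whose first period began on 1 January.
weekly = usage_window(now, datetime(2024, 1, 1), timedelta(days=7))
weekly_usage = usage_in_window(records, *weekly)

# Indefinite constraint: last 24 hours only.
recent = usage_window(now)
recent_usage = usage_in_window(records, *recent)
```

With these sample records, the same usage history yields different totals for the two constraint types, which may explain why a client sees a figure that differs from their own all-time total.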

If a client has a query regarding costs incurred

A second breakdown of SLA usage compares the use of metrics with the SLA's pricing terms. To see it, click "View usage of these priced metrics" on the "SLA Details" page for the relevant SLA. Each priced metric is displayed along with its pricing term(s) and the usage in the billing period. The raw data may be revealed by clicking on the "Raw usage data" links.

This page shows only the current billing period. Selecting which billing period to display will be possible in a future release.

Further Information

The information that the SLA service has gathered about any activity in an SLA can also be viewed. The "SLA Details" page has a link to the "activities page" for the SLA. All data-stagers, jobs, etc. are listed on this page, along with some further information about each activity. Each activity has a summary page showing all the usage recorded against that activity.