2008/09 Assessment of Clearing and Settlement Facilities in Australia
6. Special Topic: Operational Risk Management
Measure 9 of the Financial Stability Standard for Central Counterparties and the equivalent Measure 7 of the Financial Stability Standard for Securities Settlement Facilities set out the relevant requirements for licensed CS facilities in the management and control of operational risk. These measures require that the licensee as operator of a facility identify sources of operational risk and minimise these through the development of appropriate systems, controls and procedures.
These measures elaborate on this high-level requirement under four broad headings:
- Security and operational reliability;
- Business continuity procedures;
- Outsourcing; and
- External administration of a related body.
Each of these aspects is assessed by reference to further requirements set out under the measures themselves or in the supporting guidance.
In this assessment period, the Reserve Bank undertook a detailed assessment against the operational risk measure for all four licensed CS facilities. While each facility's operational risk-management arrangements are consistent with the guidance for this measure, the Reserve Bank encourages ASX to keep its arrangements under review to ensure that they continue to meet evolving best practice in this area. The Reserve Bank will also continue to monitor implementation of enhancements to operational risk-management processes recommended by internal and external auditors, and some specific changes at Austraclear introduced in response to an operational outage in March 2009.
This section first describes the overarching framework for operational risk management in the ASX group, before identifying the key findings for each of the four elements outlined above. Since all four licensed facilities are part of the same corporate group, a common operational risk-management policy is applied. In what follows, therefore, the four facilities are treated collectively, unless stated otherwise.
Risk-management Framework
ASX's operational risk policies and controls have been developed within a group-wide risk framework. The broad framework is set out in an Enterprise Risk Management Policy, with responsibilities in respect of operational risk management delegated as follows:
- The ASX Limited Board is responsible for approving and reviewing high-level operational risk policy.
- The Board delegates certain activities to an Audit and Risk Committee. In particular, this Committee oversees the application of the Board's policy.
- An Enterprise Risk Management Committee, comprising executives from across the business units, is responsible for implementing Board-approved risk-management policy and developing controls, processes and procedures to identify and manage risks. This Committee is also responsible for formally approving significant operational risk policies prepared by individual business units.
- Individual business units are responsible for: identifying business-specific risks; applying controls; maintaining risk-management systems; reporting on the effectiveness of risk controls; and implementing enhancements and taking remedial action, as appropriate. Each business unit is required to maintain a record of its risk profile, reviewing this on a six-monthly basis and updating as appropriate. This record includes ‘key risk indicators’ and action plans to address any identified risk that is not adequately mitigated. Policies are formally reviewed every 18 months to three years. More frequent reviews may take place depending on potential changes to technology, legal or regulatory requirements, or business drivers.
Assessment against the Operational Risk Measure
(i) Security and operational reliability
This aspect of the measure covers the security, operational reliability and capacity of a CS facility's key systems. Technical change-management processes and the experience and expertise of relevant key personnel are also considered in this context.
In the case of ASX's clearing and settlement operations, the key systems are the following:
- CHESS – the system supporting central counterparty services and securities settlement services for cash equity products;
- DCS – the key system supporting ACH's central counterparty services in the derivatives market;
- SECUR – the system supporting SFECC's central counterparty services for the SFE market; and
- EXIGO – the settlement engine underpinning Austraclear's settlement service for fixed income products.
Key findings under this aspect of the measure for these systems are detailed below.
- (a) Key systems, such as computer and communication systems, are secure, reliable and have robust access controls, with security reviewed and tested periodically.
The key systems supporting ASX's clearing and settlement processes are operated within a secure building. Physical access is controlled at both an enterprise and business-unit level and arrangements are independently tested on an ad hoc basis. Clearing operations are separated from general office areas with permitted access determined at a senior-manager level and records of access maintained. Physical security arrangements for the backup site are broadly equivalent.
User access for the key systems is restricted to prevent inappropriate or unauthorised access to application software, operating systems and underlying data. The level of access is authorised by the system owner with users granted the minimum level of access to systems necessary to perform their roles effectively. External access to ASX systems must pass through one or more layers of firewalls and intrusion prevention. Individual networks are segregated.
The process to request access to systems is documented, monitored and formally audited. User activities are uniquely identifiable and can be tracked via audit-trail reports. A re-validation process is also conducted periodically to confirm user access and privileges. ASX made changes to its access arrangements during the year as a result of an outage of EXIGO (see Section 5.4), which arose when a system support staff member was simultaneously connected to both the live and test environments. Steps have since been taken to physically separate the test and production systems.
Technology-security policy is considered by external auditors in the context of their reviews, which take place twice a year. Internal audit also routinely monitors compliance with such policy, reporting to the Audit and Risk Committee (and CEO) on a quarterly basis. Audit findings may prompt a review of policy, which would be conducted in consultation with key stakeholders.
Testing of technology-security policy is carried out against production infrastructure where possible. This includes penetration testing against the ASX perimeter and vulnerability testing within the perimeter. Application-level testing is carried out in test environments. Technology-security testing reports are documented, with identified problems escalated to management and tracked through to remediation. Similarly, any technology-based operational incidents are reported to senior management and issues are tracked through to resolution via regular updates.
- (b) Key systems are operationally reliable, with standards of operational reliability defined formally and documented.
Operational processes are documented and supported by internal procedures (eg, checklists and audit logs). Dual input checks, management sign-off and processing checklists are the primary preventative controls, supported by reconciliations and management reviews of activity.
The design and effectiveness of the control procedures supporting the core operational and system processes are subject to regular independent external audit and internal audit. Any deviations from internal control procedures (eg, extensions to scheduled times, transaction cancellations, etc) are recorded, reported and, as required, actioned and resolved.
Quality assurance for critical hardware and software is achieved via pre-release testing and fault monitoring. This includes both functional and non-functional testing, regression testing, and commissioning of new/changed capabilities.
Critical IT infrastructure is designed to ensure resilience against component failure. There is full redundancy at the primary site, with any single points of failure identified and processes developed to ensure that recovery can occur. Any additional procedures required are recorded in the system support and recovery documentation.
Availability targets are documented and defined formally for critical services (a minimum target of 99.8 per cent). In the case of Austraclear, a ‘Step-in and Service Agreement’ established with the Reserve Bank demands a slightly higher target for system availability. This agreement reflects the interdependence between Austraclear and the Reserve Bank's high-value payments system, RITS. Actual system availability by system is shown in Table 1.
System | Average availability (per cent) | Average capacity utilisation (per cent) | Peak capacity utilisation (per cent) |
---|---|---|---|
DCS | 100.0 | 22 | 44 |
CHESS | 100.0 | 35 | 67 |
SECUR | 100.0 | 25 | 43 |
EXIGO | 99.91 | 30 | 58 |
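As a rough illustration of what these availability figures imply, the sketch below converts an availability percentage into hours of unavailability and compares each system in Table 1 with the 99.8 per cent minimum target. The number of scheduled service hours (`SCHEDULED_HOURS`) is a hypothetical figure chosen purely for illustration, since the measurement window is not stated here. The higher Austraclear target under the Step-in and Service Agreement is not quantified in this section and is therefore not modelled.

```python
# Availability (per cent) by system, as reported in Table 1.
AVAILABILITY = {"DCS": 100.0, "CHESS": 100.0, "SECUR": 100.0, "EXIGO": 99.91}

# Minimum availability target for critical services quoted in the text.
MINIMUM_TARGET = 99.8

# ASSUMPTION: hypothetical scheduled service hours per year; the actual
# measurement window is not given in the assessment.
SCHEDULED_HOURS = 2500.0


def downtime_hours(availability_pct, scheduled_hours=SCHEDULED_HOURS):
    """Hours of unavailability implied by an availability percentage."""
    return scheduled_hours * (100.0 - availability_pct) / 100.0


def meets_minimum(availability_pct, target_pct=MINIMUM_TARGET):
    """True if availability is at or above the minimum target."""
    return availability_pct >= target_pct


if __name__ == "__main__":
    budget = downtime_hours(MINIMUM_TARGET)
    print(f"Downtime allowed at {MINIMUM_TARGET}%: {budget:.2f} h")
    for system, pct in AVAILABILITY.items():
        print(f"{system}: {downtime_hours(pct):.2f} h unavailable, "
              f"meets minimum: {meets_minimum(pct)}")
```

Under the assumed 2,500 scheduled hours, a 99.8 per cent target corresponds to roughly five hours of permitted unavailability a year; a different service window would scale that figure proportionally.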
Should an infrastructure failure nevertheless occur at the primary site, failover to the backup site is targeted to occur within one hour for all systems, allowing for up to two hours in the event that there is also an application and/or data problem.[1]
Where incidents do occur they are prioritised as high, medium or low, according to a pre-defined assessment of business impact by class of incident. Where appropriate (ie, for medium- and high-classified incidents), the incident is raised to both the relevant business unit and group managers and, in particularly critical instances, the CEO. Regular reporting of significant incidents to the Clearing and Settlement Boards and the Audit and Risk Committee also takes place.
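The escalation rule described above can be sketched as a simple routing function. This is an illustrative reading of the text only, not ASX's actual procedure: the names `Priority` and `escalation_recipients` are hypothetical, the "particularly critical" condition is modelled as a plain flag, and the treatment of low-priority incidents (no escalation beyond the handling team) is an assumption, since the text does not describe it.

```python
from enum import Enum


class Priority(Enum):
    """Incident classes named in the text."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def escalation_recipients(priority, particularly_critical=False):
    """Return who an incident is raised to under the rule sketched above.

    ASSUMPTION: low-priority incidents are not escalated, and the CEO is
    only notified for medium/high incidents flagged as particularly critical.
    """
    recipients = []
    if priority in (Priority.MEDIUM, Priority.HIGH):
        recipients += ["business unit manager", "group manager"]
        if particularly_critical:
            recipients.append("CEO")
    return recipients


if __name__ == "__main__":
    for p in Priority:
        print(p.value, "->", escalation_recipients(p))
    print("high, critical ->",
          escalation_recipients(Priority.HIGH, particularly_critical=True))
```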
- (c) Systems have sufficient capacity to process the expected volumes of transactions with the required speed, including at peak times and on peak days.
Capacity for critical systems is monitored on an ongoing basis, with monthly reviews of current and projected capacity requirements. The results are reviewed against established guidance for capacity headroom over peak recorded values for all critical systems; that is, to maintain 50 per cent over peak recorded daily volumes, with the ability to increase to 100 per cent over peak within six months. Capacity data are reported monthly to the CEO. While there is no known limitation to scalability for any ASX key system, any infrastructure upscaling beyond verified target levels is preceded by appropriate analysis and testing. Capacity utilisation by system is shown in Table 1.
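The headroom guidance above reduces to simple arithmetic, sketched below. The function names and all volumes are hypothetical and chosen for illustration; they are not ASX figures, and the sketch does not attempt to map the utilisation percentages in Table 1 onto this guidance, because the basis on which those percentages are measured is not stated.

```python
def required_capacity(peak_volume, headroom_pct):
    """Capacity needed to sit headroom_pct per cent above the recorded peak."""
    return peak_volume * (100 + headroom_pct) / 100


def headroom_over_peak_pct(capacity, peak_volume):
    """Headroom of current capacity over the recorded peak, in per cent."""
    return 100 * (capacity - peak_volume) / peak_volume


def check_guidance(capacity, peak_volume, capacity_in_six_months):
    """Check the two parts of the guidance: 50 per cent headroom now, and
    the ability to reach 100 per cent headroom within six months."""
    return {
        "meets_50_now": capacity >= required_capacity(peak_volume, 50),
        "can_reach_100": capacity_in_six_months
        >= required_capacity(peak_volume, 100),
    }


if __name__ == "__main__":
    # Hypothetical figures: peak of 1,000,000 transactions a day, installed
    # capacity of 1,600,000, scalable to 2,100,000 within six months.
    print(headroom_over_peak_pct(1_600_000, 1_000_000))
    print(check_guidance(1_600_000, 1_000_000, 2_100_000))
```

With these illustrative numbers, current headroom is 60 per cent over peak, so both parts of the guidance would be satisfied.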
In addition to technical capacity, ASX policy also requires that it has sufficient human resource capacity to operate the clearing and settlement systems during peak periods, including in the event of operational incidents or system failure.
System monitoring is in place to identify and escalate issues, including potential performance issues. Regular management review of system performance is also undertaken, with monthly updates to the CEO and quarterly reporting to the Audit and Risk Committee.
- (d) Changes to technical systems and supporting infrastructure do not disrupt its usual operations.
This measure requires that all procedures relating to change management be thoroughly documented, and that procedures include notification to participants where significant changes occur. It also requires that all changes be thoroughly tested outside a production environment.
ASX operates separate test environments for each system and has a formal, documented change-management process. This includes procedures for emergency changes, whereby all system changes must be documented and formally signed off by stakeholders. All changes are reviewed on a weekly basis. External stakeholders may be consulted depending on the nature of the proposed change.
However, during the year, EXIGO experienced a significant operational outage that was caused by a change to code within the live production database that was intended for the test system (see Section 5.4). Steps have since been taken to physically separate the test and production systems and tighten access procedures for the production system. The Reserve Bank is satisfied that Austraclear is taking appropriate steps to address the specific issues raised by this incident and will continue to monitor the implementation of the new arrangements.
- (e) The system has well-trained and competent personnel to ensure that all key systems are operated securely and reliably.
Staff are provided with relevant policies and guidelines from commencement of employment, with weekly communications thereafter. For particularly critical updates, policies are distributed to staff via e-mail with a required response from staff indicating that they have read, understood and agree to all aspects of the policy.
Clearing and settlement operational staff are evaluated with reference to each defined operational process. A rating scale is applied to each staff member in respect of a defined process. The rating, determined by the relevant team leader, establishes a staff member's ability in respect of particular processes, including exception processing and troubleshooting. This performance measure feeds through to future training and development needs. On-the-job training within a review/coaching process is provided for new staff. Thereafter, ASX maintains a process list and rotates staff to ensure that analysts are carrying out each task at a minimum every two months.
ASX has a formal succession-planning and management process in place. This aims to ensure leadership continuity in key positions, develop intellectual and knowledge capital, and encourage individual development. Succession and contingency planning is conducted for Group Executives, General Managers and key/critical staff. Related to this, a ‘Key Person Framework’ is reviewed by Group Executives and Human Resource officers on a quarterly basis. This tracks ASX critical employees in terms of career- and leadership-development opportunities. The framework enables the Human Resources unit to identify cross-training needs so as to minimise critical knowledge being held by a single individual.
ASX staff retention is within the range for financial services businesses more generally.
(ii) Business continuity procedures
This aspect of the measure requires that the system operator has in place arrangements to ensure the timely recovery of its operations in the event of a disruption, ie, the failure of one or more components of the system.
- (a) The operator should have detailed contingency plans, including backup arrangements for its critical communications and computer systems and key personnel.
ASX maintains extensive contingency plans detailing the appropriate operational response to a CS facility disruption, including coverage of the various lines of authority, means of communication, and failover procedures. ASX is in the process of revising its business continuity policy. External auditors recommended that this policy review be finalised and that it include periodic risk assessment. This process is expected to be complete by the end of the fourth quarter of 2009.
The risk that an operational incident at ASX's main site disrupts ASX functionality is mitigated through maintenance of a backup site. The ASX backup site is remote from the Sydney CBD and is supported by separate power, water, and telecommunications infrastructure. While there is full redundancy for all core systems at the primary site, this is currently true only for EXIGO at the backup site. The case for introducing dual architecture to ensure redundancy for all four systems is currently being examined in the context of ASX's ongoing review of business continuity policy.
ASX has procedures in place to manage the availability of specific staff skill sets in the event of a contingency. Migration to the backup site is targeted to occur within one to two hours, with clearing and settlement systems operable from the backup site for at least 30 days. The backup systems include real-time data mirroring, designed to ensure no data loss in the event of a contingency.
Best practice continues to evolve in the area of business continuity and ASX is encouraged to keep arrangements under review. One possible enhancement being considered is maintenance of a permanent operational staff presence at the backup site. Staff from ASX's data centre are currently permanently located at the backup site, but ASX is considering the case for also maintaining a core staff presence for other key operational functions. This would facilitate rapid recovery in the event of a disruption and improve staff familiarity with the site.
ASX is also in the process of finalising an updated Pandemic Response Plan covering detailed business-unit plans and considerations. The Reserve Bank encourages ASX to complete this and in this context welcomes ASX's decision to review its remote-access capabilities. Currently, remote access capability covers 69 per cent of clearing risk operations staff, 29 per cent of clearing and settlement operations staff, and 90 per cent of information technology staff. ASX is conducting a feasibility assessment of further expansion of this capacity and the Reserve Bank will monitor progress on this matter.
In the extreme case that one or more participants were unable to access either the primary or backup sites, established procedures allow for the relevant CS facility to act as agent in communicating with the core operational systems.
- (b) The operator should require its participants to have appropriate complementary arrangements in the event of a contingency.
The Operating Rules for each of the CS facilities require participants to maintain adequate business continuity arrangements to allow the recovery of usual operations within approximately one to two hours following a contingency event (matching ASX's own timetable for shifting operations to the backup site). Failure to comply with the rules may result in the application of a variety of sanctions, including immediate restrictions to functionality, or referral to ASXMS, which may lead to further disciplinary action. ASX systems are designed to prevent disruption to clearing and settlement activities associated with the operational failure of any individual participant. Participants are also involved in business continuity tests (see (ii)(c)).
- (c) The operator should undertake regular industry testing of its business-recovery arrangements.
Business-recovery arrangements are tested on a regular basis.
Representatives of ASX CS facilities attend the backup site on a monthly basis to perform connectivity and procedural testing. Live tests (ie, where market and clearing and settlement services are provided in real time from the backup site) are conducted on a two-year cycle for each system (full rehearsals are undertaken prior to such testing to minimise possible associated risks). In these tests, participants connect to systems at the ASX backup site from their primary sites via ASX primary site communications infrastructure, as do any interdependent systems.
Test results are formally documented and reported to ASX senior management and are also made available to internal and external auditors (internal audits may be driven by any material change to ASX's business continuity policy or related risk profile). Any issues arising from test results are recorded and tracked to resolution.
Recent external audit findings suggest some scope to enhance business continuity tests. Currently, the plans include testing whether systems fail over as planned. However, as this is achieved by way of a well-planned switchover of systems before start of business, rather than a simulated fail during the course of the day, a risk remains that failover may not occur as intended, leading to delay and potential loss of data. The external auditor has therefore recommended that ASX consider testing to validate failover capability for infrastructure and applications so as to confirm no data lag or loss for key systems. ASX has undertaken to review various failover test approaches and the Reserve Bank intends to follow up with ASX on this matter.
- (d) Conduct regular reviews of the adequacy of these arrangements and make such changes as are necessary and desirable.
The adequacy of ASX's business continuity procedures is reviewed regularly, as part of broader reviews of ASX's operational risk policy.
(iii) Outsourcing
This aspect of the measure requires that security, operational reliability and business continuity procedures extend to systems and processes that have been outsourced. The CS facility licensee as operator must ensure that service providers meet the same standards as apply to the operator with respect to the function outsourced. Furthermore, even when systems and processes are outsourced, the operator remains responsible for those systems and processes.
No operational functions are outsourced by any ASX CS facility. However, external suppliers are used for various services, such as utilities, hardware maintenance, operating system and product maintenance, and certain security-related specialist independent services.
In addition, both SFECC and Austraclear rely on NASDAQ OMX to provide third-level support and development for software products. In the event that NASDAQ OMX should fail, ASX has established escrow arrangements to allow the relevant source code to be accessed, and hence such support to be provided internally (the same arrangements require NASDAQ OMX to provide assistance and training to allow such a transition). Similar arrangements would apply should NASDAQ OMX withdraw its service.
ACH also currently relies on an external vendor for the software underpinning margining for DCS (that is, TIMS software, discussed in Section 5.1). Plans are underway to remove this reliance, and to integrate the TIMS margin calculations within DCS.
Dependencies on other system operators are also relevant. Both ASTC and Austraclear are reliant on interactions with SWIFT, and would revert to manual processing of SWIFT payments in the event of a SWIFT failure. The failure of RITS would potentially prevent settlement in EXIGO, although ASX has prepared business plans to consider the potential for EXIGO to continue operating independently.
(iv) External administration of a related body
This aspect of the measure requires that the CS facility licensee as operator ensure that it would have access to the necessary human, technical and other resources needed to continue operating in circumstances where a related body became subject to external administration.
Within the ASX group structure, most operational resources are provided by ASX Operations Limited, a subsidiary of ASX Limited. In the event that ASX Operations Limited became subject to external administration and this particular event did not impact upon the capacity of ASX clearing and settlement corporate entities to continue operating, those entities would be able to retain use of resources under provisions within the written support agreement between each licensed operator and ASX Operations Limited (to the extent permissible by law).
Summary
Over time, ASX has developed detailed policies and procedures to ensure the operational robustness of the key systems supporting the four CS facilities. It is the Reserve Bank's assessment that ASX's arrangements are consistent with the operational risk measure of the Financial Stability Standards.
Nevertheless, the Reserve Bank notes that best practice in respect of operational risk continues to evolve and the licensed CS facilities should respond both to this evolution and to specific issues identified by unfolding events. ASX's review of business continuity policy is welcome in this regard, including review of the case for introducing full redundancy for all four key systems at the business-recovery site and potential extension of remote-working arrangements. Another possible enhancement being explored in this context is to permanently locate some operational staff at the site, so as to facilitate rapid recovery in the event of a disruption and improve staff familiarity with the site.
The Reserve Bank will also continue to monitor implementation of enhancements to operational risk-management processes recommended by internal and external auditors. These include: completion of business-unit level pandemic planning; ongoing enhancement/update of detailed business-resumption plans; and an assessment of whether to include ‘failover testing’ within regular business continuity tests. Finally, the Reserve Bank will also monitor the implementation of process enhancements specifically related to the EXIGO outage in March 2009.
Footnote
[1] ‘Failover’ refers to the capacity to switch over to a standby system in the event of an operational disruption.