RDP 2020-03: The Determinants of Mortgage Defaults in Australia – Evidence for the Double-trigger Hypothesis
5. Estimation Strategy
July 2020
5.1 A Two-stage Approach
The simplest version of the double-trigger hypothesis states that both an ability-to-pay shock and negative equity are required for a loan to default, and can be represented in Equation (2):

$$\Pr(F_{i,t} = 1) > 0 \quad \text{if } A_{i,t} > A^* \text{ and } N_{i,t} > N^* \tag{2a}$$

$$\Pr(F_{i,t} = 1) = 0 \quad \text{otherwise} \tag{2b}$$

where $F_{i,t}$ is the binary foreclosure event at time $t$ for loan $i$, $A_{i,t}$ is the extent of the ability-to-pay shock, $N_{i,t}$ is the extent of negative equity, and $A^*$ and $N^*$ are some thresholds.
Equation (2) states that a borrower forecloses on their mortgage only if:
- the borrower experiences a ‘shock’ to their ability to repay their mortgage that exceeds some threshold of their ability or willingness to pay, and
- negative equity on the mortgage exceeds some threshold of negative equity that the borrower is willing to tolerate, given their individual costs of foreclosure.
If either the ability-to-pay shock or the extent of negative equity does not exceed its threshold, the double-trigger hypothesis predicts that the borrower will not foreclose.
Note that the borrower would be willing to enter foreclosure as soon as the conditions of Equation (2a) are met. However, the timing of foreclosure is also determined by the lender, which faces incomplete information regarding the situation and preferences of the borrower. Where a lender extends some leniency towards the borrower, foreclosure will not occur immediately upon Equation (2a) being satisfied. In that case, the borrower may not proceed to foreclosure at all if the ability-to-pay shock or negative equity is subsequently reversed. Hence, the probability of foreclosure in Equation (2a) is greater than 0, rather than equal to 1.
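The double-trigger rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function and parameter names are hypothetical, the thresholds cannot be observed in practice, and meeting both conditions only makes foreclosure possible (a probability greater than 0), not certain.

```python
def double_trigger_met(shock: float, negative_equity: float,
                       shock_threshold: float, equity_threshold: float) -> bool:
    """Return True when both triggers are exceeded, so foreclosure is possible.

    All names are illustrative assumptions; the thresholds are not
    observable in real loan data.
    """
    return shock > shock_threshold and negative_equity > equity_threshold


# Both triggers exceeded: foreclosure becomes possible (not certain).
print(double_trigger_met(0.4, 0.15, shock_threshold=0.3, equity_threshold=0.1))
# Shock exceeds its threshold but negative equity does not: no foreclosure predicted.
print(double_trigger_met(0.4, 0.05, shock_threshold=0.3, equity_threshold=0.1))
```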
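The probability of foreclosure can be further decomposed into the probability that a loan forecloses given that it has been in 90+ day arrears, $R^{90+}_{i,t-1} = 1$, plus the probability that it proceeds straight to foreclosure:

$$\Pr(F_{i,t} = 1) = \Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 1)\Pr(R^{90+}_{i,t-1} = 1) + \Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 0)\Pr(R^{90+}_{i,t-1} = 0) \tag{3}$$

As noted above, in Australia most loans proceed to foreclosure only after a notice of default has been served, which occurs after the loan enters 90+ day arrears. Therefore, $\Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 0) = 0$ and Equation (3) collapses to:

$$\Pr(F_{i,t} = 1) = \Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 1)\Pr(R^{90+}_{i,t-1} = 1) \tag{4}$$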
It is sufficient to estimate Equation (4) to examine the determinants of foreclosures. Notice that Equation (4) can be estimated separately in two stages: the probability that a loan enters 90+ day arrears and the probability that a loan forecloses, conditional on having been in 90+ day arrears.
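As a worked illustration of the two-stage decomposition, the sketch below multiplies the probability of entering 90+ day arrears by the probability of foreclosure conditional on arrears. The function name and the input probabilities are hypothetical and are not estimates from the paper; the optional `p_foreclose_direct` argument shows what is dropped when loans are assumed never to proceed straight to foreclosure.

```python
def foreclosure_probability(p_arrears: float,
                            p_foreclose_given_arrears: float,
                            p_foreclose_direct: float = 0.0) -> float:
    """Law of total probability over the arrears state.

    With p_foreclose_direct = 0 this reduces to the two-stage product
    of the arrears probability and the conditional foreclosure probability.
    """
    via_arrears = p_foreclose_given_arrears * p_arrears
    direct = p_foreclose_direct * (1.0 - p_arrears)
    return via_arrears + direct


# Hypothetical inputs: 2% of loans enter arrears, 25% of those foreclose.
print(foreclosure_probability(p_arrears=0.02, p_foreclose_given_arrears=0.25))
```

Because the unconditional probability is a simple product, each factor can be modelled separately, which is what makes estimation in two stages possible.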
This two-stage framework is well suited to testing the double-trigger hypothesis, which is naturally described in two stages:
- A ‘shock’ to the borrower's ability to repay the mortgage causes the borrower to miss repayments and enter arrears.
- The loan's transition from arrears depends on:
  - the ability-to-pay shock – if the ability-to-pay shock is subsequently reversed, the borrower may become current on payments and cure
  - the loan's equity position:
    - if the loan has positive equity, the borrower can profitably sell their property to avoid foreclosure
    - if the loan has negative equity, but to a lesser extent than the cost of foreclosure, the borrower may minimise losses by selling the property themselves to avoid foreclosure
    - if the loan has negative equity in excess of the cost of foreclosure, the loan may go on to foreclosure.
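The second-stage reasoning above can be written as a stylised decision rule. This is a deliberately simplified sketch: the function name, the sign convention (negative values of `equity` mean negative equity) and the handling of boundary cases are assumptions, and in practice each branch describes what a borrower may do rather than a certain outcome.

```python
def stylised_arrears_exit(shock_reversed: bool, equity: float,
                          foreclosure_cost: float) -> str:
    """Most likely exit from arrears under the double-trigger reasoning.

    equity < 0 denotes negative equity. Boundary cases (zero equity, or
    negative equity exactly equal to the foreclosure cost) are resolved
    arbitrarily here.
    """
    if shock_reversed:
        return "cure"        # borrower becomes current on payments
    if equity >= 0:
        return "sell"        # sale proceeds cover the debt
    if -equity < foreclosure_cost:
        return "sell"        # selling loses less than foreclosure would
    return "foreclose"       # negative equity exceeds the cost of foreclosure


print(stylised_arrears_exit(False, equity=-50_000, foreclosure_cost=20_000))
```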
A two-stage modelling approach allows the full set of predictions from the double-trigger hypothesis to be tested – including an analysis of the loans that cure and repay, rather than just those that foreclose. This acknowledges that the transition from arrears to foreclosure is not automatic; rather, most loans in arrears do not go on to foreclosure, and the actions of both borrowers and lenders influence this transition.
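Testable sub-hypotheses that arise from the above are:
- Hypothesis A: $\partial \Pr(R^{90+}_{i,t} = 1) / \partial A_{i,t} > 0$, with $\Pr(R^{90+}_{i,t} = 1) \approx 0$ where $A_{i,t}$ does not exceed the ability-to-pay threshold
- Hypothesis B: $\partial \Pr(R^{90+}_{i,t} = 1) / \partial N_{i,t} \approx 0$
- Hypothesis C: $\partial \Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 1) / \partial N_{i,t} > 0$, with $\Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 1) \approx 0$ where $N_{i,t}$ is less than the cost of foreclosure
- Hypothesis D: $\partial \Pr(F_{i,t} = 1 \mid R^{90+}_{i,t-1} = 1) / \partial A_{i,t} = 0$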
Hypotheses A and B relate to the first stage. Hypothesis A states that the probability of a loan entering 90+ day arrears is increasing in the size of the ability-to-pay shock and is close to 0 where the size of the shock does not exceed the borrower's ability-to-pay threshold. Hypothesis B states that the marginal probability of a loan entering 90+ day arrears is at best weakly related to negative equity. Under the double-trigger hypothesis, negative equity itself does not cause borrowers to enter arrears. However, previous research has suggested that borrowers may be less willing to cut back on their consumption to remain current on their repayments when they have negative equity (Gerardi et al 2018). If this is the case, then the ability-to-pay threshold may be a function of $N_{i,t}$ and the derivative in Hypothesis B may be positive.
Hypotheses C and D relate to the second stage. Hypothesis C states that the probability of foreclosure is increasing in the extent of negative equity, given that the loan has been in arrears, but is close to 0 where the extent of negative equity is less than the cost of foreclosure. Hypothesis D states that once a loan has arrears of 90+ days, the size of the ability-to-pay shock has no influence on the probability of foreclosure (unless the shock is subsequently reversed).
5.2 Cox Proportional Hazard Models
I test the hypotheses outlined above using a two-stage Cox proportional hazard model framework with competing risks. Following the framework set out above, the first stage examines entries to 90+ day arrears, while the second stage estimates transitions to foreclosure, curing and full repayment.
Cox proportional hazard models are most commonly used in the biomedical literature, but have also been used to estimate the effect of covariates on the probability of loans entering arrears (e.g. Deng et al 1996; Gerardi et al 2008). They estimate the effect of a change in a vector of variables on the instantaneous probability (or hazard) that an event of interest is observed, given that event has not yet been observed (Cox 1972).
The Cox proportional hazard model is useful when the probability of an event changes over some time dimension (such as time since loan origination), loans are observed at different points along this time dimension, and those loans that have not yet experienced the event could still do so in the future (known as right censoring). The key virtue of the Cox model is that this time dimension is part of the inherent structure of the model, as opposed to binary or multinomial choice models that include the time dimension as an additional component with a specific functional form. With this time-based structure, the Cox model is not biased by not having information about the future; all that is necessary is knowledge of whether the event had occurred by the point at which the loan was observed.
One downside of the Cox model is that outcomes that prevent the event of interest from occurring (known as competing risks) are treated as if the loans were right censored. For example, a loan that is repaid early is treated as if it could still go into arrears in the future. This is problematic if the factors that cause loans to be repaid are related to the factors that cause arrears (i.e. the events are not independent). While models exist that incorporate the time dimension in a similarly flexible way to the Cox model but do not treat competing risks as independent, these models can be difficult to interpret and are not commonly used in the empirical mortgage default literature.[8] So I use the Cox model.[9]
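To make the treatment of competing risks concrete, the sketch below codes loan outcomes for a model of one event of interest. Any other outcome, including a competing risk such as full repayment, receives an event indicator of 0, exactly as a right-censored loan would. The outcome labels and function name are hypothetical.

```python
def cause_specific_events(outcomes: list[str], event_of_interest: str) -> list[int]:
    """Event indicators for a single event of interest.

    Competing outcomes are coded 0, i.e. treated as right censored.
    """
    return [1 if outcome == event_of_interest else 0 for outcome in outcomes]


outcomes = ["arrears", "repaid", "performing", "arrears"]
print(cause_specific_events(outcomes, "arrears"))  # [1, 0, 0, 1]
```

The repaid loan and the still-performing loan are indistinguishable in this coding, which is the source of the independence concern described above.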
The Cox model takes the form specified in Equation (5):

$$h_E(t \mid x_{it}) = h_{0,E}(t)\exp(x_{it}'\beta) \tag{5}$$

where $h_{0,E}(t)$ is the baseline hazard (instantaneous probability) of event $E$ occurring at time $t$, $x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of coefficients. The model flexibly accounts for the effect of time on the hazard of experiencing the event of interest by only specifying results relative to a baseline probability (the baseline hazard rate). By assuming the covariates affect the hazard rate multiplicatively, the baseline hazard rate need not be specified in order to estimate how the covariates change the probability of the event of interest.
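One way to see why the baseline hazard need not be specified is the Cox partial likelihood, in which the baseline hazard cancels out of every term. The sketch below assumes the standard Cox form (baseline hazard multiplied by the exponential of a linear index), covariates that are fixed over time, and no tied event times. It is not the estimation code behind the paper's results, which use time-varying covariates and quarterly observations.

```python
import math


def cox_partial_log_likelihood(durations: list[float], events: list[int],
                               covariates: list[list[float]],
                               beta: list[float]) -> float:
    """Cox partial log-likelihood, assuming no tied event times.

    durations:  time at which each loan had the event or was censored
    events:     1 if the event was observed, 0 if right censored
    covariates: one row of explanatory variables per loan
    """
    scores = [sum(b * v for b, v in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, observed) in enumerate(zip(durations, events)):
        if not observed:
            continue  # censored loans enter only through the risk sets
        # Risk set: loans still under observation at time t_i.
        risk_sum = sum(math.exp(scores[j])
                       for j, t_j in enumerate(durations) if t_j >= t_i)
        total += scores[i] - math.log(risk_sum)
    return total


# Three hypothetical loans; the third is right censored.
print(cox_partial_log_likelihood([1, 2, 3], [1, 1, 0], [[1.0], [0.0], [0.5]], [0.7]))
```

The baseline hazard does not appear anywhere in the function: only the ordering of event times and the covariates matter.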
The results reported in Section 6 are the ‘hazard ratios’ from the estimated models (these ratios are used to test the hypotheses derived in Section 5.1). The hazard ratio for variable $k$ is $\exp(\beta_k)$, where $\beta_k$ is the coefficient on that variable. Hazard ratios, similar to odds ratios, can be interpreted as a one unit increase in variable $k$ leading to a $(\exp(\beta_k) - 1) \times 100$ per cent increase in the probability of event $E$ above the baseline hazard at time $t$. For example, a hazard ratio of 1.7 would represent a 70 per cent increase in the instantaneous probability of an outcome. The virtue of reporting results in this way is that hazard ratios do not depend on $t$ or the value of $x_{it}$. Note that the exp function imposes a multiplicative relationship between the $x$ variables.
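The conversion between coefficients, hazard ratios and percentage changes can be sketched as follows. The function names are illustrative, and the coefficient value is chosen only to reproduce the hazard ratio of 1.7 used in the example above.

```python
import math


def hazard_ratio(beta_k: float, units: float = 1.0) -> float:
    """Hazard ratio for a change of `units` in a variable with coefficient beta_k."""
    return math.exp(beta_k * units)


def percent_change_in_hazard(ratio: float) -> float:
    """Percentage change in the hazard implied by a hazard ratio."""
    return (ratio - 1.0) * 100.0


beta_k = math.log(1.7)                      # coefficient implying a ratio of 1.7
one_unit = hazard_ratio(beta_k)             # ratio for a one unit increase
two_units = hazard_ratio(beta_k, units=2)   # ratios compound: 1.7 squared
print(round(percent_change_in_hazard(one_unit), 1))
print(round(two_units, 2))
```

Because the relationship is multiplicative, a two unit increase compounds the ratio rather than doubling the percentage change.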
5.3 Model Specification – Further Details
5.3.1 Model details – dependent variables, competing risks and sample construction
In the first-stage model, the event of interest is a loan entering 90+ day arrears, the competing risk is a loan being fully repaid, and the time dimension is seasoning (i.e. the time since origination). Loans which were ‘performing’ (i.e. not in arrears), were less than 90 days in arrears as at June 2019, or that were removed from the dataset for some other reason, are also treated as right censored.[10] To avoid problems with left censored data (i.e. loans experiencing an event prior to entering the dataset), I exclude loans originated prior to 2013.[11] This results in a sample of 1.7 million loans. To allow for the inclusion of time-varying covariates that may be correlated with seasoning, such as indexed LVRs and changes to required loan repayments, the model is estimated using quarterly observations.[12]
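Estimating with quarterly observations amounts to splitting each loan's history into one row per quarter of seasoning, so that time-varying covariates can be attached to each interval. The sketch below shows one possible layout under simplifying assumptions: the function name is hypothetical, the loan is assumed to be observed from origination, and covariates are omitted.

```python
def quarterly_rows(quarters_observed: int,
                   entered_arrears: bool) -> list[tuple[int, int, int]]:
    """Split one loan into (start, stop, event) rows, one per quarter.

    The event flag is 1 only in the final quarter, and only if the loan
    entered 90+ day arrears; otherwise the loan is right censored.
    """
    rows = []
    for start in range(quarters_observed):
        is_last = start == quarters_observed - 1
        event = 1 if (is_last and entered_arrears) else 0
        rows.append((start, start + 1, event))
    return rows


print(quarterly_rows(3, entered_arrears=True))   # [(0, 1, 0), (1, 2, 0), (2, 3, 1)]
print(quarterly_rows(2, entered_arrears=False))  # [(0, 1, 0), (1, 2, 0)]
```

A loan that is fully repaid, or is still performing when last observed, ends with an event flag of 0 in its final row.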
The second stage is estimated on loans that have entered 90+ day arrears. In the second stage, there are three possible events (foreclosure, curing or full repayment). For most of my results, and when testing the hypotheses, foreclosure is the event of interest.[13] The time dimension in this stage is the time since entering 90+ day arrears. Loans which remained in arrears as at June 2019, had a competing event occur, or that were removed from the sample for other reasons while still in arrears, are treated as right censored. I exclude loans that were in arrears at the beginning of the sample, as the length of their time in arrears is unknown. Time-varying explanatory variables, such as LVRs, are included as at the time the loan entered arrears (so they are not correlated with the time dimension in the second stage).
The two stages are estimated independently. As shown in Equation (4), independent estimation is sufficient to examine the double-trigger hypothesis and the determinants of foreclosure; incorporating the first stage results into the second stage using a Heckman selection procedure is not necessary. That said, this set-up means that the second stage results alone cannot be used to make statements about the unconditional probability of foreclosure.
Relatedly, all of my results are relative to a baseline hazard. This means that a hazard ratio of 1.7 for a particular variable, for example, only tells you that the hazard is 70 per cent higher with the increase in that variable; it provides no information about the probability of the event occurring. Where the baseline hazard is close to 0, large hazard ratios are required for the overall probability to move meaningfully away from 0.
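A short worked example, using purely hypothetical numbers, illustrates the point. Suppose the baseline hazard is 0.1 per cent per quarter. A hazard ratio of 1.7 then implies

$$0.001 \times 1.7 = 0.0017,$$

that is, a hazard of 0.17 per cent per quarter, which is still close to 0. For the hazard to reach 1 per cent per quarter from the same baseline, the hazard ratio would need to be

$$\frac{0.01}{0.001} = 10.$$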
5.3.2 Key explanatory variables
The key ability-to-pay explanatory variable is the regional unemployment rate, adjusted for internal migration. This is used as a proxy for the probability that an individual borrower faces an ability-to-pay shock.[14] As with many other empirical studies, actual individual shocks cannot be observed in the data. This means that the true effect of becoming unemployed (or facing another individual shock) will be underestimated by the models, possibly by a very large degree. Notwithstanding this, the estimated hazard ratio for the unemployment rate is expected to be particularly large in the first-stage model, as unemployment represents a large ability-to-pay shock. While the unemployment rate is expected to be of secondary importance in the second stage, as it is not expected to affect foreclosure (conditional on being in arrears), it may still be relevant as regaining employment may allow a borrower to cure (a competing risk).
Two variables may be related to a borrower's ability-to-pay threshold. The first of these is the debt serviceability ratio (DSR); in the event of a reduction in income, a borrower with low relative servicing costs may be able to continue to make repayments from their remaining income or to draw on savings for a longer period to make repayments.[15] The second is mortgage repayment buffers; a borrower with sizeable accumulated excess repayments may be able to draw down on these repayments for a number of months before the loan enters arrears.[16] As such, a low serviceability ratio and high repayment buffers may enhance a borrower's resilience to shocks.
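The two threshold-related variables can be sketched from their definitions in the footnotes. This is an illustrative calculation only: the function names are hypothetical, income is assumed to be expressed monthly, and the sketch does not implement the paper's use of the maximum buffer up until 12 months prior to the estimation period.

```python
def indexed_income(income_at_origination: float,
                   earnings_at_origination: float,
                   earnings_now: float) -> float:
    """Income at origination, indexed by growth in state average weekly earnings."""
    return income_at_origination * earnings_now / earnings_at_origination


def debt_serviceability_ratio(scheduled_monthly_repayment: float,
                              monthly_indexed_income: float) -> float:
    """Scheduled monthly repayments as a share of indexed income."""
    return scheduled_monthly_repayment / monthly_indexed_income


def buffer_months(excess_repayments: float,
                  scheduled_monthly_repayment: float) -> float:
    """Number of months of scheduled repayments held as excess repayments."""
    return excess_repayments / scheduled_monthly_repayment


# Hypothetical loan: earnings index has grown from 1,000 to 1,250 since origination.
income = indexed_income(8_000, earnings_at_origination=1_000, earnings_now=1_250)
print(debt_serviceability_ratio(2_500, income))  # 0.25
print(buffer_months(7_500, 2_500))               # 3.0
```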
Equity is measured by indexed scheduled LVR, which is specified as buckets in the model. Each bucket is treated as a separate variable; for example, a loan with an LVR of 76 would have a value of one in the 70–80 LVR bucket and a value of zero in all other LVR buckets. The use of buckets is standard within the literature as it is flexible and can highlight any potential nonlinearities or threshold effects. The double-trigger hypothesis predicts that foreclosure occurs for loans in arrears when negative equity, $N_{i,t}$, exceeds the borrower's foreclosure cost threshold. But individual borrowers' foreclosure cost thresholds are not observable; this implies that the estimated hazard ratio for negative equity may be increasing nonlinearly, as it becomes increasingly likely that a higher $N_{i,t}$ exceeds this threshold for more borrowers.
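The bucket specification can be illustrated with a small encoding function. The bucket edges below are assumptions chosen for illustration (only the 70–80 bucket is named in the text), as is the convention that each bucket includes its lower edge and excludes its upper edge.

```python
def lvr_bucket_dummies(lvr: float,
                       edges: tuple[int, ...] = (0, 60, 70, 80, 90, 100, 110)) -> dict[str, int]:
    """One indicator variable per LVR bucket, plus an open-ended top bucket.

    The edges are illustrative only; buckets are closed on the left.
    """
    if lvr < edges[0]:
        raise ValueError("LVR is below the lowest bucket edge")
    top_label = f"{edges[-1]}+"
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])] + [top_label]
    dummies = dict.fromkeys(labels, 0)
    for lo, hi in zip(edges, edges[1:]):
        if lo <= lvr < hi:
            dummies[f"{lo}-{hi}"] = 1
            return dummies
    dummies[top_label] = 1  # at or above the highest edge
    return dummies


print(lvr_bucket_dummies(76))
```

Exactly one indicator equals one for any loan, so in a regression one bucket would be omitted as the reference category.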
One potential criticism of models that include a number of regional variables is that the variables may be correlated, making the identification of individual effects difficult. Of particular concern may be the potential correlation between regional unemployment rates and housing prices, which are incorporated in the indexed LVR estimates. Very large sample sizes (approximately 12 million observations in the first stage and 40 thousand in the second stage), and the estimation of indexed LVRs at the individual loan level, help alleviate this concern. In addition, state and time fixed effects have been added to the models and standard errors are clustered at the SA3 region level.
Various loan-level controls are also included, such as borrower and loan characteristics. Variable definitions can be found in Appendix A.
Footnotes
The difficulty in interpretation stems from variables which are positively correlated with the competing risk appearing to have a preventative effect against the event of interest – since the individual is less likely to be in the risk set – even when those variables are in fact uncorrelated with the event of interest directly. See Fine and Gray (1999) for an implementation. [8]
To check the robustness of my results, I estimate a multinomial logit model that does not treat the competing risks as independent. See Appendix C for results. [9]
Loans may also be removed from the dataset when a marketed RMBS deal is called, or when collateral is substituted out of a self-securitisation. [10]
The dataset begins in 2015; estimates suggest that relatively few loans are refinanced within the first two years since origination, and very few loans enter arrears in the first two years. Loans originated in 2013 and 2014 coincided with the housing price peak in many mining-exposed regions and provide useful variation in equity that is needed for this analysis. [11]
See Cox (1972) for a discussion of why multiple observations must be used when the variable may be correlated with the time dimension. [12]
To investigate the determinants of the competing risks, I also estimate a separate model for each event. [13]
The region reported in the data is typically that of the property, rather than the borrower. These will be equivalent where the borrower is an owner-occupier, but may differ for investors.
Specifications using the change in the regional unemployment rate, rather than the level, were also tested. However, these data did not adjust for internal migration and the variable was found to have smaller effects in the models. [14]
Serviceability ratios are calculated as scheduled monthly loan repayments as a share of indexed income (income at origination, indexed by state average weekly earnings). [15]
Buffers are calculated as the number of months of scheduled repayments that the borrower has accumulated as excess repayments. As borrowers draw down on these buffers until they enter arrears, the maximum buffer up until 12 months prior to the estimation period is used to avoid bias in the estimated ‘protective’ effect of this variable. [16]