RDP 2021-05: Central Bank Communication: One Size Does Not Fit All

Appendix C: Model Tuning Process
May 2021
C.1 Feature selection process
In this study we adopt an automatic feature selection method called recursive feature elimination (RFE) (Guyon et al 2002) to select the relevant features for each model. This helps ensure that each feature included in the final model has a minimum degree of predictive power; otherwise, the models may mistake ‘noise’ for ‘signal’. The algorithm is configured to explore all possible subsets of the features. The computing process is shown in Table C1.
Table C1: The RFE Algorithm

| Step | Description |
|---|---|
| 1.1 | Train the model on the training dataset using all features {X1, X2, …, Xn} |
| 1.2 | Calculate model performance |
| 1.3 | Calculate variable importance |
| 1.4 | For each subset size Si, i = 1…n do |
| 1.5 | End |
| 1.6 | Calculate the performance profile over the Si |
| 1.7 | Determine the appropriate number of predictors |
| 1.8 | Use the model corresponding to the optimal Si |

Source: https://topepo.github.io/caret/recursive-feature-elimination.html
Our model includes 292 features in total, so in the first step of the RFE process we include all features. We then run the model using 30 different feature subset sizes, that is (10, 20, …, 290, 292). To minimise overfitting due to feature selection, we use cross-validation resampling: the process listed in Table C1 is run on the training dataset only, and model performance is calculated on the validation dataset. We run this process 10 times and calculate the model performance (accuracy) for each subset of features as the average of the results from those 10 runs.
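The subset-size search described above can be sketched as a short loop. The Python snippet below is an illustrative sketch, not the authors' code (the paper's implementation uses R's caret package): `rank_features` and `fit_score` are hypothetical stand-ins for the paper's variable-importance ranking and random forest accuracy, and the data are synthetic.

```python
import random

def rfe_profile(subset_sizes, rank_features, fit_score, n_repeats=10):
    """Performance profile over feature subset sizes (Table C1 sketch)."""
    ranking = rank_features()                  # step 1.3: order features by importance
    profile = {}
    for size in subset_sizes:                  # step 1.4: loop over subset sizes
        kept = ranking[:size]                  # keep the `size` highest-ranked features
        # average accuracy over repeated runs, as in the paper's 10 repetitions
        profile[size] = sum(fit_score(kept) for _ in range(n_repeats)) / n_repeats
    best_size = max(profile, key=profile.get)  # steps 1.6-1.8: pick the optimum
    return best_size, profile

# Toy demonstration: 292 features and the paper's 30 subset sizes.
random.seed(0)
N_FEATURES = 292
sizes = list(range(10, 300, 10)) + [N_FEATURES]   # 10, 20, ..., 290, 292

def toy_rank():
    # placeholder ranking; a real run would use random forest importance
    return list(range(N_FEATURES))

def toy_score(kept):
    # fake accuracy that rises with subset size, plus a little noise
    return 0.5 + 0.001 * len(kept) + random.uniform(-0.01, 0.01)

best_size, profile = rfe_profile(sizes, toy_rank, toy_score)
print(len(sizes), best_size)
```

Averaging the accuracy over repeated runs, as in the final step of the paper's procedure, smooths the noise in any single resample before the optimal subset size is chosen.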
C.2 Tuning parameters process
To improve model performance, we tune 2 parameters:
- the number of trees that will be built for each model (ntree), and
- the optimal number of variables that will be selected for each node in a tree (mtry).
The default value of ntree is 500, and the default value of mtry is the square root of the number of features. Different values of these 2 parameters may affect model performance. To find the optimal settings, we employ a grid search approach.
For the grid search, we choose 11 different ntree values (10, 100, 200, 300, …, 1,000) and, for mtry, as suggested by Breiman (2001), we choose 3 values: the default value (mtry = 17), half of the default (mtry = 9), and twice the default (mtry = 34). For each combination, we build 10 models using 10-fold cross-validation and repeat the process 3 times. The combination of ntree and mtry that returns the highest average accuracy is selected.
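A minimal sketch of this grid search, assuming illustrative names throughout: `cv_accuracy` is a fake stand-in for the repeated 10-fold cross-validated random forest accuracy (a real run would fit a forest for each of the 33 parameter combinations), but the grid itself matches the values described above.

```python
import math
import random

N_FEATURES = 292

# Default mtry is the square root of the number of features.
mtry_default = int(math.sqrt(N_FEATURES))                 # = 17
mtry_values = [math.ceil(mtry_default / 2),               # half the default: 9
               mtry_default,                              # default: 17
               2 * mtry_default]                          # twice the default: 34

ntree_values = [10] + list(range(100, 1100, 100))         # 10, 100, 200, ..., 1,000

# 11 ntree values x 3 mtry values = 33 combinations
grid = [(ntree, mtry) for ntree in ntree_values for mtry in mtry_values]

def cv_accuracy(ntree, mtry, n_folds=10, n_repeats=3):
    """Hypothetical stand-in for repeated 10-fold cross-validated accuracy.

    A real implementation would fit a random forest with `ntree` trees and
    `mtry` candidate variables per split on each training fold; here a fake
    deterministic score keeps the search loop runnable."""
    rng = random.Random(ntree * 1000 + mtry)
    base = 0.70 + 0.0001 * min(ntree, 500) - 0.001 * abs(mtry - mtry_default)
    scores = [base + rng.uniform(-0.005, 0.005) for _ in range(n_folds * n_repeats)]
    return sum(scores) / len(scores)

best_ntree, best_mtry = max(grid, key=lambda combo: cv_accuracy(*combo))
print(len(grid), best_ntree, best_mtry)
```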
C.3 Top ten features for four models
**Economist**

| Rank | Reasoning model: Features | Importance(a) | Readability model: Features | Importance(a) |
|---|---|---|---|---|
| 1 | Proportion of VB | 6.2 | Proportion of CC | 13.1 |
| 2 | Proportion of NNS | 4.6 | Proportion of RB | 9.7 |
| 3 | Proportion of MD | 4.5 | Proportion of VB | 7.6 |
| 4 | Count of digits | 4.1 | Proportion of VBP | 7.4 |
| 5 | Count of VB | 3.9 | Count of NN | 6.9 |
| 6 | Proportion of NN | 3.6 | Count of NP | 6.8 |
| 7 | Count of MD | 3.5 | Count of punctuation | 5.9 |
| 8 | Proportion of IN | 3.5 | Proportion of MD | 5.5 |
| 9 | Proportion of CD | 3.5 | Count of commas | 4.6 |
| 10 | Proportion of VBN | 2.8 | Count of SBAR | 4.6 |

**Non-economist**

| Rank | Reasoning model: Features | Importance(a) | Readability model: Features | Importance(a) |
|---|---|---|---|---|
| 1 | Proportion of VB | 10.6 | Proportion of DT | 5.2 |
| 2 | Proportion of MD | 9.0 | Proportion of JJ | 5.1 |
| 3 | Proportion of JJ | 7.3 | FK grade level | 4.7 |
| 4 | Proportion of IN | 6.1 | Count of NP | 4.7 |
| 5 | Proportion of NN | 5.9 | Count of syllables | 4.7 |
| 6 | Count of MD | 5.3 | Proportion of NN | 4.4 |
| 7 | Proportion of VBN | 5.3 | Proportion of CC | 4.4 |
| 8 | Count of VB | 5.2 | Proportion of VB | 4.3 |
| 9 | Proportion of TO | 5.1 | Proportion of IN | 4.3 |
| 10 | Proportion of CC | 5.1 | Proportion of NNS | 4.2 |
Note: