RDP 2021-05: Central Bank Communication: One Size Does Not Fit All
Appendix B: Text Features

Table B1: Key Features Extracted from Sample Paragraphs
| Category | Feature name | Description |
|---|---|---|
| Textual(a) | Paragraph length | The count of words in a paragraph |
| | Sentence count | The count of sentences in a paragraph |
| | Number count | The count of numbers in a paragraph |
| | Comma count | The count of commas in a paragraph |
| | Other punctuation count | The count of punctuation marks other than commas |
| | First sentence with numbers | A Boolean value indicating whether the first sentence contains numbers |
| | First sentence with ‘Table’ or ‘Figure/Graph’ | A Boolean value indicating whether the first sentence refers to tables, figures or graphs |
| Readability | Syllables count | The count of syllables |
| | Average word length | The average number of syllables per word |
| | Count of complicated words | The count of words that have three or more syllables |
| | FK grade level | Flesch–Kincaid grade level |
| Syntactic | PoS count | The count of tokens marked with a certain part-of-speech tag in a paragraph |
| | PoS ratio | The percentage of tokens marked with a certain part-of-speech tag in a paragraph |
| | PoS count in the first sentence | The count of tokens marked with a certain part-of-speech tag in the first sentence of a paragraph |
| | PoS ratio in the first sentence | The percentage of tokens marked with a certain part-of-speech tag in the first sentence of a paragraph |
| | PoS count in the last sentence | The count of tokens marked with a certain part-of-speech tag in the last sentence of a paragraph |
| | PoS ratio in the last sentence | The percentage of tokens marked with a certain part-of-speech tag in the last sentence of a paragraph |
| | PoS for the first word in the first sentence | The type of PoS tag for the first word in the first sentence of a paragraph |
| | PoS for the first word in the second sentence | The type of PoS tag for the first word in the second sentence of a paragraph |
| | PoS for the first word in the third sentence | The type of PoS tag for the first word in the third sentence of a paragraph |
| | Parse tree types count for a paragraph | The count of parse tree types for each sentence in a paragraph |
| | Parse tree types count for the first sentence of a paragraph | The count of parse tree types for the first sentence of a paragraph |
| | Parse tree types count for the last sentence of a paragraph | The count of parse tree types for the last sentence of a paragraph |
| Argument features | Count of each type of clue words | Count of clue words by each type (summarise, informative, etc) |
| | Count of clue words in the first sentence | Count of clue words by each type in the first sentence of a paragraph |
| | Count of clue words in the last sentence of a paragraph | Count of clue words by each type in the last sentence of a paragraph |
Note:
  (a) We deliberately exclude n-gram words from the feature list because our survey only includes economists working at the RBA, who have sufficient knowledge of all economic terms

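The textual features in Table B1 can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function name `textual_features`, the regular expressions, the naive sentence splitter and the sample paragraph are all assumptions made for illustration. In particular, words are taken to be alphabetic tokens only, and the decimal point inside a number is counted as 'other punctuation'.

```python
import re
import string


def textual_features(paragraph: str) -> dict:
    """Rough counts for the 'Textual' rows of Table B1 (illustrative only)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    # Assumption: a word is a run of letters, optionally joined by ' or -.
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", paragraph)
    # Assumption: a number is a run of digits with optional . or , separators.
    numbers = re.findall(r"\d+(?:[.,]\d+)*", paragraph)
    commas = paragraph.count(",")
    other_punct = sum(1 for ch in paragraph if ch in string.punctuation and ch != ",")
    first = sentences[0] if sentences else ""
    return {
        "paragraph_length": len(words),
        "sentence_count": len(sentences),
        "number_count": len(numbers),
        "comma_count": commas,
        "other_punctuation_count": other_punct,
        "first_sentence_with_numbers": bool(re.search(r"\d", first)),
        "first_sentence_with_table_or_figure": bool(
            re.search(r"\b(?:table|figure|graph)s?\b", first, flags=re.IGNORECASE)
        ),
    }


# Hypothetical paragraph, written only to exercise the function.
sample = (
    "Graph 2 shows that inflation rose to 2.5 per cent, up from 1.8 per cent. "
    "Growth was steady, however."
)
features = textual_features(sample)
print(features)
```

A production pipeline would normally rely on a proper tokeniser and sentence splitter; the regular expressions here only keep the sketch self-contained.
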
Table B2: Alphabetical List of the Penn Treebank Part-of-Speech Tag Set
| Number | Tag | Description |
|---|---|---|
| 1 | CC | Coordinating conjunction |
| 2 | CD | Cardinal number |
| 3 | DT | Determiner |
| 4 | EX | Existential there |
| 5 | FW | Foreign word |
| 6 | IN | Preposition or subordinating conjunction |
| 7 | JJ | Adjective |
| 8 | JJR | Adjective, comparative |
| 9 | JJS | Adjective, superlative |
| 10 | LS | List item marker |
| 11 | MD | Modal |
| 12 | NN | Noun, singular or mass |
| 13 | NNS | Noun, plural |
| 14 | NNP | Proper noun, singular |
| 15 | NNPS | Proper noun, plural |
| 16 | PDT | Predeterminer |
| 17 | POS | Possessive ending |
| 18 | PRP | Personal pronoun |
| 19 | PRP$ | Possessive pronoun |
| 20 | RB | Adverb |
| 21 | RBR | Adverb, comparative |
| 22 | RBS | Adverb, superlative |
| 23 | RP | Particle |
| 24 | SYM | Symbol |
| 25 | TO | to |
| 26 | UH | Interjection |
| 27 | VB | Verb, base form |
| 28 | VBD | Verb, past tense |
| 29 | VBG | Verb, gerund or present participle |
| 30 | VBN | Verb, past participle |
| 31 | VBP | Verb, non-3rd person singular present |
| 32 | VBZ | Verb, 3rd person singular present |
| 33 | WDT | Wh-determiner |
| 34 | WP | Wh-pronoun |
| 35 | WP$ | Possessive wh-pronoun |
| 36 | WRB | Wh-adverb |

Note: This table is adapted from Santorini (1990, p 6)
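
As an illustration of how the PoS count and PoS ratio features could be derived from tagged tokens, the sketch below uses a short sentence whose Penn Treebank tags were assigned by hand. The tags, the variable `tagged` and the helper names `pos_counts` and `pos_ratios` are assumptions for this example, not output from the tagger used in the paper.

```python
from collections import Counter

# Hand-assigned Penn Treebank tags (an assumption for illustration,
# not the output of any particular tagger).
tagged = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), ("because", "IN"), ("it", "PRP"),
    ("was", "VBD"), ("warm", "JJ"),
]


def pos_counts(tagged_tokens):
    """Count how many tokens carry each part-of-speech tag."""
    return Counter(tag for _, tag in tagged_tokens)


def pos_ratios(tagged_tokens):
    """Share of tokens carrying each tag, expressed as a percentage."""
    counts = pos_counts(tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100 * n / total for tag, n in counts.items()}


counts = pos_counts(tagged)
ratios = pos_ratios(tagged)
first_word_tag = tagged[0][1]  # PoS tag of the first word in the sentence

print(dict(counts))
print(ratios)
print(first_word_tag)
```

The same two helpers could be applied to the first or last sentence of a paragraph to obtain the sentence-level variants of these features.
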

Table B3: Alphabetical List of the Penn Treebank Parse Tree Tag Set
| Number | Tag | Description |
|---|---|---|
| 1 | ADJP | Adjective phrase |
| 2 | ADVP | Adverb phrase |
| 3 | NP | Noun phrase |
| 4 | PP | Prepositional phrase |
| 5 | S | Simple declarative clause |
| 6 | SBAR | Subordinate clause |
| 7 | SBARQ | Direct question introduced by wh-element |
| 8 | SINV | Declarative sentence with subject-aux inversion |
| 9 | SQ | Yes/no questions and subconstituent of SBARQ excluding wh-element |
| 10 | VP | Verb phrase |
| 11 | WHADVP | Wh-adverb phrase |
| 12 | WHNP | Wh-noun phrase |
| 13 | WHPP | Wh-prepositional phrase |

Note: This table is adapted from Table 1.2 in Taylor, Marcus and Santorini (2003, p 9)
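
Parse tree type counts can be sketched by counting the phrase-level labels in a bracketed parse. The bracketed string below was written by hand for a short sentence and is an assumption for illustration, as are the names `PHRASE_TAGS` and `parse_tag_counts`; a real pipeline would obtain the parse from a constituency parser.

```python
import re
from collections import Counter

# Phrase-level tags listed in Table B3.
PHRASE_TAGS = {
    "ADJP", "ADVP", "NP", "PP", "S", "SBAR", "SBARQ",
    "SINV", "SQ", "VP", "WHADVP", "WHNP", "WHPP",
}

# Hand-written bracketed parse (an assumption, not parser output).
parse = (
    "(S (NP (DT The) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat))) "
    "(SBAR (IN because) (S (NP (PRP it)) (VP (VBD was) (ADJP (JJ warm)))))) "
    "(. .))"
)


def parse_tag_counts(bracketed: str) -> Counter:
    """Count phrase-level labels; word-level PoS labels are ignored."""
    # Every constituent label directly follows an opening bracket.
    labels = re.findall(r"\(([^\s()]+)", bracketed)
    return Counter(label for label in labels if label in PHRASE_TAGS)


tree_counts = parse_tag_counts(parse)
print(dict(tree_counts))
```
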

Table B4 shows how these features would be expressed for a simple sentence: ‘The cat sat on the mat because it was warm.’

Table B4: A Short List of Text Features for a Sample Sentence
| Feature | Value |
|---|---|
| Text features | |
| Count of words | 10 |
| Count of sentences | 1 |
| Count of syllables | 11 |
| Count of polysyllables (words with 3+ syllables) | 0 |
| Syllables per word | 1.1 |
| FK grade level | 1.29 |
| Count of clue words(a) | 1 (‘because’) |
| Syntactic features | |
| PoS tags feature | DT = 2, NN = 2, VBD = 2, IN = 2, PRP = 1, JJ = 1 |
| Syntactic parse features | S = 2, NP = 3, VP = 2, SBAR = 1, PP = 1, ADJP = 1 |
Note:
  (a) ‘Clue words’ is a list of words or phrases that link individual propositions to form one coherent presentation; please refer to Cohen (1984) for a full list
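
The readability values in Table B4 can be checked with a short sketch. It uses the standard Flesch–Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, which reproduces the 1.29 reported in the table. The syllable counter is a rough vowel-group heuristic assumed here for illustration only; the function names `count_syllables` and `fk_grade` are likewise hypothetical and not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fk_grade(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch-Kincaid grade level formula."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


sentence = "The cat sat on the mat because it was warm."
words = re.findall(r"[A-Za-z]+", sentence)
n_sentences = max(len(re.findall(r"[.!?]+(?:\s|$)", sentence)), 1)
syllables = [count_syllables(w) for w in words]

n_words = len(words)
n_syllables = sum(syllables)
n_polysyllables = sum(1 for s in syllables if s >= 3)
syllables_per_word = n_syllables / n_words
grade = fk_grade(n_words, n_sentences, n_syllables)

print(n_words, n_sentences, n_syllables, n_polysyllables)
print(round(syllables_per_word, 2), round(grade, 2))
```

The heuristic happens to agree with the table for this sentence, but it will miscount many English words, so a pronunciation dictionary is the safer choice for real text.
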