Table B1: Key Features Extracted from Sample Paragraphs
| Category | Feature name | Description |
| --- | --- | --- |
| Textual(a) | Paragraph length | The count of words in a paragraph |
| | Sentence count | The count of sentences in a paragraph |
| | Number count | The count of numbers in a paragraph |
| | Comma count | The count of commas in a paragraph |
| | Other punctuation count | The count of punctuation marks other than commas in a paragraph |
| | First sentence with numbers | A Boolean value indicating whether the first sentence contains numbers |
| | First sentence with ‘Table’ or ‘Figure/Graph’ | A Boolean value indicating whether the first sentence refers to tables, figures or graphs |
| Readability | Syllables count | The count of syllables in a paragraph |
| | Average word length | The average number of syllables per word |
| | Count of complicated words | The count of words that have three or more syllables |
| | FK grade level | Flesch–Kincaid grade level |
| Syntactic | PoS count | The count of tokens marked with a certain part-of-speech tag in a paragraph |
| | PoS ratio | The percentage of tokens marked with a certain part-of-speech tag in a paragraph |
| | PoS count in the first sentence | The count of tokens marked with a certain part-of-speech tag in the first sentence of a paragraph |
| | PoS ratio in the first sentence | The percentage of tokens marked with a certain part-of-speech tag in the first sentence of a paragraph |
| | PoS count in the last sentence | The count of tokens marked with a certain part-of-speech tag in the last sentence of a paragraph |
| | PoS ratio in the last sentence | The percentage of tokens marked with a certain part-of-speech tag in the last sentence of a paragraph |
| | PoS for the first word in the first sentence | The type of PoS tag for the first word in the first sentence of a paragraph |
| | PoS for the first word in the second sentence | The type of PoS tag for the first word in the second sentence of a paragraph |
| | PoS for the first word in the third sentence | The type of PoS tag for the first word in the third sentence of a paragraph |
| | Parse tree types count for a paragraph | The count of parse tree types for each sentence in a paragraph |
| | Parse tree types count for the first sentence of a paragraph | The count of parse tree types for the first sentence of a paragraph |
| | Parse tree types count for the last sentence of a paragraph | The count of parse tree types for the last sentence of a paragraph |
| Argument features | Count of each type of clue words | Count of clue words of each type (summarise, informative, etc) in a paragraph |
| | Count of clue words in the first sentence | Count of clue words of each type in the first sentence of a paragraph |
| | Count of clue words in the last sentence of a paragraph | Count of clue words of each type in the last sentence of a paragraph |
Note: (a) We deliberately exclude n-gram words from the feature list because our survey only includes economists working at the RBA, who have sufficient knowledge of all economic terms
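The textual features in Table B1 can be approximated with a few regular expressions. The sketch below is illustrative only: the function name `textual_features`, the naive sentence splitter and the working definitions of a 'word' and a 'number' are assumptions, not the tokenisation used to construct the table.

```python
import re
import string

def textual_features(paragraph: str) -> dict:
    """Approximate the textual features of Table B1 (hypothetical helper)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    # Assumption: a word is a run of letters, optionally joined by ' or -.
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", paragraph)
    # Assumption: a number is a run of digits with optional . or , separators.
    numbers = re.findall(r"\d+(?:[.,]\d+)*", paragraph)
    # Every ASCII punctuation character except the comma, decimal points included.
    other_punct = [c for c in paragraph if c in string.punctuation and c != ","]
    first = sentences[0] if sentences else ""
    return {
        "paragraph_length": len(words),
        "sentence_count": len(sentences),
        "number_count": len(numbers),
        "comma_count": paragraph.count(","),
        "other_punctuation_count": len(other_punct),
        "first_sentence_has_number": bool(re.search(r"\d", first)),
        "first_sentence_refers_to_table_or_figure": bool(
            re.search(r"\b(Table|Figure|Graph)\b", first)
        ),
    }

# Made-up paragraph, for illustration only.
sample = ("Table 1 shows growth of 2.5 per cent, up from 2 per cent. "
          "Inflation was steady.")
features = textual_features(sample)
```

A production pipeline would normally use a proper sentence and word tokeniser; the regular expressions here only make the feature definitions concrete.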
Table B2: Alphabetical List of the Penn Treebank Part-of-Speech Tag Set
| Number | Tag | Description |
| --- | --- | --- |
| 1 | CC | Coordinating conjunction |
| 2 | CD | Cardinal number |
| 3 | DT | Determiner |
| 4 | EX | Existential there |
| 5 | FW | Foreign word |
| 6 | IN | Preposition or subordinating conjunction |
| 7 | JJ | Adjective |
| 8 | JJR | Adjective, comparative |
| 9 | JJS | Adjective, superlative |
| 10 | LS | List item marker |
| 11 | MD | Modal |
| 12 | NN | Noun, singular or mass |
| 13 | NNS | Noun, plural |
| 14 | NNP | Proper noun, singular |
| 15 | NNPS | Proper noun, plural |
| 16 | PDT | Predeterminer |
| 17 | POS | Possessive ending |
| 18 | PRP | Personal pronoun |
| 19 | PRP$ | Possessive pronoun |
| 20 | RB | Adverb |
| 21 | RBR | Adverb, comparative |
| 22 | RBS | Adverb, superlative |
| 23 | RP | Particle |
| 24 | SYM | Symbol |
| 25 | TO | to |
| 26 | UH | Interjection |
| 27 | VB | Verb, base form |
| 28 | VBD | Verb, past tense |
| 29 | VBG | Verb, gerund or present participle |
| 30 | VBN | Verb, past participle |
| 31 | VBP | Verb, non-3rd person singular present |
| 32 | VBZ | Verb, 3rd person singular present |
| 33 | WDT | Wh-determiner |
| 34 | WP | Wh-pronoun |
| 35 | WP$ | Possessive wh-pronoun |
| 36 | WRB | Wh-adverb |
Note: This table is adapted from Santorini (1990, p 6)
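To make the PoS count and PoS ratio features concrete, the sketch below tallies Penn Treebank tags for a sentence that has already been tagged. The tags are assigned by hand for illustration, the helper name `pos_features` is hypothetical, and the ratio is returned as a fraction rather than a percentage; in practice the tags would come from an automatic tagger.

```python
from collections import Counter

def pos_features(tagged_tokens):
    """Return (counts, ratios) of PoS tags for a list of (token, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    # Ratio = share of tokens carrying each tag (empty if there are no tokens).
    ratios = {tag: n / total for tag, n in counts.items()} if total else {}
    return counts, ratios

# Hand-tagged sample sentence; the final full stop is left out of the token list.
tagged = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), ("because", "IN"), ("it", "PRP"),
    ("was", "VBD"), ("warm", "JJ"),
]

counts, ratios = pos_features(tagged)
first_word_tag = tagged[0][1]  # PoS tag of the first word in the sentence
```

Applying the same function to only the first or last sentence of a paragraph would give the sentence-level variants of these features.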
Table B3: Alphabetical List of the Penn Treebank Parse Tree Tag Set
| Number | Tag | Description |
| --- | --- | --- |
| 1 | ADJP | Adjective phrase |
| 2 | ADVP | Adverb phrase |
| 3 | NP | Noun phrase |
| 4 | PP | Prepositional phrase |
| 5 | S | Simple declarative clause |
| 6 | SBAR | Subordinate clause |
| 7 | SBARQ | Direct question introduced by wh-element |
| 8 | SINV | Declarative sentence with subject-aux inversion |
| 9 | SQ | Yes/no questions and subconstituent of SBARQ excluding wh-element |
| 10 | VP | Verb phrase |
| 11 | WHADVP | Wh-adverb phrase |
| 12 | WHNP | Wh-noun phrase |
| 13 | WHPP | Wh-prepositional phrase |
Note: This table is adapted from Table 1.2 in Taylor, Marcus and Santorini (2003, p 9)
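Parse tree type counts can be read off a bracketed parse by counting how often each label from Table B3 opens a constituent. The sketch below assumes the parse is already available as a Penn Treebank-style bracketed string; the parse shown is written by hand for illustration, and the names `PHRASE_LABELS` and `parse_label_counts` are hypothetical.

```python
import re
from collections import Counter

# The thirteen phrase and clause labels listed in Table B3.
PHRASE_LABELS = {
    "ADJP", "ADVP", "NP", "PP", "S", "SBAR", "SBARQ",
    "SINV", "SQ", "VP", "WHADVP", "WHNP", "WHPP",
}

def parse_label_counts(bracketed: str) -> Counter:
    """Count Table B3 labels in a bracketed parse string."""
    # Every constituent starts with '(' followed directly by its label.
    labels = re.findall(r"\(([A-Z$]+)", bracketed)
    # Keep phrase and clause labels only; word-level PoS tags are dropped.
    return Counter(label for label in labels if label in PHRASE_LABELS)

# Hand-written parse of a sample sentence (an assumption, not parser output).
parse = (
    "(S (NP (DT The) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat))) "
    "(SBAR (IN because) (S (NP (PRP it)) (VP (VBD was) (ADJP (JJ warm)))))))"
)

label_counts = parse_label_counts(parse)
```

A different parser may attach the subordinate clause elsewhere in the tree, which changes the tree shape but not these label counts.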
Table B4 shows how these features would be expressed for a simple sentence: ‘The cat sat on the mat because it was warm.’
Table B4: A Short List of Text Features for a Sample Sentence
| Feature | Value |
| --- | --- |
| Text features | |
| Count of words | 10 |
| Count of sentences | 1 |
| Count of syllables | 11 |
| Count of polysyllables (words with 3+ syllables) | 0 |
| Syllables per word | 1.1 |
| FK grade level | 1.29 |
| Count of clue words(a) | 1 (‘because’) |
| Syntactic features | |
| PoS tags feature | DT = 2, NN = 2, VBD = 2, IN = 2, PRP = 1, JJ = 1 |
| Syntactic parse features | S = 2, NP = 3, VP = 2, SBAR = 1, PP = 1, ADJP = 1 |
Note: (a) ‘Clue words’ are words or phrases that link individual propositions to form one coherent presentation; see Cohen (1984) for a full list
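The FK grade level reported in Table B4 follows from the other counts in the table via the standard Flesch–Kincaid formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below takes the word, sentence and syllable counts from the table as given rather than computing them, and the function name `fk_grade_level` is hypothetical.

```python
def fk_grade_level(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Counts for the sample sentence, taken from Table B4.
grade = fk_grade_level(words=10, sentences=1, syllables=11)
print(round(grade, 2))  # 1.29
```

With 10 words in one sentence and 1.1 syllables per word, the terms are 3.9 + 12.98 − 15.59, which gives the 1.29 shown in the table.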