as a table, then generate CRF features from the table. The table for the parse
tree in Figure 10.5 is shown in Figure 10.6.
10.2.2.2 Cells and attributes
A labeled question comprises the token sequence x_i, i = 1, ..., and the label
sequence y_i, i = 1, .... Each x_i leads to a column vector of observations.
Therefore we use matrix notation to write down x: a table cell is addressed
as x[i, ℓ], where i is the token position (column index) and ℓ is the level or
row index, 1-6 in this example. (Although the parse tree can be arbitrarily
deep, we found that using features from up to level ℓ = 2 was adequate.)
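For concreteness, a minimal sketch of this addressing scheme in Python; the
Cell class and the accessor are illustrative names, not part of the original
system, and the two per-cell attributes are defined next:

    from dataclasses import dataclass

    @dataclass
    class Cell:
        tag: str = ""   # syntactic class assigned by the parser (defined below)
        num: int = 0    # positional chunk count (defined below)

    # The observation matrix x: cell (i, l) holds token position i (column)
    # at parse level l (row), both 1-indexed as in the text.
    table: dict[tuple[int, int], Cell] = {}

    def x(i: int, l: int) -> Cell:
        """Address a table cell as x[i, l]."""
        return table.setdefault((i, l), Cell())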
Intuitively, much of the information required for spotting an informer
can be obtained from the part of speech of the tokens and phrase/clause
attachment information. In contrast, specific word information is generally
sparse and potentially misleading; the same word may or may not be an
informer depending on its position, e.g., “What birds eat snakes?” and “What
snakes eat birds?” have the same words but different informers. Accordingly,
we observe two properties at each cell:
tag : The syntactic class assigned to the cell by the parser, e.g., x[4, 2].tag =
NP. It is well known that POS and chunk information are major clues to
informer-tagging; specifically, informers are often nouns or noun phrases.
num : Many heuristics exploit the fact that the first NP has a higher chance
of containing informers than subsequent NPs. To capture this positional
information, we define the num of a cell at [i, ℓ] as one plus the number of
distinct contiguous chunks to the left of [i, ℓ] with tags equal to x[i, ℓ].tag.
E.g., at level 2 in the table above, “the capital city” forms the first NP, while
“Japan” forms the second NP. Therefore x[7, 2].num = 2.
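As a sketch of this computation, assume each level is encoded as a list of
per-token tags in which maximal runs of identical tags form one chunk, and
that the chunk containing position i itself is not counted; the function name
and the example tags are illustrative, not from the original system:

    def num_at(tags, i):
        """num of cell [i, l]: one plus the number of distinct contiguous
        chunks strictly to the left of position i (1-indexed) whose tag
        equals the tag at position i."""
        target = tags[i - 1]
        j = i - 1                       # 0-based index of position i
        while j > 0 and tags[j - 1] == target:
            j -= 1                      # skip the chunk containing i itself
        chunks, prev = 0, None
        for t in tags[:j]:
            if t == target and prev != target:
                chunks += 1             # a new same-tag chunk starts here
            prev = t
        return 1 + chunks

    # Illustrative level-2 tags for “What is the capital city of Japan?”
    level2 = ["WHNP", "VP", "NP", "NP", "NP", "PP", "NP"]
    assert num_at(level2, 7) == 2       # “Japan” is the second NP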
In conditional models, it is notationally convenient to express features as
functions on (x_i, y_i). To one unfamiliar with CRFs, it may seem strange that
y_i is passed as an argument to features. At training time, y_i is indeed known,
and at testing time, the CRF algorithm efficiently finds the most probable
sequence of y_i's using a Viterbi search. True labels are not revealed to the
CRF at testing time.
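Concretely, such a feature is just a boolean function that receives the
candidate label together with the observation; a schematic sketch, not any
particular CRF library's interface, anticipating the IsTag features defined
next:

    def istag_1_np_2(y_i, tag_at_level_2):
        """Fires iff the candidate label is 1 and the level-2 cell at
        this position is tagged NP."""
        return y_i == 1 and tag_at_level_2 == "NP"

    # Training: evaluated with the known label, e.g. istag_1_np_2(1, "NP").
    # Decoding: Viterbi evaluates it for every candidate value of y_i and
    # keeps the label sequence with the highest total score.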
Cell features IsTag and IsNum : E.g., the observation “y_4 = 1 and
x[4, 2].tag = NP” is captured by the statement that “position 4 fires the
feature IsTag_{1,NP,2}” (which has a boolean value). There is an IsTag_{y,t,ℓ}
feature for each (y, t, ℓ) triplet, where y is the state, t is the POS, and ℓ is
the level. Similarly, for every possible state y, every possible num value n
(up to some maximum horizon), and every level ℓ, we define boolean features
IsNum_{y,n,ℓ}. E.g., position 7 fires the feature IsNum_{2,2,2} in the 3-state
model, capturing the statement “x[7, 2].num = 2 and y_7 = 2”.
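A sketch of how these feature families could be enumerated, assuming a
3-state model, levels 1-2, a small illustrative tag inventory, and a num
horizon of 3; all of these bounds are assumptions, and cell_at(i, l) stands
for the x[i, l] lookup sketched earlier:

    from itertools import product

    STATES = [1, 2, 3]                  # labels of the 3-state model
    LEVELS = [1, 2]                     # the text uses levels up to 2
    TAGS = ["NP", "VP", "PP", "WHNP"]   # illustrative tag inventory
    MAX_NUM = 3                         # assumed maximum horizon for num

    def build_features(cell_at):
        """cell_at(i, l) returns the cell x[i, l]. Each feature maps a
        (candidate label y_i, token position i) pair to True/False."""
        feats = {}
        for y, t, l in product(STATES, TAGS, LEVELS):
            feats[f"IsTag_{y},{t},{l}"] = \
                lambda y_i, i, y=y, t=t, l=l: y_i == y and cell_at(i, l).tag == t
        for y, n, l in product(STATES, range(1, MAX_NUM + 1), LEVELS):
            feats[f"IsNum_{y},{n},{l}"] = \
                lambda y_i, i, y=y, n=n, l=l: y_i == y and cell_at(i, l).num == n
        return feats

With these definitions, “position 7 fires IsNum_{2,2,2}” amounts to
feats["IsNum_2,2,2"](2, 7) returning True.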