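[Figure 3.14, panel (a): six emails, each listed top to bottom; ">" marks quotation depth]
E1: a
E2: b | > a
E3: c | > b | > > a
E4: d | e | > c | > > b | > > > a
E5: g | h | > > d | > f | > > e
E6: > g | i | > h | j
[Figure 3.14, panel (b): graph over the nodes a, b, c, d, e, f, g, h, i, j]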
Figure 3.14: (a) An email conversation from the Enron corpus containing six emails. Arrows between
emails represent the reply-to relation. (b) The corresponding Fragment Quotation Graph, in which nodes
are created by identifying quotations and edges are created between neighboring quotations.
• Creating Nodes: Initially, all new and quoted fragments are identified. For instance, email E2
is split into two fragments: the new fragment b and the quoted fragment a. E3 is decomposed
into three fragments: the new fragment c and two quoted fragments b and a. E4 is decomposed
into de, c, b and a, and so on. After that, to identify distinct fragments (nodes),
fragments are compared with each other and overlaps are identified. Fragments are split if
necessary (e.g., fragment gh in E5 is split into g and h when matched with E6), and duplicates
are removed. At the end, 10 distinct fragments a, ..., j give rise to 10 nodes in the graph shown
in Figure 3.14 (b). [9]
• Creating Edges: Edges are created to represent likely reply relationships among fragments.
The assumption is that any new fragment in a message is a potential reply to neighboring
quotations, i.e., quoted fragments immediately preceding or following it. For instance, in
E6 in Figure 3.14 (a), there are two edges from node i, one to g and one to h, because i lies
between g and h. There is only a single edge from j, to h, because j is under h and there is
no text under j.
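The two steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the `EMAILS` encoding is hypothetical, each letter stands for one paragraph, overlaps are found by plain substring matching, and a run of adjacent new fragments (such as d and e in E4) is treated as one block whose members all link to the quotations bordering that block. The helper names `create_nodes`, `split_fragment` and `create_edges` are illustrative. Only the rule stated above is applied; edges inside quoted blocks are not handled.

```python
from itertools import groupby

# Hypothetical encoding of Figure 3.14 (a): each email is a list of
# (quotation_depth, text) fragments in top-to-bottom order. Each letter
# stands for one paragraph; depth 0 means new text.
EMAILS = {
    "E1": [(0, "a")],
    "E2": [(0, "b"), (1, "a")],
    "E3": [(0, "c"), (1, "b"), (2, "a")],
    "E4": [(0, "de"), (1, "c"), (2, "b"), (3, "a")],
    "E5": [(0, "gh"), (2, "d"), (1, "f"), (2, "e")],
    "E6": [(1, "g"), (0, "i"), (1, "h"), (0, "j")],
}


def create_nodes(emails):
    """Split overlapping fragments until the remaining ones are distinct."""
    nodes = {text for frags in emails.values() for _, text in frags}  # set removes duplicates
    changed = True
    while changed:
        changed = False
        for big in sorted(nodes):
            for small in sorted(nodes):
                if small != big and small in big:
                    # Replace `big` by `small` plus whatever surrounds it.
                    before, _, after = big.partition(small)
                    nodes.remove(big)
                    nodes.update(p for p in (before, after) if p)
                    changed = True
                    break
            if changed:
                break
    return nodes


def split_fragment(text, nodes):
    """Rewrite one fragment as the sequence of nodes it is made of."""
    parts = []
    while text:
        node = next(n for n in nodes if text.startswith(n))
        parts.append(node)
        text = text[len(node):]
    return parts


def create_edges(emails, nodes):
    """Link each new fragment to the quoted fragments bordering its block."""
    edges = set()
    for frags in emails.values():
        # Expand each fragment into nodes, keeping its quotation depth.
        seq = [(depth, n) for depth, text in frags
               for n in split_fragment(text, nodes)]
        # Group into maximal runs of new (depth 0) versus quoted nodes.
        runs = [(is_new, [n for _, n in grp])
                for is_new, grp in groupby(seq, key=lambda f: f[0] == 0)]
        for i, (is_new, run) in enumerate(runs):
            if not is_new:
                continue
            neighbours = []
            if i > 0:
                neighbours.append(runs[i - 1][1][-1])  # quotation just above
            if i + 1 < len(runs):
                neighbours.append(runs[i + 1][1][0])   # quotation just below
            edges.update((new, quoted) for new in run for quoted in neighbours)
    return edges


nodes = create_nodes(EMAILS)
edges = create_edges(EMAILS, nodes)
print(sorted(nodes))
print(sorted(edges))
```

On this input the sketch yields the 10 nodes a, ..., j, and for E6 it produces the edges (i, g), (i, h) and (j, h) described above. Each edge is stored as a (new fragment, quoted fragment) pair.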
Figure 3.14 (b) shows the complete fragment quotation graph of the conversation shown in
Figure 3.14 (a). Notice how the threading of the conversation in the FQG is done at the finer level
[9] In this email thread, fragment f reflects a special and important phenomenon, where the original email of a quotation does not
exist in the thread. Carenini et al. characterize this as the hidden email problem, and its influence on email summarization is
discussed in Carenini et al. [2007].