We want our κ statistic to tell us whether actual agreement is above that baseline level. If agreement
merely matches the baseline, κ is zero. At the other extreme, perfect agreement would yield a κ statistic of one.
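As a rough sketch of how such a statistic can be computed, the Python below assumes the two-annotator form commonly called Cohen's κ, that is, κ = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement and P_e is the baseline agreement expected by chance. The text above does not commit to a particular variant, so the function name `cohen_kappa` and the binary sentence labels are illustrative assumptions only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Illustrative helper (hypothetical name), assuming exactly two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Baseline (chance) agreement, from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up labels: 1 = sentence selected for the summary, 0 = not selected.
annotator_a = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 1, 0]
print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.467
```

Here the two annotators agree on 6 of 8 sentences (0.75), but their label frequencies alone would produce agreement of about 0.53, so κ is well below the raw agreement rate. With more than two annotators, a multi-rater variant such as Fleiss' κ would be needed instead.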
As we will see, it is not uncommon to have very low κ scores for a given corpus, particularly
with the summarization task. The fact that the κ scores for summarization annotation are well
below one does not mean they are useless, but merely that there is no such thing as a “single best
summary.” For that reason, it is important to recruit as many annotators as possible, and to collect as
many annotations per document as possible, when doing summarization coding.
2.1.1 MEETING CORPORA
Meetings represent one of the conversational domains that have received the most attention from
the mining and summarization communities. Research on meetings has been greatly facilitated in
recent years by the release of large, freely available annotated corpora. We discuss two meeting
corpora in particular, the AMI corpus and the ICSI corpus, and describe the manner in which they
were annotated for summarization purposes and for a variety of other mining tasks.
AMI Corpus The AMI meeting corpus [Carletta, 2006] was created as part of the European
Union-funded AMI project 1 . The corpus consists of 100 hours of recorded, transcribed and
annotated meetings, divided into scenario and non-scenario meetings. In the scenario meetings, four
participants take part in each meeting and play roles within a fictional company. The scenario given
to them is that they are part of a company called Real Reactions, which designs remote controls.
Their assignment is to design and market a new remote control, and the members play the roles
of project manager (the meeting leader), industrial designer, user-interface designer, and marketing
expert. Through a series of four meetings, the team must bring the product from inception to market.
The first meeting of each series is the kick-off meeting, where participants introduce themselves
and become acquainted with the task. The second meeting is the functional design meeting,
in which the team discusses the user requirements and determines the functionality and working
design of the remote. The third meeting is the conceptual design meeting, wherein the team
determines the conceptual specification, the user interface, and the materials to be used. In the fourth
and final meeting, the team determines the detailed design and evaluates the result.
The participants are given real-time information from the company during the meetings,
such as information about user preferences and design studies, as well as updates about the time
remaining in each meeting. While the scenario given to them is artificial, the speech and the actions
are completely spontaneous and natural. There are 138 meetings of this type in total. The length of
an individual meeting ranges from 15 to 45 minutes, depending on which meeting in the series it is
and how quickly the group is working.
The non-scenario meetings are naturally occurring meetings that would have been held regardless
of the AMI data collection, and so the meetings feature a variety of topics discussed and a
variable number of participants.
1 http://www.amiproject.org/