analysis of Usenet newsgroup discussions (cf. Smith 1997; Best 1998; Donath
et al. 1999).
The analysis engine of the Conversation Map system performs the following
steps on an archive of Usenet newsgroup messages in order to compute the
four outputs described above:
1. Messages are threaded.
2. Quotations are identified and their sources (in other messages) are found.
3. A table mapping posters (i.e., newsgroup participants) to their messages is built.
4. For every poster, the set of all other posters who replied to the poster
is recorded. Posters who reciprocally reply to one another's messages are
linked together in the social network.
5. The “signatures” of posters are identified and distinguished from the rest
of the contents of each message.
6. The words in the messages are divided into sentences. The tool described
in (Reynar & Ratnaparkhi 1997) is used.
7. Discourse markers (e.g., connecting words like “if”, “therefore”, and
“consequently”) are tagged in the messages. We use a list of discourse markers
compiled by (Marcu 1997).
8. Every word of every message is tagged according to its part of speech (e.g.,
“noun”, “verb”, or “adjective”). A simple trigram-based tagger is used to
accomplish the part-of-speech tagging.
9. Every word is morphologically analyzed and its root is recorded. The
database containing morphological and syntactic information comes from
the University of Pennsylvania (Karp et al. 1992).
10. The words of the messages are parsed into sentences using a partial parser.
The Conversation Map incorporates a re-implementation and revision of
the parser described in (Grefenstette 1994).
11. An analysis of lexical cohesion is performed on every pair of messages,
where a pair consists of one message in a thread and a later message in the
same thread that either references it or quotes a passage from it. The
lexical cohesion analysis procedure we have developed is akin to, but
different from, the one described in (Hirst & St-Onge 1998). This analysis
produces an approximation of the themes of discussion. These themes label
the arcs of the calculated social network, which allows one to see, for any
given pair of posters, the theme of the posters' discussion.
12. The lexical and syntactic context of every noun in the archive is compared
to the lexical and syntactic context of every other noun in the archive. An
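
Steps 3 and 4 above (the poster table and the reciprocal-reply links of the social network) can be illustrated with a minimal Python sketch. It is not the Conversation Map implementation. The message tuples, the function names, and the decision to ignore self-replies are all assumptions made for illustration, and the parent message id is assumed to come from the threading in step 1.

```python
from collections import defaultdict

# Hypothetical records: (message id, poster, id of the message replied to or None).
# The parent id is assumed to be available from the threading step (step 1).
messages = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "alice", "m2"),
    ("m4", "carol", "m1"),
]


def build_poster_table(messages):
    """Step 3: map each poster to the ids of the messages they wrote."""
    table = defaultdict(list)
    for msg_id, poster, _parent in messages:
        table[poster].append(msg_id)
    return dict(table)


def build_reply_sets(messages):
    """Step 4, first half: for every poster, record who replied to them."""
    author_of = {msg_id: poster for msg_id, poster, _parent in messages}
    repliers = defaultdict(set)
    for _msg_id, poster, parent in messages:
        if parent is None or parent not in author_of:
            continue  # thread root, or parent missing from the archive
        target = author_of[parent]
        if target != poster:  # assumption: self-replies are ignored
            repliers[target].add(poster)
    return dict(repliers)


def reciprocal_links(repliers):
    """Step 4, second half: link posters who have replied to one another."""
    links = set()
    for target, sources in repliers.items():
        for source in sources:
            if target in repliers.get(source, set()):
                links.add(frozenset((target, source)))
    return links


poster_table = build_poster_table(messages)
repliers = build_reply_sets(messages)
links = reciprocal_links(repliers)
```

In this toy archive, alice and bob reply to each other and so are linked, while carol replies to alice without receiving a reply and remains unlinked. Storing each link as a frozenset keeps the pair unordered, so the same link is not recorded twice.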