"Narrative" Information Problems (Artificial Intelligence)

INTRODUCTION

‘Narrative’ information concerns in general the account of some real-life or fictional story (a ‘narrative’) involving concrete or imaginary ‘personages’. In this article we deal with (multimedia) nonfictional narratives of an economic interest. This means, first, that we are not concerned with all sorts of fictional narratives that have mainly an entertainment value, and represent an imaginary narrator’s account of a story that happened in an imaginary world: a novel is a typical example of fictional narrative. Secondly, our ‘nonfictional narratives’ must have an economic value: they are then typically embodied into corporate memory documents, they concern news stories, normative and legal texts, medical records, intelligence messages, surveillance videos or visitor logs, actuality photos and video fragments for newspapers and magazines, eLearning and multimedia Cultural Heritage material, etc.

Because of the ubiquity of these ‘narrative’, ‘dynamic’ resources, it is particularly important to build up computer-based applications able to represent and to exploit in a general, accurate, and effective way the semantic content – i.e., the key ‘meaning’ – of these resources.

BACKGROUND

‘Narratives’ are currently a very ‘hot’ domain. From a theoretical point of view, they constitute the object of a full discipline, ‘narratology’, whose aim can be defined as that of producing an in-depth description of the ‘syntactic/semantic structures’ of narratives: the narratologist is in charge of dissecting narratives into their component parts in order to establish their functions, their purposes and the relationships among them. A good introduction to the full domain is (Jahn, 2005).

Even if narratology is particularly concerned with literary analysis (and, therefore, with ‘fictional’ narratives), in recent years some of its varieties have also acquired a particular importance from a strict Artificial Intelligence (AI) and Computer Science (CS) point of view. Leaving aside the old dream of generating fiction by computer, see (Mehan, 1977) and, more recently, (Callaway and Lester, 2002), we can mention here two new disciplines, ‘storytelling’ and ‘eChronicles’, that are of interest from both a nonfictional narratives and an AI/CS point of view.

Storytelling – see, e.g., (Soulier, 2006) – concerns in general the study of the different ways of conveying ‘stories’ and events in words, images and sounds in order to entertain, teach, explain etc. Digital Storytelling deals in particular with the ways of introducing characters and emotions in the interactive entertainment domain, and therefore concerns videogames, massively multiplayer online games, interactive TV, virtual reality etc., see (Handler Miller, 2004). Digital Storytelling is, therefore, related to another, computer-based variant of narratology called Narrative Intelligence, a sub-domain of AI that explores topics at the intersection of Artificial Intelligence, media studies, and human-computer interaction design (narrative interfaces, history database management systems, artificial agents with narrative structured behaviour, systems for the generation and/or understanding of histories/narratives etc.), see (Mateas and Sengers, 2003).

An eChronicle system can be defined in short as a way of recording, organizing and then accessing streams of multimedia events captured by individuals, groups, or organizations making use of video, audio and other sensors. The ‘chronicles’ gathered in this way may concern any sort of ‘narratives’ from meeting minutes to football games, sales activities, ‘lifelogs’ obtained from wearable sensors, etc. The technical challenges mainly concern the ways of aggregating the events into coherent ‘episodes’ making use of domain models such as ontologies, and then providing users with access to this sort of material at the required level of granularity. Note that exploration, and not ‘normal’ querying, is the predominant way of interacting with the chronicle repositories; more details can be found, e.g., in (Guven, Podlaseck and Pingali, 2005), (Westermann and Jain, 2006).

The solution (NKRL) proposed for the ‘intelligent’ management of nonfictional narratives in the companion article to the present one, ‘Narrative’ Information, the NKRL Solution, is considered a fully-fledged eChronicle technique, see (Zarri, 2006). In NKRL, however, a fundamental aspect concerns the presence of powerful ‘reasoning’ techniques. This aspect is not considered in sufficient depth in eChronicles, which are mainly interested in the accumulation of narrative materials rather than in the ‘intelligent’ exploitation of their inner relationships.

REPRESENTING THE ‘NONFICTIONAL’ NARRATIVES

All the different sorts of ‘nonfictional narratives’ evoked in the previous Sections concern, practically, the description of spatially and temporally characterised ‘events’ that relate, at some level of abstraction, the behaviour or the state of some real-life ‘actors’ (characters, personages, etc.): these try to attain a specific result, experience particular situations, manipulate some (concrete or abstract) materials, send or receive messages, buy, sell, deliver etc. Note that:

• The term ‘event’ is taken here in its most general meaning, covering also strictly related notions like fact, action, state, situation, episode, activity etc.

• The ‘actors’ or ‘personages’ involved in the events are not necessarily human beings: we can have narratives concerning, e.g., the vicissitudes in the journey of a nuclear submarine (the ‘actor’, ‘subject’ or ‘personage’) or the various avatars in the life of a commercial product.

• Even if a large number of nonfictional narratives are embodied within natural language (NL) texts, this is not necessarily true: narrative information is really ‘multimedia’. A photo representing a situation that, verbalized, could be expressed as “The US President is addressing the Congress” is not of course an NL document, yet it surely represents a narrative.

An in-depth analysis of the existing Knowledge Representation solutions that could be used to represent and manage nonfictional narratives endowed with the above characteristics is beyond the scope of this article – see in this context, e.g., (Zarri, 2005). We will limit ourselves, here, to some quick considerations.

We can note, first of all, that the now so popular Semantic Web (W3C) languages like RDF (Resource Description Framework), see (Manola and Miller, 2004), and OWL (Web Ontology Language), see (McGuinness and Harmelen, 2004), are unable to fit the bill because their core formalism consists in practice of the classical ‘attribute – value’ model. For these ‘binary’ languages, then, a property can only be a binary relationship, linking two individuals or an individual and a value. When these languages must represent simple ‘narratives’ like “John has given a topic to Mary”, several difficulties arise. In this extremely simple sentence, e.g., “give” is an n-ary (ternary) relationship that, to be represented in a complete way, asks for the presence of a specific ‘semantic predicate’ in the “give” or “transfer” style, where the ‘arguments’, “John”, “topic” and “Mary”, of the predicate must be labelled with ‘conceptual roles’ such as, e.g., ‘agent of give’, ‘object of give’ and ‘beneficiary of give’ respectively.
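The difficulty can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the `Triple` and `Give` types, the property names `gives` and `givesTo`, and the second event (John giving a report to Peter) are hypothetical and do not come from RDF, OWL, or the cited papers. When two giving events are flattened into binary statements about John, the link between each object and its recipient is lost, while a role-labelled ternary record keeps it.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One binary statement: a property linking a subject to a value."""
    subject: str
    prop: str
    value: str


# Two giving events flattened into binary statements about John
# (property names are hypothetical).
flat = {
    Triple("John", "gives", "topic"),
    Triple("John", "givesTo", "Mary"),
    Triple("John", "gives", "report"),
    Triple("John", "givesTo", "Peter"),
}


def candidate_pairings(triples, giver):
    """All (object, receiver) pairs compatible with the flat triples."""
    objects = sorted(t.value for t in triples
                     if t.subject == giver and t.prop == "gives")
    receivers = sorted(t.value for t in triples
                       if t.subject == giver and t.prop == "givesTo")
    return [(o, r) for o in objects for r in receivers]


class Give(NamedTuple):
    """A ternary 'give' relation with role-labelled arguments."""
    agent: str
    obj: str
    beneficiary: str


events = [Give("John", "topic", "Mary"), Give("John", "report", "Peter")]


def received_by(give_events, obj):
    """Who received the given object, according to the n-ary records."""
    return [e.beneficiary for e in give_events if e.obj == obj]
```

With the flat triples, `candidate_pairings(flat, "John")` yields four possible pairings for two actual events, whereas `received_by(events, "topic")` identifies Mary unambiguously.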

Efforts to extend the W3C languages by introducing some n-ary features have not been very successful until now: see, in this context, a recent working paper from the W3C Semantic Web Best Practices and Deployment Working Group (SWBPD WG) about “Defining N-ary Relations on the Semantic Web” (Noy and Rector, 2006). This paper proposes some extensions to the binary paradigm to allow the correct representation of ‘narratives’ like: “Steve has temperature, which is high, but falling” or “United Airlines flight 3177 visits the following airports: LAX, DFW, and JFK”. The technical solutions expounded in this paper are not very convincing and have attracted several criticisms. These have focused, mainly, on i) the fact that the majority of the solutions proposed do not deal, in reality, with the n-ary problem, but with (only loosely) related matters like the possibility of specifying a ‘standard’ binary relationship via the addition of properties, and ii) the arbitrary introduction, through reification processes, of fictitious (and inevitably ad hoc) ‘individuals’ to represent the n-ary relations when these are actually dealt with. Moreover, the paper says nothing, e.g., about the way of dealing, in concrete ‘narrative’ situations, with those crucial ‘connectivity phenomena’ like causality, goal, indirect speech, co-ordination and subordination etc. that link together the basic pieces of information – e.g., the ‘basic events’ corresponding to the present illness state of Steve with other ‘basic events’ corresponding to the (possible or definite) ‘causes’ of such state.
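The reification approach criticised in point ii) can be sketched as follows. This is a hedged illustration in plain Python, not the vocabulary of the working paper: the identifier `give_1`, the property names `type`, `agent`, `object` and `beneficiary`, and the helper functions are all assumptions made for the example. An invented individual stands for the relation instance, and every participant is attached to it through an ordinary binary statement.

```python
def reify(relation_id, predicate, **roles):
    """Encode one n-ary relation instance as a list of binary triples.

    `relation_id` names the fictitious individual that stands for the
    relation instance; each role becomes one binary property on it.
    """
    triples = [(relation_id, "type", predicate)]
    triples += [(relation_id, role, filler) for role, filler in roles.items()]
    return triples


def participants(triples, relation_id):
    """Recover the role fillers attached to a reified relation."""
    return {prop: value for subject, prop, value in triples
            if subject == relation_id and prop != "type"}


# "John has given a topic to Mary" with a hypothetical identifier give_1.
triples = reify("give_1", "Give",
                agent="John", object="topic", beneficiary="Mary")
```

Every triple remains binary, and `participants(triples, "give_1")` recovers the three arguments. The cost is the node `give_1` itself, which corresponds to nothing in the narrative and whose naming and typing are left to each modeller.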

Several solutions for representing narratives in computer-usable ways according to some sort of actual ‘n-ary model’ have been described in the literature. For example, in the context of his work – between the mid-fifties and the mid-sixties – on setting up a mechanical translation process based on the simulation of the thought processes of the translator, Silvio Ceccato (Ceccato, 1961) proposed a representation of narrative-like sentences as a network of triadic structures (‘correlations’) organized around specific ‘correlators’ (a sort of roles). Ceccato is also credited as one of the pioneers of semantic network studies; basically, semantic networks are directed graphs (digraphs) where the nodes represent concepts, and the arcs different kinds of associative links, not only the ‘classical’ IsA and property-value links, but also n-ary relationships. A panorama of the different conceptual solutions proposed in a semantic network context can be found in (Lehmann, 1992).
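The basic semantic network idea, a directed graph with labelled arcs, can be sketched in a few lines of Python. This is an illustrative toy only: the class `SemanticNetwork`, its methods, and the canary example are assumptions, and it covers just the ‘classical’ IsA and property-value links, not the n-ary relationships mentioned above.

```python
from collections import defaultdict


class SemanticNetwork:
    """Directed graph whose arcs carry a link label."""

    def __init__(self):
        # node -> list of (link label, target node)
        self.arcs = defaultdict(list)

    def add(self, source, link, target):
        self.arcs[source].append((link, target))

    def ancestors(self, node):
        """Follow IsA arcs upward, nearest ancestor first."""
        found = []
        frontier = [node]
        while frontier:
            current = frontier.pop(0)
            for link, target in self.arcs[current]:
                if link == "IsA" and target not in found:
                    found.append(target)
                    frontier.append(target)
        return found

    def lookup(self, node, link):
        """Value of a property link, inherited through IsA if needed."""
        for candidate in [node] + self.ancestors(node):
            for label, target in self.arcs[candidate]:
                if label == link:
                    return target
        return None


net = SemanticNetwork()
net.add("canary", "IsA", "bird")
net.add("bird", "IsA", "animal")
net.add("bird", "can", "fly")
net.add("canary", "colour", "yellow")
```

Here `net.lookup("canary", "can")` finds the value on the `bird` node by climbing the IsA arc, which is the kind of inheritance these networks were designed to support.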

In the seventies, a particularly popular n-ary semantic network approach was represented by the Conceptual Dependency theory of Roger Schank (Schank, 1973). In this theory, the underlying meaning (‘conceptualization’) of narrative-like utterances is expressed as combinations of ‘semantic predicates’ chosen from a set of twelve ‘primitive actions’ (like INGEST, MOVE, ATRANS, the transfer of an abstract relationship like possession, ownership and control, PTRANS, physical transfer, etc.) plus states and changes of states, and seven role relationships (‘conceptual cases’).

Conceptual Graphs (CGs) is the representation system developed by John Sowa (Sowa, 1984, 1999) and derived, at least partly, from Schank’s work and other early work in the Semantic Networks domain. CGs make use of a graph-based notation for representing ‘concept-types’ (organized into a type-hierarchy), ‘concepts’ (that are instantiations of concept types) and ‘conceptual relations’ that relate one concept to another. CGs can be used to represent in a formal way narratives like “A pretty lady is dancing gracefully” and more complex, second-order constructions like contexts, wishes and beliefs.

CYC, see (Lenat et al., 1990), is one of the most controversial endeavours in the history of Artificial Intelligence. Started in the early ’80s as an MCC (Microelectronics and Computer Technology Corporation, Texas, USA) project, it ended about 15 years later with the setting up of an enormous knowledge base containing about a million hand-entered ‘logical assertions’, including both simple statements of facts and rules about what conclusions can be inferred if certain statements of facts are satisfied.

We can also mention here another ‘modern’ system, Topic Maps, see (Rath, 2003), where information is represented using topics (representing any concept, from people to software modules and events), associations (the relationships between them), and occurrences (the relationships between topics and information resources relevant to them). Topic Maps correspond, ultimately, to a sort of downgraded Semantic Network representation.
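The three Topic Maps constructs named above can be sketched as a small Python data structure. This is a deliberately simplified illustration, not the Topic Maps standard or any real library: the class `TopicMap`, its methods, and the sample topics and file name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicMap:
    topics: set = field(default_factory=set)
    associations: list = field(default_factory=list)  # (kind, topic_a, topic_b)
    occurrences: list = field(default_factory=list)   # (topic, resource)

    def add_topic(self, name):
        self.topics.add(name)

    def _require(self, *names):
        """Reject references to topics that were never declared."""
        missing = [n for n in names if n not in self.topics]
        if missing:
            raise KeyError(f"unknown topics: {missing}")

    def associate(self, kind, topic_a, topic_b):
        self._require(topic_a, topic_b)
        self.associations.append((kind, topic_a, topic_b))

    def add_occurrence(self, topic, resource):
        self._require(topic)
        self.occurrences.append((topic, resource))

    def related(self, topic):
        """Topics linked to `topic` by any association, in either direction."""
        linked = []
        for kind, a, b in self.associations:
            if a == topic:
                linked.append((kind, b))
            elif b == topic:
                linked.append((kind, a))
        return linked

    def resources(self, topic):
        return [res for t, res in self.occurrences if t == topic]


tm = TopicMap()
for name in ("John", "Mary", "board_meeting"):
    tm.add_topic(name)
tm.associate("attended", "John", "board_meeting")
tm.associate("attended", "Mary", "board_meeting")
tm.add_occurrence("board_meeting", "minutes_2006_03.pdf")
```

In this simplified form the associations are plain labelled links between two topics, which is close to what a basic semantic network already offers.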

Leaving aside now ‘historical’ solutions like those proposed by Schank or Ceccato, none of the existing n-ary solutions mentioned above seems able to satisfy completely the requirements of nonfictional narratives, see again (Zarri, 2005) for more details. The universal purposes of CYC, the extremely large dimensions of its knowledge base and the extreme diversity of the contents of this base give rise to serious consistency problems, which have apparently restricted the development of concrete applications based on this technology to experimental projects mainly supported by the US Government. On the other hand, the knowledge representation language of CYC, CycL (substantially, a frame system rewritten in logical form), seems to be too rigid and uniform to adapt itself to the representation of all the different facets (from general concepts and elementary events to the connectivity phenomena etc.) that characterise the narratives.

Conceptual Graphs (CGs) could represent, at least in principle, a valid solution for dealing with nonfictional narrative information. However, it seems evident that work in a CGs context mainly concerns, with few exceptions, the ‘academic’ domain, and that the practically-oriented applications of CGs are particularly scarce. This becomes particularly evident when we consider that CGs developers still lack an exhaustive and authoritative list of standard CGs structures in the form of ‘canonical graphs’ that could constitute a sort of ‘catalogue’ for dealing with practical problems; the setting up of a tool like this seems never to have been planned. The existence of such a catalogue could be extremely important for practical applications in the narrative domain (and not only there) given that: i) system-builders should not have to create by themselves the structural and inferential knowledge needed to describe and exploit the events proper to a (sufficiently) large class of narratives; ii) the reproduction and the sharing of previous results could become markedly easier.

We can add to the above difficulties the existence of a series of general problems that are not associated with a specific system but that concern by and large all the existing n-ary solutions, like the lack of agreement about the list of ‘roles’ (conceptual cases) to be used when a narrative must be practically represented into conceptual format, or the differences of opinion about the use of ‘primitives’.

ACTUAL TRENDS

In spite of the quite pessimistic considerations of the previous Section, conceiving a specific Knowledge Representation tool for dealing in practice with nonfictional narrative information is far from being impossible. Returning now to the “John gave a topic…” example above – and leaving aside, for the time being, all the additional problems linked, e.g., with the existence of the ‘connectivity phenomena’ – it is not too difficult to see that a complete, n-ary representation that captures all the ‘essential meaning’ of this elementary narrative amounts to:

• Define JOHN_, MARY_ and topic_1 as ‘individuals’, instances of general ‘concepts’ like human_being and information_support or of more specific concepts. Concepts and instances (individuals) are, as usual, collected into a ‘binary’ ontology (built up using a standard tool like, e.g., Protégé).

• Define an n-ary structure organised around a conceptual predicate like, e.g., MOVE or PHYSICAL_TRANSFER and associate the above individuals (the arguments) to the predicate through the use of conceptual roles that specify their ‘function’ within the global narrative. JOHN_ will then be introduced by an AGENT (or SUBJECT) role, topic_1 by an OBJECT (or PATIENT) role, MARY_ by a BENEFICIARY role. Additional information like “yesterday” could be introduced by, e.g., a TEMPORAL_ANCHOR role, etc.

• ‘Reify’ the obtained n-ary structure by associating with it a unique identifier in the form of a ‘semantic label’, to assure both i) the logical-semantic coherence of the structure; ii) a rational and efficient way of storing and retrieving it.
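The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NKRL implementation: the `ONTOLOGY` dictionary, the `NaryStructure` class, the `build` and `reify` helpers, the `STORE` dictionary and the label `give_1` are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Step 1: a tiny 'binary' ontology mapping individuals to their concepts.
ONTOLOGY = {
    "JOHN_": "human_being",
    "MARY_": "human_being",
    "topic_1": "information_support",
}


@dataclass(frozen=True)
class NaryStructure:
    label: str       # semantic label identifying the whole structure
    predicate: str   # conceptual predicate, e.g. MOVE
    roles: tuple     # ordered (role, argument) pairs


def build(label, predicate, **roles):
    """Step 2: attach individuals to the predicate through roles."""
    for role, argument in roles.items():
        if argument not in ONTOLOGY:
            raise KeyError(f"{argument!r} (role {role}) is not a known individual")
    return NaryStructure(label, predicate, tuple(roles.items()))


# Step 3: reified structures are stored and retrieved by their label.
STORE = {}


def reify(structure):
    """Register the structure under its unique semantic label."""
    if structure.label in STORE:
        raise ValueError(f"label {structure.label!r} is already in use")
    STORE[structure.label] = structure
    return structure.label


event = build("give_1", "MOVE",
              AGENT="JOHN_", OBJECT="topic_1", BENEFICIARY="MARY_")
reify(event)
```

Checking the arguments against the ontology and refusing duplicate labels are design choices made for this sketch. They stand in, very roughly, for the coherence and retrieval concerns named in the third step.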

Formally, an n-ary structure defined according to the above guidelines can be described as:

$$(L_i\ (P_j\ (R_1\ a_1)\ (R_2\ a_2)\ \ldots\ (R_n\ a_n))) \qquad (1)$$

where $L_i$ is the symbolic label identifying the particular n-ary structure (e.g., the global structure corresponding to the representation of the “John gave a topic…” example), $P_j$ is the conceptual predicate, $R_k$ is the generic role and $a_k$ the corresponding argument (e.g., the individuals JOHN_, MARY_ etc.). Note that each of the $(R_k\ a_k)$ cells of (1), taken individually, represents a binary relationship in the W3C language style. The main point here is, however, that the whole conceptual structure represented by (1) must be considered globally.
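As a small illustration of this notation, the following Python sketch renders a label, a predicate and a list of role-argument cells as one parenthesised expression. The function name `render`, the label `L1` and the choice of MOVE as predicate are assumptions made for the example.

```python
def render(label, predicate, cells):
    """Render a label, a predicate and (role, argument) cells as one expression."""
    inner = " ".join(f"({role} {argument})" for role, argument in cells)
    return f"({label} ({predicate} {inner}))"


# Role-argument cells for the "John gave a topic" example.
cells = [("AGENT", "JOHN_"), ("OBJECT", "topic_1"), ("BENEFICIARY", "MARY_")]
text = render("L1", "MOVE", cells)
# text == "(L1 (MOVE (AGENT JOHN_) (OBJECT topic_1) (BENEFICIARY MARY_)))"
```

Each element of `cells` is a plain pair, which mirrors the remark that a single cell is a binary relationship. Only the enclosing expression, identified by its label, carries the meaning of the whole event.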

The solution represented formally by (1) is at the core of a complete and running conceptual tool for the representation and management of nonfictional narrative information called NKRL (Narrative Knowledge Representation Language), see (Zarri, 2005) and the companion article: ‘Narrative’ Information, the NKRL Solution.

CONCLUSION

We deal in this article with ‘nonfictional narratives’. These are information resources of high economic importance that concern, e.g., ‘corporate knowledge’ documents, news stories, medical records, surveillance videos or visitor logs, etc. When we examine the existing (or past) general Knowledge Representation systems that could be used for dealing with nonfictional narratives, we can note that none of them seems able to satisfy completely the requirements of nonfictional narratives. For example, the W3C (Semantic Web) languages like RDF and OWL cannot fit the bill since they are binary-based types of representation while narratives ask, in general, for n-ary solutions. A specific, narrative-oriented formalism able to capture the essential ‘meaning’ of an ‘elementary’ narrative event does exist, however, see (Zarri, 2005) and the companion article: ‘Narrative’ Information, the NKRL Solution.

KEY TERMS

‘Binary’ Languages vs. n-ary Languages: Binary languages (like RDF and OWL) are based on the classical ‘attribute – value’ model: they are called ‘binary’ because, for them, a property can only be a binary relationship, linking two individuals or an individual and a value. They cannot be used to represent in an accurate way the narratives, which ask in general, on the contrary, for the use of n-ary knowledge representation languages.

Connectivity Phenomena: In the presence of several, logically linked elementary events, this term denotes the existence of a global ‘narrative’ information content that goes beyond the simple addition of the information conveyed by the single events. The connectivity phenomena are linked with the presence of logico-semantic relationships like causality, goal, co-ordination and subordination etc.

Core Format of a Complete Solution for Representing Narratives: Formally, an n-ary structure able to represent the ‘essential meaning’ of an ‘elementary event’ can be described as:

$$(L_i\ (P_j\ (R_1\ a_1)\ (R_2\ a_2)\ \ldots\ (R_n\ a_n)))$$

where $L_i$ is the symbolic label identifying the particular formalized event, $P_j$ is the conceptual predicate, $R_k$ is the generic role and $a_k$ the corresponding argument.

Examples of n-ary Languages: ‘Historical’ examples of n-ary languages are Ceccato’s ‘correlations’, Schank’s Conceptual Dependency theory, many Semantic Networks proposals, etc. Current n-ary systems are, e.g., Topic Maps, Sowa’s Conceptual Graphs, Lenat’s CYC, etc. None of them is able to satisfy completely the requirements for an ‘intelligent’ representation and management of nonfictional narrative information.

Narrative Information: Concerns in general the account of some real-life or fictional story (a ‘narrative’) involving concrete or imaginary ‘personages’.

Narratology: Discipline that deals with narratives from a theoretical point of view. Sub-classes of narratology that have a ‘computational’ interest are, e.g., Storytelling, Narrative Intelligence and the eChronicle systems.

Nonfictional Narrative of an Economic Interest: In this case, the personages are ‘real characters’, and the narrative happens in the real world. Moreover, the narratives are now embodied in multimedia documents of an economic interest: corporate memory documents, news stories, normative and legal texts, medical records, intelligence messages, surveillance videos or visitor logs, etc.