The Dempster-Shafer Theory (Artificial Intelligence)

INTRODUCTION

The initial work introducing Dempster-Shafer (D-S) theory is found in Dempster (1967) and Shafer (1976). Since its introduction the very name has caused confusion, and a more general term often used is belief functions (both terms are used interchangeably here). Nguyen (1978) pointed out, soon after its introduction, that the rudiments of D-S theory can be considered through distributions of random sets. A further comparison has been with the traditional Bayesian theory, of which D-S theory has been considered a generalisation (Schubert, 1994). Cobb and Shenoy (2003) direct their attention to the comparison of D-S theory and the Bayesian formulisation. Their conclusions are that the two have the same expressive power, but that one technique cannot simply take the role of the other.

The association with artificial intelligence (AI) is clearly outlined in Smets (1990), who at the time acknowledged that the AI community had started to show interest in what they call the Dempster-Shafer model. It is of interest that, even then, Smets highlighted the confusion over which version of D-S theory is being considered. D-S theory was employed in an event-driven integration reasoning scheme in Xia et al. (1997), associated with automated route planning, which they view as a very important branch in applications of AI. Liu (1999) investigated Gaussian belief functions, specifically considering a proposed computation scheme and its potential usage in AI and statistics. Huang and Lees (2005) applied a D-S theory model in natural-resource classification, comparing it with two other AI models.


Wadsworth and Hall (2007) considered D-S theory in combination with other techniques to investigate site-specific critical loads for conservation agencies. Pertinently, they outline its positioning with respect to AI (p. 400):

The approach was developed in the AI (artificial intelligence) community in an attempt to develop systems that could reason in a more human manner and particularly the ability of human experts to “diagnose” situations with limited information.

This statement is pertinent here, since emphasis within the examples later given is more towards the general human decision making problem and the handling of ignorance in AI. Dempster and Kong (1988) investigated how D-S theory fits in with being an artificial analogy for human reasoning under uncertainty.

An example problem is considered, the murder of Mr. White, where witness evidence is used to classify the belief in the identification of an assassin from considered suspects. The numerical analyses presented exposit a role played by D-S theory, including the different ways it can act on incomplete knowledge.

BACKGROUND

The background section to this article covers the basic formulisations of D-S theory, as well as certain developments. Formally, D-S theory is based on a finite set of p elements Θ = {s1, s2, …, sp}, called a frame of discernment. A mass value is a function m: 2^Θ → [0, 1] such that m(∅) = 0 (∅ being the empty set) and:

$$\sum_{s \in 2^{\Theta}} m(s) = 1$$

(2^Θ being the power set of Θ). Any subset s of the frame of discernment Θ for which m(s) is non-zero is called a focal element, and m(s) represents the exact belief in the proposition depicted by s. The notion of a proposition here is the collection of the hypotheses represented by the elements in a focal element.

In the original formulisation of D-S theory, from a single piece of evidence all assigned mass values sum to unity and there is no belief in the empty set. In the case of the Transferable Belief Model (TBM), a fundamental development on the original D-S theory (see Smets and Kennes, 1994), a non-zero mass value can be assigned to the empty set, allowing m(∅) > 0. The set of mass values associated with a single piece of evidence is called a body of evidence (BOE), often denoted m(·). The mass value m(Θ) assigned to the frame of discernment Θ is considered the amount of ignorance within the BOE, since it represents the level of exact belief that cannot be discerned to any proper subsets of Θ.
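The conditions on a mass value can be checked mechanically. The following is a minimal sketch, assuming a BOE is stored as a Python dict that maps frozenset focal elements to mass values; the names `power_set`, `is_valid_boe` and the three-element frame are illustrative assumptions, not part of the original formulation.

```python
from itertools import combinations


def power_set(frame):
    """Return all subsets of the frame of discernment as frozensets."""
    items = sorted(frame)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_valid_boe(m, frame, allow_empty_mass=False, tol=1e-9):
    """Check the mass-value conditions for a body of evidence.

    m maps frozenset focal elements to mass values (an assumed layout).
    allow_empty_mass=True relaxes the zero-mass-on-empty-set condition,
    as in TBM.
    """
    frame = frozenset(frame)
    if any(not s <= frame for s in m):
        return False  # every focal element must be a subset of the frame
    if any(v < 0.0 or v > 1.0 for v in m.values()):
        return False  # mass values lie in [0, 1]
    if not allow_empty_mass and m.get(frozenset(), 0.0) > tol:
        return False  # original D-S theory: no belief in the empty set
    return abs(sum(m.values()) - 1.0) <= tol  # masses sum to unity


# Hypothetical three-element frame and a single piece of evidence.
frame = {"s1", "s2", "s3"}
boe = {frozenset({"s1", "s2"}): 0.7, frozenset(frame): 0.3}

ignorance = boe.get(frozenset(frame), 0.0)  # mass left on the whole frame
print(len(power_set(frame)), is_valid_boe(boe, frame), ignorance)
```

Storing only the focal elements, rather than all 2^p subsets, keeps the representation small, since most subsets usually carry zero mass.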

D-S theory also provides a method to combine the BOEs from different pieces of evidence, using Dempster’s rule of combination. This rule assumes these pieces of evidence are independent; then the function (m1 ⊕ m2): 2^Θ → [0, 1], defined by:

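$$(m_1 \oplus m_2)(y) = \begin{cases} 0, & y = \emptyset, \\ \dfrac{\sum_{s_1 \cap s_2 = y} m_1(s_1)\, m_2(s_2)}{1 - \sum_{s_1 \cap s_2 = \emptyset} m_1(s_1)\, m_2(s_2)}, & y \neq \emptyset, \end{cases} \tag{1}$$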

is a mass value, where s1 and s2 are focal elements from the BOEs m1(·) and m2(·), respectively. The denominator part of the combination expression includes:

$$\sum_{s_1 \cap s_2 = \emptyset} m_1(s_1)\, m_2(s_2)$$

that measures the level of conflict in the combination process (Murphy, 2000). It is the existence of the denominator part in this combination rule that separates D-S theory (includes it) from TBM (excludes it). Benouhiba and Nigro (2006) view this difference as whether considering the conflict mass:

$$\left( \sum_{s_1 \cap s_2 = \emptyset} m_1(s_1)\, m_2(s_2) \right)$$

as a further form of ignorance mass is an acceptable point of view.
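The two treatments of the conflict mass can be sketched side by side. This is an illustrative sketch only, assuming BOEs are dicts from frozenset focal elements to mass values; the function names `conjunctive` and `dempster` and the two-element frame are assumptions made for the example.

```python
def conjunctive(m1, m2):
    """Intersect focal elements pairwise and multiply their mass values.

    The mass landing on the empty set is kept (the unnormalised form).
    """
    combined = {}
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            y = s1 & s2
            combined[y] = combined.get(y, 0.0) + v1 * v2
    return combined


def dempster(m1, m2):
    """Dempster's rule: drop the conflict mass and renormalise the rest."""
    combined = conjunctive(m1, m2)
    conflict = combined.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


# Hypothetical two-element frame with partly conflicting evidence.
theta = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.5, theta: 0.5}
m2 = {frozenset({"b"}): 0.5, theta: 0.5}

normalised, conflict = dempster(m1, m2)
unnormalised = conjunctive(m1, m2)
print(conflict)                    # conflict mass from {a} and {b}
print(unnormalised[frozenset()])   # same mass, kept on the empty set
print(normalised[theta])           # remaining mass after renormalising
```

In this sketch the only difference between the two functions is whether the mass on the empty set is retained or divided out, which mirrors the distinction drawn above between TBM and the original D-S theory.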

D-S theory, along with TBM, also differs from the Bayesian approach in that it does not necessarily produce final results. Moreover, partial answers are present in the final BOE produced (through the combination of evidence), including focal elements with more than one element, unlike the Bayesian approach where probabilities on only individual elements would be accrued. This restriction of the Bayesian approach to singleton elements is clearly understood through the ‘Principle of Insufficient Reason’; see Beynon et al. (2000) and Beynon (2002, 2005).

To enable final results to be created, a number of concomitant functions exist within D-S theory, including:

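i) The Belief function,

$$Bel(s_i) = \sum_{s_j \subseteq s_i} m(s_j)$$

for all si ⊆ Θ, representing the confidence that a proposition y lies in si or any subset of si,

ii) The Plausibility function,

$$Pls(s_i) = \sum_{s_j \cap s_i \neq \emptyset} m(s_j)$$

for all si ⊆ Θ, representing the extent to which we fail to disbelieve si,

iii) The Pignistic function (see Smets and Kennes, 1994),

$$BetP(s_i) = \sum_{s_j \subseteq \Theta,\ s_j \neq \emptyset} m(s_j)\, \frac{|s_i \cap s_j|}{|s_j|}$$

for all si ⊆ Θ, representing the probability-like share of belief assigned to si when the mass of each focal element is divided equally among its elements.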

From the definitions given above, the Belief function is cautious of the ignorance incumbent in the evidence, whereas the Plausibility function is more inclusive of its presence. The Pignistic function acts more like a probability function, partitioning levels of exact belief (mass) amongst the elements of the focal element it is associated with.
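The cautious and inclusive behaviour of the three functions can be seen on a small case. The sketch below assumes the usual definitions (belief sums mass over contained focal elements, plausibility over intersecting ones, and the pignistic value shares each mass equally among a focal element's members); the function names and the frame {a, b, c} are hypothetical.

```python
def bel(m, s):
    """Belief: total mass of non-empty focal elements contained in s."""
    return sum(v for f, v in m.items() if f and f <= s)


def pls(m, s):
    """Plausibility: total mass of focal elements that intersect s."""
    return sum(v for f, v in m.items() if f & s)


def betp(m, s):
    """Pignistic value: each focal element's mass is shared equally among
    its elements, then summed over the elements that fall in s."""
    return sum(v * len(f & s) / len(f) for f, v in m.items() if f)


# Hypothetical BOE on a three-element frame.
theta = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, theta: 0.2}

s = frozenset({"a"})
complement = theta - s
print(bel(m, s), betp(m, s), pls(m, s))
# With no mass on the empty set, plausibility is one minus the belief
# in the complement.
print(pls(m, s), 1.0 - bel(m, complement))
```

For a BOE with no mass on the empty set, the pignistic value of a set lies between its belief and its plausibility, which is one way to read the remark that it acts more like a probability function.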

A non-specificity measure N(m(·)) within D-S theory was introduced by Dubois and Prade (1985); the formula is defined as:

$$N(m(\cdot)) = \sum_{s_i \in 2^{\Theta}} m(s_i) \log_2 |s_i|$$

where |si| is the number of elements in the focal element si. Hence, N(m(·)) is considered the weighted average of the focal elements, with m(si) the degree of evidence focusing on si, while log2|si| indicates the lack of specificity of this evidential claim. The general range of this measure is [0, log2|Θ|] (given in Klir and Wierman, 1998), where |Θ| is the number of elements in the frame of discernment Θ.
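The two ends of the stated range can be illustrated with a short sketch. It assumes a BOE stored as a dict from frozenset focal elements to mass values; the function name `non_specificity` and the four-element frame are hypothetical choices for the example.

```python
from math import log2


def non_specificity(m):
    """Sum of mass times log2 of the focal element size (empty set skipped)."""
    return sum(v * log2(len(f)) for f, v in m.items() if f)


# Hypothetical four-element frame, so the upper bound is log2(4) = 2.
theta = frozenset({"a", "b", "c", "d"})
certain = {frozenset({"a"}): 1.0}                # all mass on a singleton
vacuous = {theta: 1.0}                           # all mass on the frame
mixed = {frozenset({"a", "b"}): 0.5, theta: 0.5}

print(non_specificity(certain))  # lower end of the range
print(non_specificity(vacuous))  # upper end of the range
print(non_specificity(mixed))    # in between
```

The lower bound is reached when every focal element is a singleton, and the upper bound when all mass sits on the frame of discernment, that is, total ignorance.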

MAIN THRUST

The main thrust of this article is an exposition of the utilisation of D-S theory. The small example problem considered here relates to the assassination of Mr. White, of which many derivatives exist. An adaptation of a version of this problem given in Smets (1990) is discussed, more numerically based here, which allows interpretation with D-S theory and its development TBM to be made.

There are three individuals who are suspects for the murder of Mr. White, namely Henry, Tom and Sarah; within D-S theory they make up the frame of discernment, Θ = {Henry, Tom, Sarah}. There are two witnesses who have information regarding the murder of Mr. White:

Witness 1 is 80% sure that the murderer was a man; it follows that the concomitant body of evidence (BOE), defined m1(·), includes m1({Henry, Tom}) = 0.8. Since we know nothing about the remaining mass value it is considered ignorance and allocated to Θ, hence m1({Henry, Tom, Sarah}) = 0.2 (= m1(Θ)).

Witness 2 is 60% confident that Henry was leaving on a jet plane when the murder occurred, so the BOE defined m2(·) includes m2({Tom, Sarah}) = 0.6 and m2({Henry, Tom, Sarah}) = 0.4.

The aggregation of these two sources of information (evidence from the two witnesses), using Dempster’s combination rule (1), is based on the intersection and multiplication of the focal elements and mass values from the BOEs m1(·) and m2(·); see Table 1.

Table 1. Intermediate combination of BOEs, m1(·) and m2(·)

m1(·) ∩ m2(·)       | {Tom, Sarah}, 0.6    | Θ, 0.4
{Henry, Tom}, 0.8   | {Tom}, 0.48          | {Henry, Tom}, 0.32
Θ, 0.2              | {Tom, Sarah}, 0.12   | Θ, 0.08

In Table 1, the intersection and multiplication of the focal elements and mass values from the BOEs m1(·) and m2(·) are presented. The new focal elements found are all non-empty; it follows that the level of conflict is

$$\sum_{s_1 \cap s_2 = \emptyset} m_1(s_1)\, m_2(s_2) = 0,$$

and the resultant BOE, defined m3(·), can be taken directly from the results in Table 1:

m3({Tom}) = 0.48, m3({Henry, Tom}) = 0.32,

m3({Tom, Sarah}) = 0.12

and m3({Henry, Tom, Sarah}) = 0.08.
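The intermediate table and the resulting m3(·) can be reproduced with a few lines of code. This is a sketch under the assumption that each BOE is a dict from frozenset focal elements to mass values; the variable names are illustrative.

```python
from itertools import product

THETA = frozenset({"Henry", "Tom", "Sarah"})
m1 = {frozenset({"Henry", "Tom"}): 0.8, THETA: 0.2}   # witness 1
m2 = {frozenset({"Tom", "Sarah"}): 0.6, THETA: 0.4}   # witness 2

# Intersect every pair of focal elements and multiply their mass values.
table = {}
for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
    y = s1 & s2
    table[y] = table.get(y, 0.0) + v1 * v2

# No pair intersects in the empty set here, so the conflict is zero and
# the normalisation in Dempster's rule leaves the products unchanged.
conflict = table.pop(frozenset(), 0.0)
m3 = {s: v / (1.0 - conflict) for s, v in table.items()}

for s, v in sorted(m3.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), round(v, 2))
```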

Amongst this combination of evidence (m3(·)), the mass value assigned to ignorance (m3({Henry, Tom, Sarah}) = 0.08) is less than that present in the original constituent BOEs, as expected when combining evidence using D-S theory. To further exposit the effect of the combination of evidence, the respective non-specificity values associated with the BOEs shown here are calculated. For the two witnesses, with their BOEs m1(·) and m2(·):

$$N(m_1(\cdot)) = \sum_{s_i \in 2^{\Theta}} m_1(s_i) \log_2 |s_i| = 0.8 \log_2 2 + 0.2 \log_2 3 = 1.117$$

and N(m2(·)) = 1.234. The non-specificity associated with the combined BOE is similarly calculated, found to be N(m3(·)) = 0.567. The values further demonstrate the effect of the combination process, namely a lower level of non-specificity associated with the BOE m3(·), found from the combination of the other two BOEs m1(·) and m2(·).
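These non-specificity values can be checked numerically. The sketch below assumes the measure is the sum of each mass value times log2 of the size of its focal element, with the three BOEs entered by hand from the example; the function name is illustrative.

```python
from math import log2

THETA = frozenset({"Henry", "Tom", "Sarah"})
m1 = {frozenset({"Henry", "Tom"}): 0.8, THETA: 0.2}
m2 = {frozenset({"Tom", "Sarah"}): 0.6, THETA: 0.4}
m3 = {frozenset({"Tom"}): 0.48, frozenset({"Henry", "Tom"}): 0.32,
      frozenset({"Tom", "Sarah"}): 0.12, THETA: 0.08}


def non_specificity(m):
    """Sum of mass times log2 of the focal element size."""
    return sum(v * log2(len(f)) for f, v in m.items() if f)


for name, boe in (("m1", m1), ("m2", m2), ("m3", m3)):
    print(name, round(non_specificity(boe), 3))
```

All three values sit inside the range [0, log2 3], with the combined BOE the most specific of the three.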

To allow a comparison of this combination process under D-S theory with the situation for TBM, the evidence from witness 2 is changed slightly, becoming:

Witness 2 is 60% confident that Henry and Tom were leaving on a jet plane when the murder occurred, so the BOE defined m2(·) includes m2({Sarah}) = 0.6 and m2({Henry, Tom, Sarah}) = 0.4.

The difference between the two ‘Witness 2’ statements is that, in the second statement, Tom is now also considered to be leaving on the jet plane with Henry. The new intermediate calculations when combining the evidence from the two witnesses are shown in Table 2.

In the intermediate results in Table 2, there is an occasion where the intersection of two focal elements from m1(·) and m2(·) results in an empty set (∅). It follows that

$$\sum_{s_1 \cap s_2 = \emptyset} m_1(s_1)\, m_2(s_2) = m_1(\{\text{Henry, Tom}\})\, m_2(\{\text{Sarah}\}) = 0.48,$$

so the value 1 – 0.48 = 0.52 forms the denominator in the expression for the combination of this evidence (see (1)), and the resultant BOE, here defined m4(·), is: m4({Henry, Tom}) = 0.32/0.52 = 0.615, m4({Sarah}) = 0.12/0.52 = 0.231 and m4({Henry, Tom, Sarah}) = 0.08/0.52 = 0.154.

Comparison of the results in the BOEs m3(·) and m4(·) shows how the mass value m3({Tom}) = 0.48 has been spread across the three focal elements which make up the m4(·) BOE.

Table 2. Intermediate combination of BOEs, m1(·) and m2(·), with the new ‘Witness 2’ evidence

m1(·) ∩ m2(·)       | {Sarah}, 0.6    | Θ, 0.4
{Henry, Tom}, 0.8   | ∅, 0.48         | {Henry, Tom}, 0.32
Θ, 0.2              | {Sarah}, 0.12   | Θ, 0.08

This approach to countering the conflict possibly present when combining evidence is often viewed as inappropriate, and TBM was introduced to offer a solution. Hence, using the second ‘Witness 2’ statement, the resultant combined BOE under TBM, defined m5(·), is taken directly from Table 2:

m5(∅) = 0.48, m5({Henry, Tom}) = 0.32, m5({Sarah}) = 0.12 and m5({Henry, Tom, Sarah}) = 0.08.

The difference between the BOEs m4(·) and m5(·) is in the inclusion of the focal element with m5(∅) = 0.48, allowed when employing TBM. Beyond the difference in the calculations made between D-S theory and TBM, the important point is the interpretation of the m5(∅) expression in TBM. Put succinctly, following Smets (1990), m5(∅) = 0.48 corresponds to the amount of belief allocated to none of the three suspects; taken further, it is the proposition that none of the three suspects is the murderer. Since the three individuals are only suspects, the murderer might be someone else; if the initial problem had said that one of the three individuals is the murderer, then the D-S theory approach should be adhered to.
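The contrast between m4(·) and m5(·) can be reproduced from the same table of products. The following sketch assumes BOEs stored as dicts from frozenset focal elements to mass values, with the revised witness 2 evidence; the variable names are illustrative.

```python
THETA = frozenset({"Henry", "Tom", "Sarah"})
EMPTY = frozenset()
m1 = {frozenset({"Henry", "Tom"}): 0.8, THETA: 0.2}   # witness 1
m2 = {frozenset({"Sarah"}): 0.6, THETA: 0.4}          # revised witness 2

# TBM-style combination: keep the mass that lands on the empty set.
m5 = {}
for s1, v1 in m1.items():
    for s2, v2 in m2.items():
        y = s1 & s2
        m5[y] = m5.get(y, 0.0) + v1 * v2

# D-S combination: discard the conflict mass and renormalise the rest.
conflict = m5.get(EMPTY, 0.0)
m4 = {s: v / (1.0 - conflict) for s, v in m5.items() if s != EMPTY}

print(round(conflict, 2))
print({tuple(sorted(s)): round(v, 3) for s, v in m4.items()})
print({tuple(sorted(s)): round(v, 2) for s, v in m5.items()})
```

Both BOEs sum to one; they differ only in whether the 0.48 of conflicting mass is redistributed over the surviving focal elements or left on the empty set.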

Returning to the analysis of the original witness statements, the partial results presented so far do not identify explicitly which suspect is most likely to have undertaken the murder of Mr. White. To achieve explicit results, the three measures Bel(si), Pls(si) and BetP(si), previously defined, are considered on singleton focal elements (si are individual suspects):

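Bel({Henry}) = 0.000, Bel({Tom}) = 0.480, Bel({Sarah}) = 0.000,

Pls({Henry}) = 0.400, Pls({Tom}) = 1.000, Pls({Sarah}) = 0.200,

BetP({Henry}) = 0.187, BetP({Tom}) = 0.727 and BetP({Sarah}) = 0.087.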

In this small example, all three measures identify the suspect Tom as having the most evidence pointing to his being the murderer of Mr. White.
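The singleton values can be recomputed from m3(·). This sketch assumes the usual definitions of the three measures (belief over contained focal elements, plausibility over intersecting ones, and equal sharing of mass for the pignistic value); the function names are illustrative.

```python
THETA = frozenset({"Henry", "Tom", "Sarah"})
m3 = {frozenset({"Tom"}): 0.48, frozenset({"Henry", "Tom"}): 0.32,
      frozenset({"Tom", "Sarah"}): 0.12, THETA: 0.08}


def bel(m, s):
    """Belief: total mass of non-empty focal elements contained in s."""
    return sum(v for f, v in m.items() if f and f <= s)


def pls(m, s):
    """Plausibility: total mass of focal elements that intersect s."""
    return sum(v for f, v in m.items() if f & s)


def betp(m, s):
    """Pignistic value: mass shared equally among a focal element's members."""
    return sum(v * len(f & s) / len(f) for f, v in m.items() if f)


for suspect in sorted(THETA):
    s = frozenset({suspect})
    print(suspect, round(bel(m3, s), 3), round(pls(m3, s), 3),
          round(betp(m3, s), 3))

# Suspect with the largest pignistic value.
most_likely = max(THETA, key=lambda name: betp(m3, frozenset({name})))
print(most_likely)
```

Only the pignistic values sum to one across the three suspects; belief and plausibility instead bracket each suspect from below and above.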

FUTURE TRENDS

Dempster-Shafer (D-S) theory is a methodology that offers an alternative, possibly developed generality, to the assignment of frequency-based probability to events, in its case levels of subjective belief. However, the issues surrounding its position with respect to other methodologies, such as the more well-known Bayesian approach, could be viewed as stifling its utilisation. The important point to remember when considering D-S theory is that it is a general methodology that requires subsequent pertinent utilisation when deriving nascent techniques.

Future work needs to aid in finding the position of D-S theory relative to the other methodologies. That is, unlike methodologies such as fuzzy set theory, D-S theory cannot be employed straight on top of existing techniques to create a D-S type derivative of the technique. Such derivatives, for example, could operate on incomplete data, including when there are missing values, the reason for their being missing possibly being ignorance.

CONCLUSION

Dempster-Shafer (D-S) theory, and its general developments, continues to form the underlying structure to an increasing number of specific techniques that attempt to solve certain problems within the context of uncertain reasoning. As mentioned in the future trends section, the difficulty with D-S theory is that it needs to be considered at the start of work on creating a new technique for analysis. It follows that articles like this, which show the rudimentary workings of D-S theory, allow researchers the opportunity to see its operation, and so may contribute to its further utilisation.

KEY TERMS

Belief: In Dempster-Shafer theory, the level of confidence that a proposition lies in a focal element or any subset of it.

Body of Evidence: In Dempster-Shafer theory, a series of focal elements and associated mass values.

Focal Element: In Dempster-Shafer theory, a set of hypotheses with positive mass value in a body of evidence.

Frame of Discernment: In Dempster-Shafer theory, the set of all hypotheses considered.

Dempster-Shafer Theory: General methodology, also known as the theory of belief functions, whose rudiments are closely associated with uncertain reasoning.

Ignorance: In Dempster-Shafer theory, the level of mass value not discernible among the hypotheses.

Mass Value: In Dempster-Shafer theory, the level of exact belief in a focal element.

Non-Specificity: In Dempster-Shafer theory, the weighted average of the focal elements’ mass values in a body of evidence, viewed as a species of a higher uncertainty type, encapsulated by the term ambiguity.

Plausibility: In Dempster-Shafer theory, the extent to which we fail to disbelieve that a proposition lies in a focal element.
