2 Rationale for Identifying Unique Actors
If each actor in a non-relational dataset has a unique identifier, then actor
identification is straightforward, apart from noisy or spurious entries. When, as is
often the case, no unique identifier exists, the process is more challenging: it
typically involves matching records with similar personal information to the same
person. Consider the 3 records with name, address and telephone fields shown in
Table 1. After a cursory glance at the attributes given, one can infer that Thomas
O'Connell (record 1) and Tom Connell (record 2) possibly refer to the same person,
but that the Thomas O'Connell in record 3 probably isn't the same person, even though
he has the same name as the person in record 1. Entity resolution is the technique
used to automate this process.
Table 1 Similar example records for entity recognition
Name              Address 1           Address 2  Phone
Thomas O'Connell  15 Parnell Street   Dublin 2   085 123 4233
Tom Connell       Parnell Square, 15  Dublin 2   (0)85 123 4233
Thomas O'Connell  15 High Street      Dublin 1   +353 85 458 1112
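
The manual inference made on the three records of Table 1 can be approximated in code. The following is a minimal sketch, not the method used in this chapter: it assumes that two records refer to the same person when their normalized phone numbers are equal and their names are sufficiently similar. The function names, the 0.6 name threshold, and the handling of the Irish country code (353) are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# The three records of Table 1.
records = [
    {"name": "Thomas O'Connell", "phone": "085 123 4233"},
    {"name": "Tom Connell", "phone": "(0)85 123 4233"},
    {"name": "Thomas O'Connell", "phone": "+353 85 458 1112"},
]


def normalize_phone(phone):
    """Keep digits only, then drop an assumed 353 country code and leading zeros."""
    digits = re.sub(r"\D", "", phone)
    if digits.startswith("353"):
        digits = digits[3:]
    return digits.lstrip("0")


def name_similarity(a, b):
    """Return a similarity ratio in [0, 1] between two lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_same_person(r1, r2, name_threshold=0.6):
    """Hypothetical rule: phone numbers must agree and names must be similar."""
    same_phone = normalize_phone(r1["phone"]) == normalize_phone(r2["phone"])
    similar_name = name_similarity(r1["name"], r2["name"]) >= name_threshold
    return same_phone and similar_name


print(likely_same_person(records[0], records[1]))  # records 1 and 2
print(likely_same_person(records[0], records[2]))  # records 1 and 3
```

Under these assumptions records 1 and 2 are matched, because their phone numbers normalize to the same digits and their names are similar, while records 1 and 3 are not, because their phone numbers differ despite the identical name. A practical system would also compare the address fields and weigh the evidence from each attribute rather than apply a single hard rule.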
In a social network each node should represent a unique member of the network.
If the same entity is present more than once in the network, then patterns and
measures calculated from the social network will be inaccurate [15]. If the above
example is extended to a social network, the importance of entity resolution is clear.
Consider a social network derived from e-mail communication between a group of
friends. Table 2 shows the original list of e-mails sent between the 6 friends, before
entity recognition is applied.
Table 2 Email communication between friends before entity recognition
From          To
mary.jane     tom.connell
mary          james.home
michael.home  maria
mike.work     james.work
Figure 2 shows the network of relationships before entity recognition has been
applied. Notice that the network is fragmented and consists of 4 minimal components.
If, through entity recognition, we identify that the e-mail names michael.home
and mike.work both refer to the same Michael, and that james.home and
james.work refer to the same James, then the network looks like the
one shown in Figure 3.
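
The effect of merging aliases on the network structure can be checked with a short sketch. It assumes that the last row of Table 2 is an e-mail from mike.work to james.work, treats the e-mails as undirected edges, and counts connected components with a simple union-find structure. The names `count_components` and `aliases` are illustrative, not taken from the chapter.

```python
# E-mails from Table 2, treated as undirected edges.
emails = [
    ("mary.jane", "tom.connell"),
    ("mary", "james.home"),
    ("michael.home", "maria"),
    ("mike.work", "james.work"),  # assumed reading of the last table row
]

# Result of entity recognition: e-mail names mapped to one canonical actor.
aliases = {
    "michael.home": "michael",
    "mike.work": "michael",
    "james.home": "james",
    "james.work": "james",
}


def count_components(edges, alias_map=None):
    """Count connected components after optionally merging aliased nodes."""
    alias_map = alias_map or {}
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then follow parent links.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        root_a = find(alias_map.get(a, a))
        root_b = find(alias_map.get(b, b))
        if root_a != root_b:
            parent[root_a] = root_b

    return len({find(node) for node in list(parent)})


print(count_components(emails))           # before entity recognition
print(count_components(emails, aliases))  # after entity recognition
```

With the data above, the first call reports 4 components and the second reports 2, since Mary, James, Michael and Maria are joined into a single component once the aliases are merged.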
Following the entity recognition stage, the network changes drastically: the number
of components is halved, from 4 to 2. Entity recognition is rarely 100% precise and
 