Actor Identification in Implicit Relational Data Sources - Mining and Analyzing Social Networks


Fig. 1 Network Inference Process

the correct identification of network actors. This is non-trivial when the concept of an “actor” is not at the forefront of the design of a system that collects non-relational data. This step therefore involves identifying all the cases where actors are not properly represented, typically because they appear under different identifiers in different records. The task is critical when no unique identifier unambiguously identifies each actor.

Network actor identification is a reformulation of the “entity resolution” problem that is frequently encountered in different areas of computer science (see section 3). Entity resolution approaches can be divided into two categories: attribute-based approaches and relational approaches. Attribute-based approaches, discussed in section 4.1, consider all the data elements independently and do not exploit relationships between data elements, whether or not such relationships are present. Relational approaches, discussed in section 4.2, use an identified network structure as additional information to improve the quality of the entity resolution.
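
To make the attribute-based category concrete, the following is a minimal sketch in Python rather than the method used in this chapter. It assumes hypothetical records with `name` and `email` fields, an arbitrary similarity threshold of 0.8, and string similarity from the standard library's `difflib.SequenceMatcher`. Each pair of records is scored from its attributes alone, and records that score above the threshold are grouped into one actor.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: the same person may appear under different identifiers.
records = [
    {"id": "r1", "name": "John A. Smith", "email": "jsmith@example.org"},
    {"id": "r2", "name": "Smith, John", "email": "jsmith@example.org"},
    {"id": "r3", "name": "Jon Smith", "email": ""},
    {"id": "r4", "name": "Maria Lopez", "email": "mlopez@example.org"},
]


def normalise(name):
    """Lower-case, drop punctuation, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def attribute_similarity(a, b):
    """Score two records in [0, 1] using attributes only (no relationships)."""
    if a["email"] and a["email"] == b["email"]:
        return 1.0
    return SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()


def resolve(records, threshold=0.8):
    """Group records whose pairwise similarity reaches the threshold (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if attribute_similarity(a, b) >= threshold:
            parent[find(a["id"])] = find(b["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(clusters.values())


print(resolve(records))
```

Because grouping is transitive, a single spurious match can chain unrelated records together, so the threshold in a sketch like this would need tuning against the data at hand.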

When inferring a network, relationships are not always trivial to infer. Ambiguous definitions of relationships, different types of relationships, different measures of relationship strength [43, 46], and a lack of concrete supporting evidence in the data can make the process of relationship identification complex. Furthermore, if relationship data is not available, relational entity resolution techniques cannot be employed, as there is no network data for them to draw on.
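
The effect of choosing one strength measure over another can be illustrated with a small sketch. The two measures below (a raw co-occurrence count and a Jaccard ratio) are illustrative assumptions, not necessarily the measures discussed in [43, 46], and the `events` data is hypothetical: each set lists the actors that appear together in one record.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each lists the actors that appear together in one event.
events = [
    {"alice", "bob"},
    {"alice", "bob", "carol"},
    {"bob", "carol"},
    {"alice", "dave"},
]


def cooccurrence_counts(events):
    """Count how many events each unordered pair of actors shares."""
    counts = Counter()
    for actors in events:
        for pair in combinations(sorted(actors), 2):
            counts[pair] += 1
    return counts


def jaccard_strength(events):
    """Shared events divided by the events in which either actor appears."""
    appearances = Counter(actor for actors in events for actor in actors)
    return {
        (a, b): shared / (appearances[a] + appearances[b] - shared)
        for (a, b), shared in cooccurrence_counts(events).items()
    }


counts = cooccurrence_counts(events)
strengths = jaccard_strength(events)
for pair in sorted(counts):
    print(pair, counts[pair], round(strengths[pair], 2))
```

With this data, the raw count ranks the pairs (alice, bob) and (bob, carol) equally, while the Jaccard ratio ranks (bob, carol) higher because carol appears in fewer events overall. The inferred network therefore depends on the measure chosen.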

In the network inference framework illustrated in Figure 1, we propose a cyclic process whereby actors are first resolved using attribute-based entity resolution, and this resolution is then improved following the initial relationship identification stage. The cycle between identifying actors and identifying relationships can be refined progressively, in both directions: relationship information can be used to improve the quality of matching identical entities, while observations from actor identification can prompt rules to identify new types of relationships. In the second part of this chapter, in section 5, we use a real-world case study to illustrate the steps within the first stage of actor identification using attribute information. Future work will describe how relationships are identified from the data set and how this information can be fed back to improve the quality of the actor identification process.
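
One pass of the feedback from relationships to actor identification might look like the sketch below. It is an illustration under stated assumptions and not the procedure used in this chapter: the candidate pairs, their attribute scores, the `neighbours` mapping, and the three thresholds are all hypothetical. Strong attribute matches are accepted outright, and borderline matches are accepted only when the two actors share enough neighbours in the inferred network.

```python
def neighbour_overlap(a, b, neighbours):
    """Jaccard overlap of the two actors' neighbour sets, ignoring each other."""
    na = neighbours.get(a, set()) - {b}
    nb = neighbours.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def refine(candidates, neighbours, accept=0.9, consider=0.6, min_overlap=0.5):
    """Accept strong attribute matches; use relationships for borderline ones."""
    merged = []
    for a, b, attr_score in candidates:
        if attr_score >= accept:
            merged.append((a, b))
        elif attr_score >= consider and neighbour_overlap(a, b, neighbours) >= min_overlap:
            merged.append((a, b))
    return merged


# Hypothetical output of an attribute-based pass: (actor, actor, similarity).
candidates = [
    ("J. Smith", "John Smith", 0.75),
    ("J. Smith", "Jane Smith", 0.75),
    ("M. Lopez", "Maria Lopez", 0.95),
]

# Hypothetical relationships inferred after that pass.
neighbours = {
    "J. Smith": {"Maria Lopez", "P. Chen"},
    "John Smith": {"Maria Lopez", "P. Chen", "A. Khan"},
    "Jane Smith": {"R. Patel"},
}

print(refine(candidates, neighbours))
```

Here the attribute scores alone cannot separate "John Smith" from "Jane Smith" as a match for "J. Smith", but the shared neighbours can. In a full cycle, relationships would be re-inferred over the merged actors and the pass repeated until no further merges occur.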
