Actor Identification in Implicit Relational Data Sources - Mining and Analyzing Social Networks

relationships are folded into the simple category of “friend”, which misses the rich

context we know exists. Data cleanliness and reliability are a further major concern

when such systems are used to support gaming, where unknown people are added as

friends to benefit in online games that reward a high number of relationships.

Relationship simplification, misuse or abuse of such systems can all result in network

data with skewed properties that are not representative of either real-world

or even active online social activities.

Although attribute based analytical techniques assume independence between in-

dividual rows of data, in fact attribute heavy data sets can contain relationships

that may be beneficial to analyse. When attribute data is analysed together with a

relational perspective, a new dimension is added to the analysis, which can provide

valuable insight. For example, customer analysis is a traditionally attribute heavy

domain where most analysis is performed with traditional attribute based analysis

methods [11]. In this domain, however, it has been recognised that a customer rarely

behaves independently and it is advantageous to consider a relational perspective of

the customer [35, 27, 31], i.e. a customer has friends, family, a partner and colleagues

who may also be customers.
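
As a minimal sketch of how relationships can hide inside attribute rows, the following links customers who share the same value for one attribute. The customer records, the field names and the choice of a shared address as the linking attribute are all assumptions made for illustration; they do not come from the text.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical customer rows; the ids and the "address" field are illustrative only.
customers = [
    {"id": "c1", "address": "12 Elm St"},
    {"id": "c2", "address": "12 Elm St"},
    {"id": "c3", "address": "7 Oak Ave"},
    {"id": "c4", "address": "12 Elm St"},
]


def infer_edges(rows, key):
    """Link every pair of rows that share the same value for `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["id"])

    edges = set()
    for members in groups.values():
        # Each pair of rows in the same group becomes one undirected edge.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


edges = infer_edges(customers, "address")
```

Here `c3` shares no address with anyone and stays isolated, while the other three customers form a fully connected group. A shared attribute is only a candidate relationship and may need further evidence before it is treated as a social tie.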

The immediate advantage of inferring social networks from attribute based sys-

tems is manifested in the increased scale and time span of the data. When relying on

manual data collection the size of the data collected is typically limited to at most

a few hundred actors. When networks are inferred from large automatically created

data sets, the number of nodes can easily span thousands or millions of actors. The

number of time points in manually created data sets is also limited by the feasible

number of times a survey can be conducted. When using automatically collected

data that is timestamped, extracting large dynamic networks becomes possible.
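
One way to picture a dynamic network built from timestamped records is to bucket the events into fixed time windows and keep one edge set per window. The sketch below does this per calendar month; the event tuples, the monthly window and the treatment of edges as undirected are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical timestamped interactions: (source, target, ISO 8601 timestamp).
events = [
    ("a", "b", "2010-01-03T10:00:00"),
    ("a", "c", "2010-01-20T09:30:00"),
    ("b", "c", "2010-02-02T14:15:00"),
    ("a", "b", "2010-02-11T08:00:00"),
]


def monthly_snapshots(records):
    """Group interactions into one undirected edge set per (year, month)."""
    snapshots = defaultdict(set)
    for source, target, stamp in records:
        when = datetime.fromisoformat(stamp)
        # Sort the endpoints so (a, b) and (b, a) count as the same edge.
        edge = tuple(sorted((source, target)))
        snapshots[(when.year, when.month)].add(edge)
    return dict(snapshots)


snapshots = monthly_snapshots(events)
```

Each key of `snapshots` is one time point of the network, so the number of time points is bounded by the span and resolution of the timestamps rather than by how often a survey can be run.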

An additional advantage that comes with automatic network extraction is the

elimination of the self-report bias that arises when actors respond to network surveys.

This bias can be introduced by a limited ability to recall instances, one's personal

understanding of the relationship terms in the survey, and the reliance on the good

will of the participant to supply accurate results [29]. This benefit, however,

comes at a cost: while intentional bias may be eliminated, automatically collected

large scale data can be dirty and require various degrees of sanitisation.

While manual data collection might under-report, automatic data collection

processes may over-report on relations. Automatic data collection processes are

designed to be comprehensive and catch all instances of any event that occurs. These

events might include ones that are not relevant for data extraction. For example, in

the case of a phone call network, a person may call their mailbox to retrieve mes-

sages or call their own phone to locate it when lost. While these are both valid calls

and are recorded in the call log, they are not significant to an extracted social network.
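
The phone call example can be sketched as a simple filter over the call log that drops self-calls and calls to a mailbox before any network is extracted. The record layout, the phone numbers and the voicemail number used here are hypothetical and stand in for whatever a real call log would contain.

```python
# Hypothetical service numbers; a real system would supply its own list.
VOICEMAIL_NUMBERS = {"121"}

# Hypothetical call records: (caller, callee).
call_log = [
    ("555-0101", "555-0102"),
    ("555-0101", "121"),       # retrieving messages from the mailbox
    ("555-0103", "555-0103"),  # calling one's own phone to locate it
    ("555-0102", "555-0101"),
]


def is_social_call(caller, callee, service_numbers=VOICEMAIL_NUMBERS):
    """Return True only for calls that could represent a social relation."""
    if caller == callee:
        return False  # self-call
    if caller in service_numbers or callee in service_numbers:
        return False  # call involving a service number such as voicemail
    return True


social_calls = [(a, b) for a, b in call_log if is_social_call(a, b)]
```

Only the two calls between distinct subscribers remain in `social_calls`. The filtered-out records are still valid calls; they are simply not relevant to the extracted social network.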

1.2 Network Inference Approach

To infer social networks, two artifacts must be identified: the

actors and the relationships between the actors. In this chapter we concentrate on
