relationships are folded into the simple category of “friend” which misses the rich
context we know exists. Data cleanliness and reliability are a further major concern
when such systems are used to support gaming, where unknown people are added as
friends to benefit in online games that reward a high number of relationships.
Relationship simplification, misuse or abuse of such systems can all result in network
data with skewed properties that are not representative of either real-world
or even active online social activities.
Although attribute-based analytical techniques assume independence between
individual rows of data, attribute-heavy data sets can in fact contain relationships
that may be beneficial to analyse. When attribute data is analysed together with a
relational perspective, a new dimension is added to the analysis which can provide
valuable insight. For example, customer analysis is a traditionally attribute-heavy
domain where most analysis is performed with traditional attribute-based analysis
methods [11]. In this domain, however, it has been recognised that a customer rarely
behaves independently and that it is advantageous to consider a relational perspective of
the customer [35, 27, 31], i.e. a customer has friends, family, a partner and colleagues
who may also be customers.
The immediate advantage of inferring social networks from attribute-based systems
is the increased scale and time span of the data. When relying on manual data
collection, the size of the data collected is typically limited to at most
a few hundred actors. When networks are inferred from large automatically created
data sets, the number of nodes can easily span thousands or millions of actors. The
number of time points in manually created data sets is also limited by the feasible
number of times a survey can be conducted. When using automatically collected
data that is timestamped, extracting large dynamic networks becomes possible.
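
As a minimal sketch of what extracting a dynamic network from timestamped records might look like, the following Python groups events into fixed-width time windows and counts the edges seen in each window. The record layout (timestamp, source, target), the function name build_snapshots and the 60-second window are illustrative assumptions, not something the chapter prescribes.

```python
from collections import defaultdict


def build_snapshots(events, window_seconds):
    """Group (timestamp, source, target) events into per-window edge counts.

    Returns a dict mapping window index -> {(source, target): count}.
    """
    snapshots = defaultdict(lambda: defaultdict(int))
    for timestamp, source, target in events:
        # Integer division assigns each event to a fixed-width time window.
        window = int(timestamp // window_seconds)
        snapshots[window][(source, target)] += 1
    # Convert the nested defaultdicts to plain dicts before returning.
    return {w: dict(edges) for w, edges in snapshots.items()}


# Hypothetical event log: (seconds since start, source actor, target actor).
events = [
    (10, "a", "b"),
    (50, "a", "b"),
    (70, "b", "c"),
    (130, "a", "c"),
]
snapshots = build_snapshots(events, window_seconds=60)
```

Each window then acts as one time point of the dynamic network. A real data set would need a window width chosen to suit how frequently the underlying events occur.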
An additional advantage that comes with automatic network extraction is the
elimination of self-report bias when actors respond to network surveys. The bias
can be introduced by a limited ability to recall instances, one's personal
understanding of the relationship terms in the survey, and the reliance on the good
will of the participant to supply accurate results [29]. This benefit, however,
comes at a cost: while intentional bias may be eliminated, automatically collected
large-scale data can be dirty and require various degrees of sanitisation.
While manual data collection might under-report, automatic data collection
processes may over-report relations. Automatic data collection processes are
designed to be comprehensive and to catch all instances of any event that occurs. These
events might include ones that are not relevant for network extraction. For example, in
the case of a phone call network, a person may call their mailbox to retrieve
messages or call their own phone to locate it when lost. While these are both valid calls
and are recorded in the call log, they are not significant to an extracted social network.
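
The call-log example above can be sketched as a simple filter. This is a minimal illustration only, assuming each record is a (caller, callee) pair and that mailbox numbers are known in advance. The names clean_call_log and VOICEMAIL_NUMBERS and all sample numbers are hypothetical.

```python
# Assumed, hypothetical mailbox (voicemail) service number.
VOICEMAIL_NUMBERS = {"121"}


def clean_call_log(calls, service_numbers=VOICEMAIL_NUMBERS):
    """Drop calls that carry no social meaning: self-calls and service calls."""
    cleaned = []
    for caller, callee in calls:
        if caller == callee:
            continue  # calling one's own phone, e.g. to locate it
        if callee in service_numbers:
            continue  # calling the mailbox to retrieve messages
        cleaned.append((caller, callee))
    return cleaned


# Hypothetical call log of (caller, callee) pairs.
call_log = [
    ("555-0101", "555-0102"),
    ("555-0101", "121"),       # mailbox retrieval
    ("555-0102", "555-0102"),  # call to own phone
    ("555-0103", "555-0101"),
]
edges = clean_call_log(call_log)
```

Only the first and last records survive as candidate relationships. Other kinds of irrelevant events would need their own rules, which depend on the data source.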
1.2 Network Inference Approach
To infer social networks, two artifacts must be identified: the
actors and the relationships between the actors. In this chapter we concentrate on
 