Table 13.1 Call data record (CDR) variables with original data fields (top). A social network and spatial data summary table are listed in the middle and bottom rows, respectively

Original table: Caller location (x, y) | Receiver location (x, y) | Caller | Receiver | Start time | Duration
Social network: User 1 | User 2 | - | - | Number of calls | Total duration between users 1 and 2
Spatial patterns: User 1 | Location (x, y) | Location (x, y) | Duration

is undirected in order to reflect each member's inclination to participate in the
conversation regardless of the initiator (Calabrese et al. 2011). In other words,
records showing that A calls B, or B calls A, are summed to represent a connection
between the unique, undirected pair (A, B). Each pair must have either 10+ mutual
phone calls or 10+ min of total call duration in the given month to be considered
friends. This process eliminates non-friend calls such as sales calls, as these do not
represent persistent relationships. Our resultant dataset has an average of 11.55 calls
per friendship connection (with a 95% confidence interval (c.i.) of [11.35, 11.76])
and an average of 12.52 min for each link (95% c.i. of [12.22, 12.81]).
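
The pair aggregation and friendship threshold described above can be sketched in Python. This is a minimal illustration, not the authors' code: the record layout `(caller, receiver, duration_minutes)`, the function name `build_friend_links`, the reading of "10+" as "at least 10", and the choice to skip self-calls are all assumptions. The sketch sums calls in both directions and does not require that each direction be present.

```python
from collections import defaultdict

def build_friend_links(records, min_calls=10, min_minutes=10.0):
    """Collapse directed call records into undirected friendship links.

    records: iterable of (caller, receiver, duration_minutes) tuples
    (a hypothetical layout, not the original CDR schema).
    Returns {(user_a, user_b): (number_of_calls, total_minutes)}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # pair -> [call count, total minutes]
    for caller, receiver, minutes in records:
        if caller == receiver:
            continue  # skip self-calls (an assumption, not stated in the text)
        pair = tuple(sorted((caller, receiver)))  # A->B and B->A share one key
        totals[pair][0] += 1
        totals[pair][1] += minutes
    # Keep a pair if it meets either threshold for the month.
    return {
        pair: (calls, minutes)
        for pair, (calls, minutes) in totals.items()
        if calls >= min_calls or minutes >= min_minutes
    }
```

Sorting the two user identifiers gives every pair a single canonical key, which is what makes the resulting network undirected.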
The spatial patterns table contains the locations of each user, which are combined
to geo-locate a pair of callers in the social network. The coordinates of the cell phone
tower where a user places or receives a call are summed and weighted by the number
of calls the user places or receives at that cell tower location. We use the resulting
set of weighted locations to represent the user's geographic activity pattern (as in
Carrasco et al. 2006). Such patterns are known to capture “anchor points” (Golledge 1999)
such as home and workplace (or school), as these are the most visited locations for
the average traveler and, thus, frequent calling points (Schönfelder and Axhausen
2003).
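
One way to read the weighting step is as a per-user count of calls at each tower location, sketched below. The record layout `(caller, receiver, caller_xy, receiver_xy)` and the helper names `weighted_locations` and `top_anchor_points` are hypothetical, and treating the most frequently used towers as candidate anchor points is an illustrative assumption rather than the procedure used in the study.

```python
from collections import Counter, defaultdict

def weighted_locations(records):
    """Return {user: Counter({(x, y): number_of_calls})}.

    records: iterable of (caller, receiver, caller_xy, receiver_xy) tuples,
    where each *_xy is the (x, y) coordinate of the serving cell tower
    (a hypothetical layout based on the fields in Table 13.1).
    """
    activity = defaultdict(Counter)
    for caller, receiver, caller_xy, receiver_xy in records:
        activity[caller][caller_xy] += 1      # call placed at this tower
        activity[receiver][receiver_xy] += 1  # call received at this tower
    return activity

def top_anchor_points(user_counts, k=2):
    """Return the k most frequently used tower locations for one user."""
    return [xy for xy, _ in user_counts.most_common(k)]
```

For a single user, `top_anchor_points(weighted_locations(records)[user])` would return the two most used towers, which under this reading are the likeliest home and workplace candidates.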
13.2.2 Sampling
We sample the large CDR dataset by selecting a random sample of 150 “seed” users
and retrieving their contacts (first-degree ties) and their second-degree and third-degree ties,
in a method similar to Kurant et al. (2011). The number of seed users is calibrated
based on our ability to visualize and computationally analyze the resultant dataset.
We choose this method over a random sample of all users (e.g., choosing 20,000
random users and the possible network that might form between them) because the
seed method ensures that retrieved nodes have connections (since we select friends,
then friends of friends). This method can also find groups, whereas in a random
sample of the network, nodes may not be connected. This configuration yields a
network that is focused on the social interactions of a small sample of users. As
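
The seed-based sampling can be illustrated with a breadth-first expansion from randomly chosen seeds, stopping at third-degree ties. This is a simplified sketch under stated assumptions, not the procedure of Kurant et al. (2011): the friendship network is assumed to be available as a dictionary mapping each user to a set of friends, and the function names `expand_from_seeds` and `seed_sample` are hypothetical.

```python
import random
from collections import deque

def expand_from_seeds(adjacency, seeds, max_degree=3):
    """Breadth-first expansion from the seed users.

    adjacency: dict mapping user -> set of friends (undirected network).
    Returns {user: degree}, where degree is the number of hops to the
    nearest seed (0 for the seeds themselves), up to max_degree.
    """
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        if depth[user] == max_degree:
            continue  # do not expand beyond third-degree ties
        for friend in adjacency.get(user, ()):
            if friend not in depth:
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return depth

def seed_sample(adjacency, n_seeds=150, max_degree=3, rng=None):
    """Pick random seed users, then retrieve their ties up to max_degree."""
    rng = rng or random.Random()
    users = sorted(adjacency)  # fixed order so a seeded rng is reproducible
    seeds = rng.sample(users, min(n_seeds, len(users)))
    return expand_from_seeds(adjacency, seeds, max_degree)
```

Because every retrieved user other than a seed is reached through a friendship link, each such node has at least one tie inside the sample, which a uniform random sample of users would not guarantee.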
 