at least three tweets in the covered period of 3 months. This subset of the dataset is
intended to increase the probability of working with tweets from residents, who spend
more time in the area and therefore have more opportunities to post tweets.
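The user-level filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (a `user_id` field per tweet) and the function name `filter_active_users` are assumptions made for the example.

```python
from collections import Counter


def filter_active_users(tweets, min_tweets=3):
    """Keep only tweets whose author posted at least `min_tweets` tweets.

    `tweets` is a list of dicts with a hypothetical "user_id" key.
    """
    counts = Counter(t["user_id"] for t in tweets)
    return [t for t in tweets if counts[t["user_id"]] >= min_tweets]


# Hypothetical sample: user "a" posts three tweets, user "b" only two.
sample = [
    {"user_id": "a", "text": "a1"},
    {"user_id": "b", "text": "b1"},
    {"user_id": "a", "text": "a2"},
    {"user_id": "b", "text": "b2"},
    {"user_id": "a", "text": "a3"},
]
kept = filter_active_users(sample)
```

With the default threshold of three, only the tweets of user "a" remain in `kept`.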
Besides geolocated tweets, socio-demographic and population density data are
used for this study. The socio-demographic data for London's wards, which are
subunits of boroughs, come from the Greater London Authority (GLA). They were
downloaded from the GLA data store for the year 2011. 3
The analysis of Twitter data comprises several steps. First, the distributions
of tweets during day-time and night-time were compared in order to identify wards
with high tweet counts and geographic differences between the day and night
patterns.
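A sketch of this first step is given below, assuming each tweet has already been assigned to a ward. The source does not state which hours count as night-time, so the 22:00 to 06:00 window used here is purely an assumption, as are the ward names and timestamps in the sample.

```python
from collections import Counter
from datetime import datetime

# Assumed night-time window; the source does not specify the hours used.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


def is_night(timestamp):
    """Return True if the timestamp falls inside the assumed night window."""
    return timestamp.hour >= NIGHT_START_HOUR or timestamp.hour < NIGHT_END_HOUR


def counts_by_ward(tweets):
    """Split (ward, timestamp) pairs into day-time and night-time counts per ward."""
    day, night = Counter(), Counter()
    for ward, timestamp in tweets:
        target = night if is_night(timestamp) else day
        target[ward] += 1
    return day, night


# Hypothetical sample records (ward name, local timestamp).
sample = [
    ("Ward A", datetime(2013, 6, 1, 14, 0)),
    ("Ward A", datetime(2013, 6, 1, 23, 30)),
    ("Ward B", datetime(2013, 6, 2, 2, 15)),
    ("Ward B", datetime(2013, 6, 2, 3, 0)),
]
day_counts, night_counts = counts_by_ward(sample)
```

Comparing `day_counts` and `night_counts` ward by ward then shows where the two patterns diverge.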
The subsequent step focused on identifying tweet hotspots. Tweet hotspots are
locations with very large numbers of tweets relative to their population figures.
The tweets used are night-time tweets, which are more likely to have been sent from
locations where people reside.
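One simple way to operationalise "many tweets relative to population" is a per-capita rate with a cutoff. The sketch below is an assumption for illustration only: the source gives neither the normalisation (tweets per 1,000 residents is used here) nor the threshold, and all ward names and figures are invented.

```python
def tweets_per_1000(night_counts, population):
    """Night-time tweets per 1,000 residents for every ward with a known population."""
    return {
        ward: 1000 * night_counts.get(ward, 0) / residents
        for ward, residents in population.items()
        if residents > 0
    }


def find_hotspots(rates, threshold):
    """Return the wards whose rate meets or exceeds a hypothetical threshold."""
    return sorted(ward for ward, rate in rates.items() if rate >= threshold)


# Invented figures for illustration only.
night_counts = {"Ward A": 900, "Ward B": 120, "Ward C": 60}
population = {"Ward A": 9000, "Ward B": 12000, "Ward C": 12000}

rates = tweets_per_1000(night_counts, population)
hotspot_wards = find_hotspots(rates, threshold=50)
```

In practice the cutoff would be chosen from the observed distribution of rates rather than fixed in advance.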
Having identified tweet hotspots, the analysis of socio-demographic variables
can proceed. For this purpose, the Exploratory Spatial Data Analysis (ESDA) tool
GeoDa 4 is used. The authors focused on a parallel coordinate plot in order to
visually inspect similarities between variable distributions across tweet hotspots.
This visual inspection of socio-demographic variables serves to identify candidate
variables for an Ordinary Least Squares (OLS) regression analysis.
The choice of variables for the OLS regression model is based on the characteristics
of Twitter users identified in a recent study (Mitchell and Page 2013) and
the inspection of variables and their distributions in GeoDa. The OLS model is
prepared for testing the statistical relationship between Twitter data and
socio-demographic variables. The regression model provides a global assessment of
the relationships between the variables across the entire study area. Depending on the
significance of the identified relationships, a spatially explicit regression analysis
can provide more detailed insights. The OLS model therefore provides the basis for a
Geographically Weighted Regression (GWR) analysis.
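To make the global nature of OLS concrete, the sketch below fits a single-predictor model in closed form. It is a simplification under stated assumptions: the study's model uses several socio-demographic variables, whereas this example uses one hypothetical predictor, and all values are invented. A single set of coefficients is estimated for the whole study area, which is what GWR later relaxes by estimating coefficients locally.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Coefficient of determination: share of variance explained by the model.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Invented ward-level values: a hypothetical socio-demographic share (percent)
# and a hypothetical tweet rate per 1,000 residents.
share = [10, 20, 30, 40]
tweet_rate = [12, 19, 33, 38]

intercept, slope, r_squared = ols_fit(share, tweet_rate)
```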
Results and Discussion
This section presents the results obtained from the exploratory spatial data analysis
and the statistical analysis. Different forms of representation were used to explore and
portray the findings.
By splitting the tweets into day-time and night-time sets and eliminating tweets
of users who sent fewer than three tweets, we obtained roughly 105,500 day-time
tweets and 106,000 night-time tweets.
3 Socio-demographic data: http://data.london.gov.uk/datastore/package/ward-profiles-and-atlas (2013-12-16).
4 GeoDa ESDA tool: http://geodacenter.asu.edu/projects/opengeoda.