Environmental Engineering Reference
(Box 2) separately, then doing the same with the output of the meteorological and
transport-and-chemistry modules (Boxes 3 and 4) in an air quality modelling
system and then comparing the observed and modelled clusters, a synoptic view of
the performance of each module can be quantified (Boxes 7 and 8). By further
determining the correlation between the observed meteorological and air quality
clusters (Box 5), one gets an understanding of the degree to which air quality is
determined by meteorology. Repeating the same analyses of cluster correlation for
the two corresponding simulation modules (Box 6) would determine if the model
can mimic this aspect of the air pollution phenomenon. All this statistical
information (lower right box) would together give a first indication of where
performance improvements are needed and determine whether the modelling
system, after incorporating some 'intended improvements', has moved in the
desired direction.
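As a rough illustration of how the agreement between two sets of daily cluster labels (for example, observed meteorological clusters against observed air quality clusters, or observed against modelled clusters) might be quantified, the sketch below computes Cramér's V from the contingency table of the two label sequences. The text does not say which correlation measure is used, so Cramér's V is an assumption made here for illustration, and the function name `cramers_v` is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels_a, labels_b):
    """Cramér's V between two cluster-label sequences for the same days.

    Returns 0 when the two labelings are statistically independent and 1
    when each cluster in one labeling maps onto exactly one cluster in the
    other. This measure is an assumption, not the one named in the text.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label sequences must cover the same days")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    row = Counter(labels_a)                   # marginal counts, labeling A
    col = Counter(labels_b)                   # marginal counts, labeling B

    # Pearson chi-squared statistic over every cell of the table.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical daily labels: weather cluster vs. air quality cluster.
met_clusters = [0, 0, 1, 1, 2, 2]
aq_clusters = ["clean", "clean", "ozone", "ozone", "dust", "dust"]
print(cramers_v(met_clusters, aq_clusters))
```

The same function could be applied to the modelled label pairs (Box 6) so that the observed and simulated degrees of association can be compared on one scale.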
2. Assumptions and Methodology Development
The spatial extent of this analysis cannot be too large for the basic homogeneity
assumption to hold, i.e. for the weather parameters to be meaningfully clustered,
the area should be influenced by the same synoptic meteorological feature
'simultaneously', hence an area of less than 100 km each way would be appropriate.
Hourly data would serve as the raw input, but given the spatial scale considered and
the diurnal solar forcing, the hourly data from all the monitoring stations within
the analysed area are averaged over each day, and these daily means serve as the basic input for clustering.
Given the relatively large number of parameters involved in both meteorology and
air quality (over 10 each), a long enough period (5 years or more) should be used.
The stations were judiciously chosen after inspecting the data. Then standard
cleansing and imputation of missing and dubious data were performed on the data
set.
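The daily averaging step described above can be sketched as follows for a single parameter. This is a minimal illustration under stated assumptions: the record layout `(date, station, hour, value)` and the function name `daily_area_means` are hypothetical, missing values are represented by `None` and simply skipped, and all valid station-hours in the area are pooled with equal weight. The actual cleansing and imputation procedure is not specified in the text and is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def daily_area_means(records):
    """Average hourly values from all stations in the area into daily means.

    records: iterable of (date, station, hour, value) tuples, where value
    is None when the observation is missing (assumed layout).
    Returns {date: mean of all valid station-hours on that date}; a date
    with no valid observation is left out rather than imputed.
    """
    by_day = defaultdict(list)
    for date, _station, _hour, value in records:
        if value is not None:
            by_day[date].append(value)
    return {date: mean(values) for date, values in sorted(by_day.items())}


# Hypothetical hourly records for one pollutant at two stations.
hourly = [
    ("2010-01-01", "A", 0, 10.0),
    ("2010-01-01", "B", 0, 20.0),
    ("2010-01-01", "A", 1, None),   # missing hour, skipped
    ("2010-01-02", "A", 0, 5.0),
]
print(daily_area_means(hourly))
```

Repeating this for each meteorological and air quality parameter would give one vector per day, which is the form of input the clustering step expects.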
Two issues are to be addressed for basic clustering: (1) the best clustering
method, and (2) the optimum number of clusters (the number of clusters used in
each of the cluster boxes, 1-4 in Fig. 1, can be determined independently and
hence may differ). The best clustering approach is determined by trying different methods
(hierarchical, K-means, and a hybrid of the two) and judging their performance by
a 'correlation coefficient between actual and ideal similarity matrices' (Tan et al.,
p 542-543, http://www-users.cs.umn.edu/~kumar/dmbook/index.php) and the
silhouette coefficient based on cluster cohesion and separation (Tan et al., p 536-541).
The K-means method was found to produce the best clustering
results and is adopted for all clustering. The upper limit of cluster number is
guided by subjective understanding of weather and air quality patterns which
suggests a number less than 10 or so. The optimum number of clusters is then
determined from the inflection point in plots of the sum of squared distances (the total
squared distance of all points in parameter space from their respective cluster centres)
versus cluster number.
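The selection of the cluster number from the sum of squared distances can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it uses a plain Lloyd-type K-means with a deterministic farthest-first seeding (chosen here only to make the example reproducible), and it locates the bend in the curve as the cluster number with the largest second difference, which is one simple stand-in for reading the inflection point off a plot. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points in parameter space."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (assumption): start from the first point, then
    repeatedly add the point farthest from the centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration; returns (centres, labels, sum of squared distances)."""
    centres = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_labels == labels:  # assignments stable, converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centre
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, sse


def sse_curve(points, k_max):
    """Sum of squared distances for every cluster number from 1 to k_max."""
    return {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}


def elbow(curve):
    """Cluster number where the curve bends most (largest second difference)."""
    ks = sorted(curve)
    bend = {k: curve[k - 1] - 2 * curve[k] + curve[k + 1] for k in ks[1:-1]}
    return max(bend, key=bend.get)


# Hypothetical daily vectors forming three well-separated groups.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
days = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
curve = sse_curve(days, k_max=5)
print(curve)
print(elbow(curve))
```

In practice the parameters would first be standardised so that no single variable dominates the distance, and the upper limit `k_max` would follow the subjective bound of about 10 mentioned above.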