related (P18. Richter et al. 2012). Hence, just as in data mining in general, movement
mining is challenged by noisy and uncertain, multi-source data.
Miller and Han (2009) furthermore stress that data mining and KDD are more
inductive than traditional, deductive statistical analysis. Statistics aims to confirm
hypotheses formulated a priori on the basis of some theory. By contrast, the patterns and
relations hidden in large data sources that a data mining and KDD process seeks are by
definition unexpected and unknown in advance. At the least, it would be very difficult
to have a complete a priori picture of what to find. Hence data mining is most useful
when applied early in the process of scientific discovery, when structuring large
data sources and working in an exploratory way towards the establishment of a theory.
This mode of scientific reasoning is prevalent in the movement mining
literature, especially when combined with scientific visualization in visual analytics
(Andrienko and Andrienko 2007).
To summarize why data mining is especially suitable for CMA,
Table 3.1 compares the peculiarities of movement data identified in Sect. 1.1 with
the arguments motivating data mining and those motivating conventional statistics.
Besides an arguably good general fit, the comparison reveals commonalities especially
in the type of data investigated (noisy, uncertain) and the type of patterns of interest
(non-trivial, unexpected, complex relations).
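One row of Table 3.1 contrasts spatio-temporally autocorrelated movement data with the independence assumed by confirmatory statistics. The following Python sketch illustrates that contrast on synthetic data only. It is not taken from the source: the function names `lag1_autocorrelation` and `random_walk`, the Gaussian random-walk model of a track, the seed, and the sample size are all assumptions chosen for illustration.

```python
import random


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a numeric sequence (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


def random_walk(n, rng, step_sd=1.0):
    """Synthetic 1-D track: each position is the previous one plus Gaussian noise.

    This is an assumed toy model of movement, not a model proposed in the text.
    """
    x = 0.0
    positions = []
    for _ in range(n):
        x += rng.gauss(0.0, step_sd)
        positions.append(x)
    return positions


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Consecutive fixes of a moving object depend on each other ...
track_x = random_walk(2000, rng)
# ... whereas classical confirmatory statistics assume independent samples.
independent_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]

r_track = lag1_autocorrelation(track_x)
r_indep = lag1_autocorrelation(independent_x)

print(f"lag-1 autocorrelation, synthetic track:     {r_track:.3f}")
print(f"lag-1 autocorrelation, independent samples: {r_indep:.3f}")
```

Under these assumptions the synthetic track shows a lag-1 autocorrelation close to 1, while the independent samples stay close to 0. The sketch only illustrates why the independence assumption is problematic for tracking data; it does not reproduce any analysis from the text.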
3.2 Movement Mining Tasks
There are many different, yet similar, categorizations of data mining tasks (see for
example Chakrabarti et al. 2006; Hand et al. 2001; Han and Kamber 2006, for an
Table 3.1 Comparison of the properties of movement data and the strengths of data mining as a
tool, contrasted to the characteristics of conventional database querying and statistics

Peculiarities of movement data        Motivation for data mining                    Motivation for conventional DB queries
(Sect. 1.1)                                                                         and confirmatory statistics

Geographic reference systems          -                                             -
Permanent change                      -                                             -
Complex objects                       Non-trivial relationships                     Simple data points in spreadsheets
Implicit relations                    Discover the unexpected                       Confirmatory, confirm the expected
Overlap                               Multi-source, ad hoc integrated data sources  Solitary spreadsheets
Spatial dependency and heterogeneity  Spatio-temporally autocorrelated              Independence, normality
Uncertainty                           Noisy and uncertain data                      Clean, noiseless
Derivative data                       Multi-source, ad hoc integrated data sources  Data sampled for primary scientific question
Scale issues                          -                                             -
 
 