Definition 9: We define the Class of a Sequence S, denoted $S_C$, as the cluster of multidimensional sequences whose sequence component is similar to S, where S is a sequential pattern.
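Definition 9 does not fix the similarity measure. The sketch below is only an illustration. It assumes that sequences are written as character strings and that "similar" means a normalized Levenshtein edit distance no greater than a hypothetical threshold `max_dist`. Neither choice comes from the text, and the names `MultidimSequence`, `edit_distance` and `sequence_class` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MultidimSequence:
    rid: int            # record identifier
    sequence: str       # e.g. "HSWSH", one character per item
    attributes: tuple   # values (a_1, ..., a_m) of the dimensions A_1..A_m


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed by dynamic programming, row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def sequence_class(pattern: str, db: list, max_dist: float = 0.3) -> list:
    """Return S_C, the records whose sequence is 'similar' to the pattern S.

    ASSUMPTION: similarity = normalized edit distance <= max_dist
    (hypothetical measure and threshold, not given by Definition 9).
    """
    members = []
    for rec in db:
        longest = max(len(pattern), len(rec.sequence)) or 1
        if edit_distance(pattern, rec.sequence) / longest <= max_dist:
            members.append(rec)
    return members
```

With records holding the sequences "HSWSH", "HSWH" and "HLMRH", the class of the pattern "HSWSH" keeps the first two (normalized distances 0 and 0.2) and excludes the third (distance 3/5 = 0.6). Any other similarity measure could be substituted without changing the structure of `sequence_class`.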
In order to define characteristic rules, we adopt and formalize the definition given in [9], where the characterization of a subset is defined as the description of the properties that are specific to this subset compared with all objects in the database.
Definition 10: Let se denote a subset of the database DB, prop a multidimensional property $(a_{i1}, \ldots, a_{ik})$, $freq_{se}(prop)$ the number of objects in se that satisfy the property prop, and $card(se)$ the cardinality of se. The significance of prop in the subset se is defined as:

$$F^{DB}_{se}(prop) = \frac{freq_{se}(prop) / card(se)}{freq_{DB}(prop) / card(DB)}$$
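To make Definition 10 concrete, the following minimal sketch computes the significance of a property. For illustration only, it assumes that each object is a dictionary mapping dimension names to values, and that a property is a dictionary of (dimension, value) pairs that an object satisfies when every pair matches. The function names `satisfies`, `freq` and `significance` are hypothetical.

```python
def satisfies(obj: dict, prop: dict) -> bool:
    """True if the object has every (dimension, value) pair of the property.

    ASSUMPTION: objects and properties are dicts of dimension -> value.
    """
    return all(obj.get(dim) == val for dim, val in prop.items())


def freq(objects: list, prop: dict) -> int:
    """freq_X(prop): number of objects in X that satisfy prop."""
    return sum(1 for obj in objects if satisfies(obj, prop))


def significance(se: list, db: list, prop: dict) -> float:
    """F^DB_se(prop) = (freq_se(prop)/card(se)) / (freq_DB(prop)/card(DB))."""
    ratio_se = freq(se, prop) / len(se)
    ratio_db = freq(db, prop) / len(db)
    if ratio_db == 0:
        raise ValueError("prop never occurs in DB; significance is undefined")
    return ratio_se / ratio_db
```

For instance, with a database of eight objects of which two have `mode == "car"`, and a subset of four objects that contains both of them, the significance of `{"mode": "car"}` is (2/4)/(2/8) = 2.0. In other words, the property is twice as frequent in the subset as in the whole database.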
Definition 11: Let R be a real number standing for the significance threshold. The property prop is said to be characteristic of se, denoted $prop \in se\,[significance]$, if and only if $F^{DB}_{se}(prop) = significance \geq R$.
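Definition 11 then reduces to a threshold test on that value. So that it stands alone, the sketch below repeats a compact version of the significance computation. It keeps the same illustrative assumption that objects and properties are dictionaries of dimension values. The function name `characteristic` is hypothetical, and so is any threshold value passed to it.

```python
def significance(se: list, db: list, prop: dict) -> float:
    """F^DB_se(prop) as in Definition 10.

    ASSUMPTION: objects and properties are dicts of dimension -> value.
    """
    def ratio(objs: list) -> float:
        hits = sum(all(o.get(d) == v for d, v in prop.items()) for o in objs)
        return hits / len(objs)

    base = ratio(db)
    if base == 0:
        raise ValueError("prop does not occur in DB")
    return ratio(se) / base


def characteristic(se: list, db: list, prop: dict, r: float):
    """Return the significance if prop is characteristic of se (F >= R), else None."""
    f = significance(se, db, prop)
    return f if f >= r else None
```

Returning the significance rather than a plain boolean keeps the value needed for the bracketed annotation in $prop \in se\,[significance]$.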
Definition 12: Let $S_c$ be the class of a sequential pattern $SP_c$. We define a multidimensional sequential rule as $prop \in SP_c\,[significance]$. This rule means that the multidimensional property prop is characteristic of the sequential pattern $SP_c$, that is, of its class $S_c$ in the sense of Definition 11, with the computed significance.
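Taken together, Definitions 11 and 12 suggest a simple way to derive the multidimensional sequential rules of one class. The sketch below is only an illustration under stated assumptions. Objects are dictionaries of dimension values. Candidate properties are enumerated exhaustively up to a hypothetical length `max_len`. The class `cls` has already been built for the pattern and is a subset of `db`. The text does not specify how candidate properties are generated, and all function names here are hypothetical.

```python
from itertools import combinations


def candidate_properties(objects: list, max_len: int = 2) -> set:
    """All properties (a_i1, ..., a_ik), k <= max_len, observed in the objects.

    ASSUMPTION: exhaustive enumeration; a property is a frozenset of
    (dimension, value) pairs.
    """
    props = set()
    for obj in objects:
        items = sorted(obj.items())  # keys are unique, so only keys are compared
        for k in range(1, max_len + 1):
            props.update(frozenset(c) for c in combinations(items, k))
    return props


def frequency_ratio(objects: list, prop: frozenset) -> float:
    """freq_X(prop) / card(X)."""
    hits = sum(all(o.get(d) == v for d, v in prop) for o in objects)
    return hits / len(objects)


def sequential_rules(pattern: str, cls: list, db: list, r: float,
                     max_len: int = 2) -> list:
    """Rules 'prop in SP_c [significance]' for the class S_c of pattern SP_c.

    ASSUMPTION: cls is a subset of db, so every candidate occurs in db.
    """
    rules = []
    for prop in candidate_properties(cls, max_len):
        f = frequency_ratio(cls, prop) / frequency_ratio(db, prop)
        if f >= r:
            rules.append((dict(sorted(prop)), pattern, f))
    # Highest significance first; ties broken deterministically by property.
    rules.sort(key=lambda rule: (-rule[2], str(rule[0])))
    return rules
```

Each returned triple `(prop, pattern, significance)` reads as the rule $prop \in SP_c\,[significance]$. Exhaustive enumeration grows quickly with the number of dimensions, so in practice a pruning strategy would probably be needed.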
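Table 8.1 shows an example of a multidimensional sequence database. The tuple $(1, \langle s_1, s_2, \ldots, s_n \rangle, a_1, \ldots, a_m)$ stands for a multidimensional sequence of the database, where 1 is the record identifier (RID), $\langle s_1, \ldots, s_n \rangle$ is the sequence S, and $a_1, \ldots, a_m$ are the values of the dimensions $A_1, \ldots, A_m$.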
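Table 8.1. A Multidimensional Sequence Database

RID | S | $A_1$ | $A_2$ | … | $A_i$ | … | $A_m$
1 | $\langle s_1, s_2, \ldots, s_i, \ldots, s_n \rangle$ | $a_1$ | $a_2$ | … | $a_i$ | … | $a_m$
… | … | … | … | … | … | … | …
k | … | … | … | … | … | … | …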
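One possible in-memory layout for the rows of Table 8.1 is sketched below using a Python dataclass. The class name, field names and sample values are hypothetical and serve only to mirror the (RID, S, $a_1$, …, $a_m$) layout of the table.

```python
from dataclasses import dataclass


@dataclass
class MultidimSequence:
    """One row (RID, <s_1, ..., s_n>, a_1, ..., a_m) of the database.

    HYPOTHETICAL representation chosen for illustration.
    """
    rid: int            # record identifier
    sequence: tuple     # <s_1, s_2, ..., s_n>
    attributes: dict    # {A_1: a_1, ..., A_m: a_m}

    def as_tuple(self, dims: list) -> tuple:
        """Flatten the row into the (RID, S, a_1, ..., a_m) order of the table."""
        return (self.rid, self.sequence, *(self.attributes[d] for d in dims))
```

For example, `MultidimSequence(1, ("H", "S", "W", "S", "H"), {"A1": "car", "A2": "adult"}).as_tuple(["A1", "A2"])` gives `(1, ("H", "S", "W", "S", "H"), "car", "adult")`. Keeping the attributes in a dictionary lets the column order be chosen when a row is flattened.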
8.2.3 Description of the Use Case and Datasets
The target application is related to the analysis of population time use, and more precisely of people's daily activities and displacements. The dataset describes the daily activities and displacements carried out by each person of a surveyed household at the scale of a whole urban area. It can be seen as a sequence of activities, also called an activity program [24]. For example, during a day, an individual can leave home, drive children to school, go to work, pick children up from school and come back home. This sequence can be described as (Home, School, Work, School, Home). To simplify the notation, we represent each activity by a specific character, e.g. H for Home, W for Work, and S for School. Other activities are Market (denoted M), Restaurant (R), Leisure (L), etc. This alphabet can be as long as necessary. Then, by removing the comma separators, a sequence can be simplified to a character string, e.g. HSWSH for the previous sequence. Although we have used activity programs as an example in our experiments, the analysis is also relevant for other sequences, such as the transport mode used for displacements, the departure time, and so on.
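The character encoding of activity programs described above can be sketched as follows. The dictionary holds only the activities named in the text and can be extended as needed. The names `ACTIVITY_CODES`, `encode_program` and `decode_program` are hypothetical.

```python
# Activity codes named in the text; HYPOTHETICAL container, extend as needed.
ACTIVITY_CODES = {
    "Home": "H",
    "Work": "W",
    "School": "S",
    "Market": "M",
    "Restaurant": "R",
    "Leisure": "L",
}


def encode_program(activities: list) -> str:
    """Turn a sequence of activities into its compact character string."""
    return "".join(ACTIVITY_CODES[a] for a in activities)


def decode_program(code: str) -> list:
    """Inverse mapping; valid because every code is a single distinct character."""
    names = {c: a for a, c in ACTIVITY_CODES.items()}
    return [names[c] for c in code]
```

Encoding `["Home", "School", "Work", "School", "Home"]` yields `"HSWSH"`, as in the example above. Single-character codes are what allow the separators to be dropped without ambiguity.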