Essentially, this language provides a small set of built-in primitives, such as
smiles_file for reading a data file, minimum_frequency for specifying a minimum
support constraint, and maximum_frequency for specifying a maximum support
constraint. For each of these primitives, the system is aware of properties such
as (anti-)monotonicity, which ensures that any conjunction or disjunction of
constraints that is written down can be processed by the system.
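To illustrate why knowing these properties matters, the following minimal Python sketch combines a minimum and a maximum frequency constraint in a level-wise search. It is not the language or system described above: it works on plain itemsets rather than molecular data, and the function names support and mine are hypothetical. Only the parameter names minimum_frequency and maximum_frequency are borrowed from the text, and frequencies are assumed to be absolute transaction counts.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine(transactions, minimum_frequency, maximum_frequency):
    """Level-wise search for itemsets whose support lies in [min, max]."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    results = {}
    while level:
        survivors = []
        for candidate in level:
            s = support(candidate, transactions)
            # Anti-monotone constraint: an infrequent set has no frequent
            # superset, so it is dropped and never extended.
            if s < minimum_frequency:
                continue
            survivors.append(candidate)
            # Monotone constraint: a set that is too frequent may still have
            # supersets that qualify, so this test only filters the output.
            if s <= maximum_frequency:
                results[candidate] = s
        # Join survivors that differ in exactly one item to form the next level.
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == len(a) + 1:
                next_level.add(union)
        level = sorted(next_level, key=sorted)
    return results
```

In this sketch the anti-monotone constraint prunes the search space, while the monotone one is applied only when reporting results. A system that knows which property each primitive has can make this choice automatically for any combination of constraints.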
Similar special purpose languages were proposed by several other authors [22, 29];
they differ in the constraints that are supported and the type of patterns that can be
found (itemsets [22, 29], strings [12, 19], ...).
Languages built on SQL
A clear disadvantage of special purpose languages is
that each is yet another language that the programmer has to learn. Given that
many datasets are stored in databases, several projects have studied the integration
of constraint-based pattern mining in database systems.
The first class of such methods aims to extend SQL with additional syntax for the
formalization of data mining tasks. One early example is the MINE RULE operator [21]:
This example mines association rules with minimum support 0.1, confidence 0.2,
limiting the search to items with a price lower than $150, a succinct constraint.
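The task described here can be approximated outside a database with a short Python sketch. This is not the MINE RULE operator or its syntax: the function mine_rules, the prices mapping, and the assumption that the price constraint applies to every item in a rule are all hypothetical choices made for illustration. The sketch uses brute-force enumeration, which is adequate only for small inputs.

```python
from itertools import combinations


def mine_rules(transactions, prices, min_support=0.1,
               min_confidence=0.2, max_price=150):
    """Return (body, head, support, confidence) tuples for qualifying rules."""
    if not transactions:
        return []
    # Succinct constraint: the qualifying items can be selected up front,
    # before any support counting takes place.
    allowed = sorted(i for i, p in prices.items() if p < max_price)
    n = len(transactions)

    def supp(itemset):
        # Relative support: fraction of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(allowed) + 1):
        for combo in combinations(allowed, size):
            itemset = frozenset(combo)
            s = supp(itemset)
            if s < min_support:
                continue
            # Split the frequent itemset into every non-empty body and head.
            for k in range(1, size):
                for body_items in combinations(combo, k):
                    body = frozenset(body_items)
                    confidence = s / supp(body)
                    if confidence >= min_confidence:
                        rules.append((body, itemset - body, s, confidence))
    return rules
```

Because the price constraint is succinct, it is handled by filtering the item list once, so items priced at or above the limit never enter the search.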
Another example is the DMQL language [15]:
In this example we search for association rules related to three specific products, in
those transactions that have a value higher than 100; the parameters of the association
rule discovery process are similar to the previous example. A third example is SPQL [7].
The advantage of these languages is that well-known syntax can be used to
express constraints. Furthermore, common SQL syntax can be used to specify
the input of the mining task or to process its output further.
At the same time, the programmer still has to learn the additional primitives,
such as the FIND or MINE RULE keywords. An alternative perspective is to avoid
extending the language and instead to add mining views to a database [4]. They are virtual