Database Reference
In-Depth Information
SQL
Front-end
XQuery
Front-end
SPARQL
Front-end
BAT Algebra
BAT 'name'
BAT 'age'
0
1
2
1907
1927
1927
John Wayne\0
Roger Moore\0
Bob Fosse\0
Will Smith\0
0
1
2
3
0
11
23
33
0
1
1
2
3
1968
Select (age, 1927)
(memory mapped) simple memory array
(virtual) dense surrogates
MonetDB back-end
Figure 7.2 MonetDB architecture: The front end translates queries into
BAT Algebra expressions; the back end executes the BAT plan.
reverse(bat[t1,t2] B) : bat[t2,t1]
= [ < B[i]. tail ,B[i ]. head >
|
i <
|
B
|
] ''swap columns''
mirror(bat[t1,t2] B) : bat[t1,t1]
= [ < B[i].head,B[i ]. head >
|
i <
|
B
|
] ''make tail equal to head''
mark(bat[t1,t2] B) : bat[t1,TID]
= [ < B[i].head,i > |
i < |
B
|
] ''number tail''
= [ < L[i].head,R[j ]. tail > |
i < |
|
,j < |
}
join (bat[t1,t2] L, bat[t2,t3] R) : bat[t1,t3]
L
R
, L[i]. tail =R[j].head]
'' inner join ''
uselect (bat[t1,t2] B, t2 v) : bat[t1,void]
= [ < B[i].head,nil > |
i < |
B
|
, B[I]. tail =v ] '' selection on tail ''
[+](bat[t1,t2] L, bat[t1,t2] R) : bat[t1,t2]
= [ < L[i].head,L[i ]. tail +R[i]. tail >
|
i <
|
L
|
,j <
|
R
|
,
L[i ]. head=R[i].head ]
'' map [op]''
group(bat[t1,t2] L, bat[t1,t2] R) : bat[t1,t2]
= [ < L[i].head,unique(L[i ]. tail ,R[j ]. tail ) >
|
i <
|
L
|
,j <
|
R
|
,
L[i ]. head=R[i].head ] '' groupby''
unique(bat[t1,t2] B) : bat[t1,t2]
=
{
< B[i].head,B[i]. tail >
|
i <
|
B
|}
''duplicate elimination ''
{
sum
}
(bat[t1,t2] B) : bat[t1,t2]
= [ < U[i].head,sum(select(reverse(L)),h.head) >
|
U= unique(mirror(B)) ] ''aggr
{
op
}
''
The BAT-algebra notation used above consists of an operation applied to
an operand on the left of the “:”, and the result in the right. For example, the
“join” operator is applied to left (L) and right (R) BATs, and for entries where
tail of L is equal to head of R, it generates a result with columns corresponding
to head of L and tail of R.
The reverse(), mirror(), and mark() operators all produce a result in which
at least one of the input columns appears unchanged. In the MonetDB im-
plementation, these operations have a constant-cost implementation that just
manipulates some information in the column descriptor, since the result BAT
shares the (large) array data-structures holding the column data with its input
BAT. The [op]() and
() are second-order operators that take an opera-
tor name op and construct for it a map operator (that works on the natural
join of all input BATs on head column) and a grouped aggregate function,
respectively.
{
op
}
Search WWH ::




Custom Search