Music Information Retrieval in P2P Networks - Intelligent Music Information Systems: Tools and Methodologies

Information Technology Reference

In-Depth Information

N . The goal is to sample U and L , so as to obtain

two sequences U ' and L ', each of length M < N .

Initially, it is assumed that uniform sampling is

performed. In this case, each time the ( i × N / M )-th

element of U and L is selected, where 1 ≤ i ≤ M .

When the LB Keogh between the query sequence

Q and a data sequence is computed, each phrase C

of length N in Q is considered. Thus, each phrase

has to be sampled in the same way as U and L .

This leads to a sampled phrase C '. Therefore, the

lower-bound measure B ', is given as:

that the sampling of U and L does not produce

any false negatives.

The separate sampling of U and L presents the

requirement of having to store the positions from

which elements are being selected in U ' and L '. If

the positions are stored explicitly, then this doubles

the amount of information kept (2M numbers for

storing U ' and L ' and additionally 2 M numbers for

storing the positions of selected elements). Since

this information is propagated during querying,

traffic is increased. For this reason an alternative

representation is proposed. To represent U ' , a

bitmap of length N (the phrase length) is used.

Each bit corresponds to an element in U . If an

element is selected in the sample U ', then its bit

is set to 1, otherwise it is set to 0. Therefore, the

combination of the bitmap and the M values that

are selected in U ' are used to represent U '. The

same applies for L '. This representation proves

to be efficient: the space required for U ' is M +





(

−

)



∑ = 

LB'

(

−

)

(2)

otherwise

In the aforementioned equation, the third case

(i.e., when L i ' ≤ C i ≤ U i ') does not contribute in

the computation of B ' . The problem of uniform

sampling is that, as it selects elements without

following any particular criterion, it tends to

select many elements from U and L that result

to this third case. Therefore, B ' may become a

significantly bad underestimation of B that would

have been computed if sampling was not used.

The underestimation of the lower-bound value

will result to an increase in false-alarms, which

will in turn incur high traffic.

To overcome this problem, an alternative

sampling method is therein proposed. U and L are

sampled separately. Initially, the elements of U are

stored in ascending order. In U ' the first M ele-

ments of this ordering are selected. Respectively,

L is sorted in descending order and the first M

elements in L ' are selected. The intuition is that

the selection of the smallest M values of U , helps

in increasing the number of occurrences of the

first case (i.e., when C i > U i '), since the smallest

the value of U i ' is, the more expected is to have a

C i ' larger than it. Accordingly, it is easy to prove

(Karydis, Nanopoulos, & Manolopoulos, 2005)



bytes 2 .

The plain representation requires 5 M bytes

(since it requires only one integer, i.e., 4 bytes,

to store the position of each selected element).

Thus, the proposed method is advantageous

when N <32 M , i.e., for sample larger than about

3% (experimentation showed that samples with

size 10% are the best choice).

In the following, existing algorithms for

searching in P2P networks that can be used in

a modified version within the framework are

presented.

The BFSS Algorithm

The simplest similarity searching algorithm is on

the basis of breadth-first-search over the nodes of

the P2P network. The adapted algorithm, which

uses the proposed sampling and representation

methods, is denoted as BFSS (breadth-first-search

with sampling). Each time, the current node n is

considered. A TTL (time-to-leave or Maxhop)

value denotes how many valid hops are remain-

ing for n , whereas T s is the user-defined similarity

Intelligent Music Information Systems: Tools and Methodologies

Search WWH ::

Custom Search

Home