Machine Learning - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

>ε

E f k (Z)

∃

∈{

f 1 ,...,f N }:

f k (Z i )

−

i =

>ε

E f k (Z)

N exp −

2 nε 2 .

≤

f k (Z i )

−

≤

Equivalently, we have

∀

0 <δ< 1, with probability at least 1

− δ ,

log 2 δ

2 n

≤ R(g)

∀

∈{

g 1 ,...,g N }

,R(g)

22.3.2.3 Uniform Bounds in Infinite Case

While the bound introduced in the previous subsection can successfully handle the

finite function class, when the number of functions approaches infinity, it becomes

unclear whether the bound can converge. To better bound the difference between

empirical and expected risks in the infinite case, one needs to adopt some new con-

cepts, such as the VC dimension.

When the function class is uncountable (when it is countable, a trivial extension

of the results presented in the previous subsection can be directly applied and there

is no need to introduce the VC dimension), we look at the function class projected

on the sample. Specifically, given a sample z 1 ,...,z n , we consider

f(z 1 ),...,f(z n )

F z 1 ,...,z n =

∈ F

The size of this set is always finite, since function f can only take binary values. An

intuitive explanation on the size of the set is the number of possible ways in which

the data z 1 ,...,z n can be classified. Based on the set, we can define the concept of

a growth function as follows.

Definition 22.1 The growth function is the maximum number of ways into which

n data samples can be classified by the function class:

(n) =

z 1 ,...,z n | F z 1 ,...,z n | .

sup

It has been proven that if we define the growth function for

in the same way,

we will have S F (n)

S G (n) .

Given the concept of the growth function, the following result has been obtained:

∀

−

Theorem 22.2

0 <δ< 1, with probability at least 1

δ ,

log δ

2 log S G ( 2 n) +

≤ R(g)

∀

∈{

g 1 ,...,g N }

,R g)

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home