Table 1. Mean expected regret of untuned, tuned and learned policies on Bernoulli and Gaussian
bandit problems. Best scores in each of these categories are shown in bold. Scores corresponding
to policies that are tested on the same horizon T as the horizon used for training/tuning are
shown in italics.
Policy       Training  Parameters                 Bernoulli               Gaussian
             horizon                              T=10    T=100   T=1000  T=10    T=100   T=1000

Untuned generic policies
UCB1         -         C = 2                      1.07    5.57    20.1    1.37    10.6    66.7
UCB1-TUNED   -                                    0.75    2.28    5.43    1.09    6.62    37.0
UCB1-NORMAL  -                                    1.71    13.1    31.7    1.65    13.4    58.8
UCB2         -         α = 10^-3                  0.97    3.13    7.26    1.28    7.90    40.1
UCB-V        -         c = 1, ζ = 1               1.45    8.59    25.5    1.55    12.3    63.4
KL-UCB       -         c = 0                      0.76    2.47    6.61    1.14    7.66    43.8
KL-UCB       -         c = 3                      0.82    3.29    9.81    1.21    8.90    53.0
ε_n-GREEDY   -         c = 1, d = 1               1.07    3.21    11.5    1.20    6.24    41.4

Tuned generic policies
UCB1         T=10      C = 0.170                  0.74    2.05    4.85    1.05    6.05    32.1
             T=100     C = 0.173                  0.74    2.05    4.84    1.05    6.06    32.3
             T=1000    C = 0.187                  0.74    2.08    4.91    1.05    6.17    33.0
UCB2         T=10      α = 0.0316                 0.97    3.15    7.39    1.28    7.91    40.5
             T=100     α = 0.000749               0.97    3.12    7.26    1.33    8.14    40.4
             T=1000    α = 0.00398                0.97    3.13    7.25    1.28    7.89    40.0
UCB-V        T=10      c = 1.542, ζ = 0.0631      0.75    2.36    5.15    1.01    5.75    26.8
             T=100     c = 1.681, ζ = 0.0347      0.75    2.28    7.07    1.01    5.30    27.4
             T=1000    c = 1.304, ζ = 0.0852      0.77    2.43    5.14    1.13    5.99    27.5
KL-UCB       T=10      c = 1.21                   0.73    2.14    5.28    1.12    7.00    38.9
             T=100     c = 1.82                   0.73    2.10    5.12    1.09    6.48    36.1
             T=1000    c = 1.84                   0.73    2.10    5.12    1.08    6.34    35.4
ε_n-GREEDY   T=10      c = 0.0499, d = 1.505      0.79    3.86    32.5    1.01    7.31    67.6
             T=100     c = 1.096, d = 1.349       0.95    3.19    14.8    1.12    6.38    46.6
             T=1000    c = 0.845, d = 0.738       1.23    3.48    9.93    1.32    6.28    37.7

Learned numerical policies
POWER-1      T=10      ...                        0.72    2.29    14.0    0.97    5.94    49.7
             T=100     (16 parameters)            0.77    1.84    5.64    1.04    5.13    27.7
             T=1000    ...                        0.88    2.09    4.04    1.17    5.95    28.2
POWER-2      T=10      ...                        0.72    2.37    15.7    0.97    6.16    55.5
             T=100     (81 parameters)            0.76    1.82    5.81    1.05    5.03    29.6
             T=1000    ...                        0.83    2.07    3.95    1.12    5.61    27.3

Learned symbolic policies
FORMULA-1    T=10      t_k (r_k - 1/2)            0.72    2.37    14.7    0.96    5.14    30.4
             T=100     r_k + 1/(t_k + 1/2)        0.76    1.85    8.46    1.12    5.07    29.8
             T=1000    r_k + 3/(t_k + 2)          0.80    2.31    4.16    1.23    6.49    26.4
FORMULA-2    T=10      |r_k - 1/(t_k + t)|        0.72    2.88    22.8    1.02    7.15    66.2
             T=100     r_k + min(1/t_k, log(2))   0.78    1.92    6.83    1.17    5.22    29.1
             T=1000    1/t_k - 1/(r_k - 2)        1.10    2.62    4.29    1.38    6.29    26.1
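
To make the quantity reported in Table 1 concrete, the following minimal sketch estimates the mean expected regret of an index-based policy by Monte Carlo simulation. It is an illustration under stated assumptions, not the experimental protocol behind the table: the two-armed Bernoulli problems with uniformly drawn arm means, the number of sampled problems, and the helper names `ucb1_index`, `formula_index`, `run_episode` and `mean_regret` are all hypothetical. The UCB1 index is assumed to take the usual form r_k + sqrt(C ln t / t_k), and `formula_index` encodes the symbolic policy r_k + 1/(t_k + 1/2) listed for FORMULA-1 at T=100, reading r_k as the empirical mean reward of arm k and t_k as its number of pulls.

```python
import math
import random


def ucb1_index(mean_reward, pulls, t, C=2.0):
    """Assumed UCB1 form: empirical mean plus a bonus scaled by C."""
    return mean_reward + math.sqrt(C * math.log(t) / pulls)


def formula_index(mean_reward, pulls, t):
    """Symbolic index r_k + 1/(t_k + 1/2) (FORMULA-1, T=100 row)."""
    return mean_reward + 1.0 / (pulls + 0.5)


def run_episode(index_fn, probs, horizon, rng):
    """Play one Bernoulli bandit episode; return its expected regret."""
    n_arms = len(probs)
    pulls = [0] * n_arms
    sums = [0.0] * n_arms
    best = max(probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise the statistics
        else:
            arm = max(
                range(n_arms),
                key=lambda k: index_fn(sums[k] / pulls[k], pulls[k], t),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        # Expected regret is measured against the true arm means.
        regret += best - probs[arm]
    return regret


def mean_regret(index_fn, horizon, n_problems=2000, seed=0):
    """Average regret over randomly drawn two-armed problems (assumed prior)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_problems):
        probs = [rng.random(), rng.random()]
        total += run_episode(index_fn, probs, horizon, rng)
    return total / n_problems
```

Calling `mean_regret(ucb1_index, 100)` and `mean_regret(formula_index, 100)` yields two numbers that can be compared in the same spirit as the T=100 Bernoulli column, although they should not be expected to reproduce the table's values, since the problem distribution and sample size used here are assumptions.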