9.2.3 Problems of Tensor Factorizations
There exist many other tensor decomposition methods. The main problem, however, is that for most of them, unlike for the Tucker decomposition, no efficient standard algorithms for computing the decomposition exist. Such algorithms therefore still need to be developed.
To this end, we shall denote the sought-after decomposition of a given tensor $A$ by $A_\Theta$, with $\Theta$ denoting the sought-after tensor components, e.g., the factors $U$ of the CP decomposition. Thus, the task consists in determining an optimizer $\Theta^\ast$ of

$$\min_\Theta f(A, A_\Theta),$$

mostly in the form

$$\min_\Theta \| A - A_\Theta \|_F.$$
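To make the objective concrete, the following minimal sketch (in NumPy; the helpers cp_reconstruct and cp_objective are illustrative names, not part of any library) evaluates the Frobenius-norm misfit of a CP model from its factor matrices:

    import numpy as np

    def cp_reconstruct(factors):
        """Assemble the full tensor from CP factor matrices.

        Each factors[k] has shape (n_k, r); column j of every factor
        contributes one rank-one term to the sum.
        """
        r = factors[0].shape[1]
        A = np.zeros(tuple(U.shape[0] for U in factors))
        for j in range(r):
            term = factors[0][:, j]
            for U in factors[1:]:
                term = np.multiply.outer(term, U[:, j])
            A += term
        return A

    def cp_objective(A, factors):
        """Frobenius-norm misfit ||A - A_Theta||_F of the CP model."""
        return np.linalg.norm(A - cp_reconstruct(factors))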
Iterative procedures appear suitable for solving this optimization problem, since they are optimal with respect to memory usage for extremely sparsely populated tensors. Their convergence rates, however, are the critical issue.
In the current literature, endless varieties of gradient descent methods (stochastic, partitioned, prioritized, etc.) are often proposed. Mostly, the case made for them is their easy implementation. These procedures are then praised as "rapid" and "robust"; estimates of their convergence rates are generously forgone. Of course, the exact opposite of what is promised holds true: gradient descent procedures depend heavily on the problem and its parameters and often converge rather slowly. Granted, they may now and then be "fine-tuned" to the solution of a particular high-dimensional problem instance, but they fail at solving other, even tiny, problems with the same parameter assignment. All in all, there is no satisfactory guaranteed upper bound on the number of iterations (and thus on the computational complexity) for any general class of problem instances. Thus, gradient descent methods are suitable for certain experiments, but not for practical deployment.
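The step-size sensitivity is easy to reproduce. As a hedged illustration, the sketch below runs plain gradient descent on the order-2 special case $f(U, V) = \frac{1}{2}\|A - UV^T\|_F^2$ (the same behavior occurs for higher-order factorizations); all names and parameter values are chosen for this example only:

    import numpy as np

    def gd_factorize(A, r, step, iters=500, seed=0):
        """Plain gradient descent on f(U, V) = 0.5 * ||A - U V^T||_F^2."""
        rng = np.random.default_rng(seed)
        U = rng.standard_normal((A.shape[0], r))
        V = rng.standard_normal((A.shape[1], r))
        for _ in range(iters):
            R = A - U @ V.T              # current residual
            # Simultaneous update; the gradients are -R V and -R^T U.
            U, V = U + step * (R @ V), V + step * (R.T @ U)
        return np.linalg.norm(A - U @ V.T)

    A = np.random.default_rng(1).standard_normal((40, 30))
    for step in (5e-2, 1e-2, 1e-3):      # same problem, three step sizes
        print(f"step={step:g}  final misfit={gd_factorize(A, r=5, step=step):.3e}")

Depending on the step size, the very same problem instance diverges, converges acceptably, or crawls, which is precisely the parameter dependence criticized above.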
To ensure the necessary convergence rate, more sophisticated iteration procedures are required, e.g., projection methods. This can be seen very clearly in the field of matrices, where Krylov-subspace procedures predominate. For symmetric matrices, this mainly comes down to the conjugate gradient (CG) method, mostly in connection with a preconditioner (PCG), which may arguably be considered the default procedure of numerical analysis. For nonsymmetric matrices, especially for the SVD, one mostly resorts to the Lanczos method, which was presented in Sect. 8.4.1. Devising robust Lanczos procedures, though, is difficult, and they cannot be carried over immediately to other formulations, since they require orthogonality of the bases. There are, however, orthogonalized approaches to the higher-order case, but mostly, again, for the Tucker format. An example is the Higher-Order Orthogonal Iteration (HOOI) [KB09].
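For reference, here is a compact sketch of HOOI in NumPy, following the usual formulation as alternating truncated SVDs of partially projected unfoldings; the helpers unfold and mode_dot are defined for this example only:

    import numpy as np

    def unfold(T, mode):
        """Mode-k unfolding: mode k becomes the row index."""
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def mode_dot(T, M, mode):
        """Mode-k product T x_k M, where M has shape (rows, T.shape[mode])."""
        rest = [T.shape[i] for i in range(T.ndim) if i != mode]
        Y = (M @ unfold(T, mode)).reshape([M.shape[0]] + rest)
        return np.moveaxis(Y, 0, mode)

    def hooi(A, ranks, iters=20):
        """Tucker approximation A ~ G x_1 U_1 ... x_d U_d via HOOI."""
        d = A.ndim
        # HOSVD initialization: leading left singular vectors per unfolding.
        U = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :ranks[k]]
             for k in range(d)]
        for _ in range(iters):
            for k in range(d):
                Y = A                 # project onto all factors except mode k
                for j in range(d):
                    if j != k:
                        Y = mode_dot(Y, U[j].T, j)
                U[k] = np.linalg.svd(unfold(Y, k),
                                     full_matrices=False)[0][:, :ranks[k]]
        G = A                         # core tensor: project onto all factors
        for k in range(d):
            G = mode_dot(G, U[k].T, k)
        return G, U

Note that each step requires only the SVD of a small projected unfolding, and the factor bases stay orthonormal by construction, which is exactly the orthogonality that Lanczos-type procedures rely on.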