Here $\tilde{p}_{xy}$ is the confidence from $x$ to $y$, i.e., it corresponds to our transition probability $p_{xy}$ but disregards the sequential order. This yields

$$\cos(x, y) = \sqrt{\tilde{p}_{xy}\,\tilde{p}_{yx}},$$

and the cosine measure can be interpreted as the nonsequential counterpart to the transition probabilities. In other words, the transition probabilities in both directions between the two products are multiplied.
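The relation between the cosine measure and the two confidences can be checked numerically. The following is a minimal sketch, assuming the standard association-rule definition of confidence (the share of sessions containing $x$ that also contain $y$) and binary product vectors; the session sets are hypothetical toy data, not taken from the text.

```python
import math

# Hypothetical toy data: each product is represented by the set of
# session ids in which it occurs (i.e., a binary incidence vector).
sessions_with_x = {0, 1, 2, 4}
sessions_with_y = {1, 2, 3}

# Number of sessions that contain both products.
both = len(sessions_with_x & sessions_with_y)

# Nonsequential confidences in both directions (assumed definition).
conf_xy = both / len(sessions_with_x)   # confidence from x to y
conf_yx = both / len(sessions_with_y)   # confidence from y to x

# Cosine of the two binary vectors: <x, y> / (||x|| * ||y||).
cosine = both / math.sqrt(len(sessions_with_x) * len(sessions_with_y))

# Square root of the product of the two confidences.
geometric_mean = math.sqrt(conf_xy * conf_yx)

print(cosine, geometric_mean)
```

For binary vectors the two printed values agree up to rounding, because the inner product counts the shared sessions and each squared norm counts the sessions of one product.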
For the factorization, we will later need the following relation. In order to calculate the cosine similarities between all $n_p$ products of our transaction matrix $A$, we introduce the matrix $\bar{A}$ by normalizing $A$ along all $n_s$ columns $a_i$. Thus, $A = [a_1 \; \dots \; a_{n_s}]$ and

$$\bar{A} = \left[ \frac{a_1}{\|a_1\|} \;\; \dots \;\; \frac{a_{n_s}}{\|a_{n_s}\|} \right].$$

Now the similarity matrix $S \in \mathbb{R}^{n_p \times n_p}$ between all products can be expressed simply as

$$S = \bar{A} \bar{A}^T.$$
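The construction above can be transcribed literally into a few lines of code. This is only a sketch under stated assumptions: the matrix is stored as a list of rows with $n_p$ rows and $n_s$ columns, no column is entirely zero, and the helper names `normalize_columns` and `gram_of_rows` as well as the toy matrix are hypothetical.

```python
import math

def normalize_columns(A):
    """Return a copy of A (list of rows) with each column scaled to unit Euclidean norm."""
    n_rows, n_cols = len(A), len(A[0])
    norms = [math.sqrt(sum(A[i][j] ** 2 for i in range(n_rows)))
             for j in range(n_cols)]
    if any(n == 0.0 for n in norms):
        raise ValueError("a zero column cannot be normalized")
    return [[A[i][j] / norms[j] for j in range(n_cols)] for i in range(n_rows)]

def gram_of_rows(B):
    """Return the product B B^T for a matrix B given as a list of rows."""
    return [[sum(u * v for u, v in zip(r, s)) for s in B] for r in B]

# Hypothetical toy matrix with n_p = 3 rows and n_s = 4 columns.
A = [
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]

A_bar = normalize_columns(A)   # normalized matrix
S = gram_of_rows(A_bar)        # S = A_bar A_bar^T, of size n_p x n_p

for row in S:
    print([round(v, 3) for v in row])
```

The result is symmetric by construction, and its trace equals the number of columns, since every normalized column contributes a squared norm of one.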
8.3 PCA-Based Collaborative Filtering
8.3.1 The Problem and Its Statistical Rationale
In what follows, we shall introduce the factorization problem underlying
PCA-based CF along with a rather intuitive geometric rationale for the procedure.
Subsequently, we shall provide a statistical interpretation of the approach. The
latter is rather technical and may safely be skipped by a less mathematically
inclined reader.
Before plunging into the matter, we need to stipulate some basic mathematical
concepts. We assume that the reader brings along basic knowledge of linear algebra
at the level of an undergraduate introductory class.
The fundamental notion is that of a linear submanifold. Informally, a linear submanifold of $\mathbb{R}^{n_p}$ is a shifted subspace. Specifically, it is a set

$$M := b + X = \{\, b + x \mid x \in X \,\},$$

where $X$ denotes a subspace of $\mathbb{R}^{n_p}$ of dimension $d$. Given a basis $x_1, x_2, \dots, x_d$ of $X$, a vector $x \in \mathbb{R}^{n_p}$ lies in $M$ if and only if

$$x = b + y_1 x_1 + \dots + y_d x_d$$
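The membership condition can be tested numerically. The sketch below reads the condition as "for some real coefficients $y_1, \dots, y_d$", which is an assumption about how the sentence continues. It subtracts the offset $b$, removes the component of the remainder that lies in the span of the basis, and checks whether anything is left. The function name `lies_in_submanifold`, the tolerance, and the example vectors are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def lies_in_submanifold(x, b, basis, tol=1e-9):
    """Check whether x lies in b + span(basis), up to the tolerance tol."""
    # Orthonormalize the basis vectors with Gram-Schmidt.
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            ortho.append([wi / norm for wi in w])
    # Remove from x - b its component inside the span; x is in M
    # exactly when the remaining residual vanishes.
    r = [xi - bi for xi, bi in zip(x, b)]
    for q in ortho:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(dot(r, r)) <= tol

# Hypothetical example in R^3 with d = 2.
b = [1.0, 0.0, 0.0]
basis = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

inside = [3.0, 1.0, -1.0]    # equals b + 2 * x_1 - 1 * x_2
outside = [1.0, 0.0, 1.0]    # x - b = (0, 0, 1) is not in the span

print(lies_in_submanifold(inside, b, basis))
print(lies_in_submanifold(outside, b, basis))
```

A tolerance is used instead of an exact comparison because the projection is computed in floating-point arithmetic.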