Biomedical Engineering Reference
3. Determine the variance accounted for by each component.
4. Visualize the components.
An excellent review of the mathematical details of PCA is Reyment
and Joreskog (1996). The method has advantages over other linear
transformations in that it does not assume basis vectors and is not
iterative; rather, its basis vectors are derived directly from the
data set. Other linear transforms may be applied and have other
advantages.
PCA is based on singular value decomposition, which can be
run in MATLAB using the command svd:
randData = rand(49, 10);              % random peri-event matrix: 49 neurons x 10 bins
[comps, s] = svd(randData', 0);       % SVD of the peri-event matrix
l = diag(s).^2/(size(randData,2)-1);  % variance of each component
% plot fraction of variance accounted for by each component
figure; plot(l ./ sum(l)); xlabel('Components'); ylabel('Fraction of variance');
% plot the first three components across time
figure; hold on; plot(comps(:, 1:3)); xlabel('Time (bins)');
scores = randData * comps;            % scores of each neuron on each component
% visualize scores (rows: neurons, columns: components)
figure; imagesc(scores); xlabel('Components'); ylabel('Neurons'); colorbar;
Note that the above example is based on randomly generated data
simulating a peri-event matrix of 49 neurons over 10 bins. The svd
command outputs the components. A key feature of PCA is that the
results are easily interpretable in terms of variance when the
variance is uniform across the data; for this reason, it is ideal
to work with data normalized to unit variance (i.e., Z scores, where
one unit equals one standard deviation; not used above for
simplicity).
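The normalization mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used in the text: it assumes that each neuron (row) is Z-scored across its own bins, and the names mu, sigma, zData, and nBins are introduced here for illustration only. bsxfun is used so that the sketch does not depend on implicit expansion or on the Statistics Toolbox function zscore.

```matlab
randData = rand(49, 10);              % 49 neurons x 10 bins, as in the example above
nBins = size(randData, 2);

% Z-score each neuron (row) across bins: zero mean, unit standard deviation.
mu = mean(randData, 2);               % per-neuron mean
sigma = std(randData, 0, 2);          % per-neuron standard deviation (N-1 normalization)
zData = bsxfun(@rdivide, bsxfun(@minus, randData, mu), sigma);

% Same analysis as above, run on the normalized matrix.
[comps, s] = svd(zData', 0);
l = diag(s).^2 / (nBins - 1);         % variance of each component
scores = zData * comps;               % scores of each neuron on each component

% With unit-variance rows, the component variances add up to the number
% of neurons (up to rounding), so each value of l is in units of neurons.
disp(sum(l));
```

Under this assumed normalization, every neuron contributes exactly one unit of variance, so the values in l can be compared directly with the number of neurons in the matrix.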
A scree plot of the variance represented by each component (the
variable 'l' above) is instructive in guiding further analysis.
Each number in this variable corresponds to a component; with
normalized data, the raw value indicates roughly how many neurons'
worth of variance is explained by a particular component. Components
that are interesting must account for significant variance. One rule
of thumb is to extrapolate a line based on the lower components:
components that fall above this line are of interest, whereas
components near this line should be ignored. If desired, more
stringent statistical criteria can be used for this determination.
The larger the data set, the more components may be of interest;
however, rarely are more than 5-7 components of further interest.
In random data, there is usually only one component (a flat line)
that meets these criteria.
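The rule of thumb above could be sketched roughly as follows. The text does not specify how many lower components define the line or how far above it a component must lie, so both nTail (the last five components) and margin (three standard deviations of the fit residuals) are hypothetical choices made only for this illustration.

```matlab
randData = rand(49, 10);                   % random peri-event matrix, as above
[comps, s] = svd(randData', 0);
l = diag(s).^2 / (size(randData, 2) - 1);  % variance of each component
frac = l ./ sum(l);                        % fraction of variance per component
idx = (1:numel(frac))';

% Fit a straight line to the lower (trailing) components only.
nTail = 5;                                 % assumed: use the last 5 components
tailIdx = idx(end-nTail+1:end);
p = polyfit(tailIdx, frac(tailIdx), 1);
fitLine = polyval(p, idx);                 % extrapolate the line to all components

% Flag components that sit clearly above the extrapolated line.
resid = frac(tailIdx) - polyval(p, tailIdx);
margin = 3 * std(resid);                   % assumed tolerance, not from the text
keep = find(frac > fitLine + margin);
disp(keep');                               % indices of components of interest
```

Plotting frac and fitLine on the same axes gives the scree plot with the extrapolated line overlaid. Because the random data here are not mean-centered, the first component is the flat line noted above and is always flagged; how many further components pass depends on the assumed margin.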
One can plot the components with the hope of interpreting them.
Using this approach, we identified common response