how can I do Principal Components Analysis with octave?

Discussion:

rino mailing

2004-05-11 08:04:19 UTC

I'd like to do Principal Components Analysis with octave

What are the command I ave to write?

How to plot the result?

Thank you in advance for the time you spend to answer me, Mario.

Ted Harding

2004-05-11 08:58:49 UTC

Permalink

On 11-May-04 rino mailing wrote:
> I'd like to do Principal Components Analysis with octave
>
> What are the command I ave to write?
>
> How to plot the result?

You find the principal components by first finding the eigenvalues
and eigenvectors of the covariance or correlation matrix (depending
on whether you want the results to be independent of the units in
which the variables are measured).

So let C be (say) the covariance matrix. Then

[V,D] = eig(C)

gives you the eigenvectors as columns of V, and the eigenvalues
as the diagonal elements of D, in increasing order of magnitude.

So, therefore, ifr X is the N by k matrix of data, C=cov(X),
[V,D]=eig(C), then X*V(:,k) gives you the first principal component,
X*V(:,k-1) the second, and so on.

I'm not sure what you mean by "plot the result". Some people want
to see a graph which shows the proportions of variance accounted
for by the successive principal components; for this you can plot
the diagonal elements diag(D) of D in reverse order, perhaps in the
form 100*diag(D)/sum(diag(D)). Or the amount of variance accounted
for by the first, first+second, first+second+third, etc; for this
you can plot the (reverse-order) cumsum of diag(D).

Or you may want to say plot the first principal component against
the second, so: PC1=X*V[:,k];PC2=X*V[:,k-1];plot(PC1,PC2)

This may be useful in exploring possible clustering; or you may
already have an index say Grp which identifies each row of X
as belonging to a group by Grp==1, Grp==2, Grp==3, ... , in which
case you can plot these subsets of (PC1,PC2) in different colours,
to see where the groups fall in the plot.

And so on ...

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <***@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 11-May-04 Time: 09:58:49
------------------------------ XFMail ------------------------------

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------

Henry F. Mollet

2004-05-12 01:52:58 UTC

Permalink

I believe you would need the programs doing the work. They are called
functions in Octave or Matlab, something like "PCA.m". I would be interested
myself so I checked using Google what MATLAB has:

Here is a list of the functions with a short description of each:
PRINCOMP - principal components from raw data matrix
PCACOV - pca from covariance matrix
PCARES - residuals from pca
BARTTEST - Bartlett's test for dimensionality.

Next I checked for the first two, namely "PRINCOMP" and "PCACOV" in
octave-forge but apparently neither is present. I guess we're out of luck
for the time being unless we have the capability to write the program.
Henry

on 5/11/04 1:04 AM, rino mailing at ***@virgilio.it wrote:

> I'd like to do Principal Components Analysis with octave
>
> What are the command I ave to write?
>
> How to plot the result?
>
>
>
> Thank you in advance for the time you spend to answer me, Mario.
>
>

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------

Stefan van der Walt

2004-05-12 08:23:58 UTC

Permalink

Maybe the attached klt.m will do the trick!

Regards
Stéfan

Henry F. Mollet wrote:
> I believe you would need the programs doing the work. They are called
> functions in Octave or Matlab, something like "PCA.m". I would be interested
> myself so I checked using Google what MATLAB has:
>
> Here is a list of the functions with a short description of each:
> PRINCOMP - principal components from raw data matrix
> PCACOV - pca from covariance matrix
> PCARES - residuals from pca
> BARTTEST - Bartlett's test for dimensionality.
>
> Next I checked for the first two, namely "PRINCOMP" and "PCACOV" in
> octave-forge but apparently neither is present. I guess we're out of luck
> for the time being unless we have the capability to write the program.
> Henry
>
>
> on 5/11/04 1:04 AM, rino mailing at ***@virgilio.it wrote:
>
>
>>I'd like to do Principal Components Analysis with octave
>>
>>What are the command I ave to write?
>>
>>How to plot the result?
>>
>>
>>
>>Thank you in advance for the time you spend to answer me, Mario.

Miquel Cabanas

2004-05-12 08:21:14 UTC

Permalink

the book "Applied Factor Analysis in the Natural Sciences" [1]
came with lots of Matlab code (appendix written by L. Marcus),
currently available at

http://life.bio.sunysb.edu/morph/soft-mult.html

look for REYBLUE and FACDOC in the table. Note though that
REYBLUE is an self-extracting exe-file, and that you will need
a DOS like environment to get the Matlab files.

Since this is Matlab v. 4.x code you should have almost no
problems to run it on Octave.

[1] R. Reyment and K.G. Jöreskog. Applied Factor Analysis
in the Natural Sciences. Cambridge University Press, 1993.
(I think it's still available, for instante at Amazon)

On Tue, May 11, 2004 at 06:52:58PM -0700, Henry F. Mollet wrote:
> I believe you would need the programs doing the work. They are called
> functions in Octave or Matlab, something like "PCA.m". I would be interested
> myself so I checked using Google what MATLAB has:
>
> on 5/11/04 1:04 AM, rino mailing at ***@virgilio.it wrote:
>
> > I'd like to do Principal Components Analysis with octave
> >

--
Miquel E Cabanas ------------------------------------------------------
SeRMN, Universitat Autonoma de Barcelona (***@uab.es)
------------------------------------------o-oo--ooo---ooo--oo-o--------

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------

Fredrik Lingvall

2004-05-12 08:49:49 UTC

Permalink

Try this:

function [pc,sv,n_sv] = pca(x)
% [pc,sv,n_sv] = pca(x)
%
% Input:
% x - Data stored column-vise .
%
% Output:
% pc - Principal components (eigenvectors of the covariance matrix).
% sv - Singular values.
% n_sv - Normalized singular values.

C = cov(x);
[U,D,pc] = svd(C);
sv = diag(D);
n_sv = 100*sv/sum(sv);

\Fredrik

>I believe you would need the programs doing the work. They are called
>functions in Octave or Matlab, something like "PCA.m". I would be interested
>myself so I checked using Google what MATLAB has:
>
>Here is a list of the functions with a short description of each:
>PRINCOMP - principal components from raw data matrix
>PCACOV - pca from covariance matrix
>PCARES - residuals from pca
>BARTTEST - Bartlett's test for dimensionality.
>
>Next I checked for the first two, namely "PRINCOMP" and "PCACOV" in
>octave-forge but apparently neither is present. I guess we're out of luck
>for the time being unless we have the capability to write the program.
>Henry
>
>
>on 5/11/04 1:04 AM, rino mailing at ***@virgilio.it wrote:
>
>
>
>>I'd like to do Principal Components Analysis with octave
>>
>>What are the command I ave to write?
>>
>>How to plot the result?
>>
>>
>>
>>Thank you in advance for the time you spend to answer me, Mario.
>>
>>
>>
>>

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------

Paul Kienzle

2004-05-12 11:03:45 UTC

Permalink

Up to a possible change in sign in some of the components, the
following, the following appears to be equivalent to princomp:

## Compute principal components of X
## [pc,z,w,Tsq] = princomp(X)
## pc the principal components
## z the transformed data
## w the eigenvalues of the covariance matrix
## Tsq Hotelling's T^2 statistic for the transformed data
function [pc,z,w,Tsq] = princomp(X)
C = cov(X);
[U,D,pc] = svd(C);
if nargout>1, z = center(X)*pc; end
if nargout>2, w = diag(D); end
if nargout>3, Tsq = sumsq(zscore(z),2); end

Do we want alternative PCA functions in octave-forge as well?
We could extend the princomp interface. Some things that
seem useful to me are giving it a dimension argument, and
choosing a 'good' one if dimension is 0. If nargout is zero
we could plot the data in the space of the first two principle
components, or a pareto chart of the eigenvalues. Other
suggestions?

Thanks,

Paul Kienzle
***@users.sf.net

On May 12, 2004, at 4:49 AM, Fredrik Lingvall wrote:

> Try this:
>
> function [pc,sv,n_sv] = pca(x)
> % [pc,sv,n_sv] = pca(x)
> %
> % Input:
> % x - Data stored column-vise .
> %
> % Output:
> % pc - Principal components (eigenvectors of the covariance
> matrix).
> % sv - Singular values.
> % n_sv - Normalized singular values.
>
> C = cov(x);
> [U,D,pc] = svd(C);
> sv = diag(D);
> n_sv = 100*sv/sum(sv);
>
> \Fredrik
>
>> I believe you would need the programs doing the work. They are called
>> functions in Octave or Matlab, something like "PCA.m". I would be
>> interested
>> myself so I checked using Google what MATLAB has:
>>
>> Here is a list of the functions with a short description of each:
>> PRINCOMP - principal components from raw data matrix
>> PCACOV - pca from covariance matrix
>> PCARES - residuals from pca
>> BARTTEST - Bartlett's test for dimensionality.
>>
>> Next I checked for the first two, namely "PRINCOMP" and "PCACOV" in
>> octave-forge but apparently neither is present. I guess we're out of
>> luck
>> for the time being unless we have the capability to write the program.
>> Henry
>>
>>
>> on 5/11/04 1:04 AM, rino mailing at ***@virgilio.it wrote:
>>
>>
>>> I'd like to do Principal Components Analysis with octave
>>>
>>> What are the command I ave to write?
>>>
>>> How to plot the result?
>>>
>>>
>>>
>>> Thank you in advance for the time you spend to answer me, Mario.
>>>
>>>
>>>
>
>
>
> -------------------------------------------------------------
> Octave is freely available under the terms of the GNU GPL.
>
> Octave's home on the web: http://www.octave.org
> How to fund new projects: http://www.octave.org/funding.html
> Subscription information: http://www.octave.org/archive.html
> -------------------------------------------------------------
>

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------

Vic Norton

2004-05-12 13:31:01 UTC

Permalink

Hi Mario,

I'll give you my slant on Principal Components Analysis. I prefer
singular value decomposition to the covariance-eigen-value route. I
understand the former better, and it is more efficient.

Start with an m x n data matrix X and an m x 1 vector of weights w. We
require w(i) > 0 for i = 1,..,m and sum(w) = 1. In the normal
application w(i) = 1/m for i = 1,..,m.

Then
E = sum(diag(w) * X)
is the 1 x n matrix of expected values of the columns of X,
V = (X - ones(m, 1) * E)' * diag(w) * (X - ones(m, 1) * E)
is the n x n covariance matrix corresponding of X
(V = ((m - 1)/m) * cov(X) in the normal case),
S = sqrt(diag(V))'
is the 1 x n matrix of standard deviations of the columns of X, and
C = diag(1 ./ S) * V * diag(1 ./ S)
is the correlation matrix. (C = corrcoef(X) in the normal case.)

Now let's geometrize these computations. First reassign
w = sqrt(w)
and put
X = diag(w) * X.
Then the expected values of the columns of the original X are
E = w' * X.
One can think of E as the perpendicular projection of the new X
onto w-axis (more or less) with
F = X - w * E
the complementary projection. Now
S = sqrt(sum(F .^ 2))
is the 1 x n matrix of standard deviations of the columns of the
original X, and
V = F' * F
is the covariance matrix above. To normalize the situation set
F1 = F * diag(1 ./ S).
Then
C = F1' * F1
is the correlation matrix above.

OK, let's get to principal component analysis. We work with F or
F1. It is not necessary to compute the covariance or correlation
matrices.

If the units of measurement are important, do
[U, Sig, W] = svd(F, 1)
otherwise do
[U1, Sig1, W1] = svd(F1, 1).
The singular values, the coefficients of the diagonal matrices Sig and
Sig1, measure the significance of the various factors:
Sig(1) >= Sig(2) >= ... >= 0
with sum(diag(Sig) .^ 2) = trace(V) = total variance
and
Sig1(1) >= Sig1(2) >= ... >= 0
with sum(diag(Sig1) .^ 2) = trace(C) = n.
These singular values are, in effect, principal standard deviations.
Their squares are the eigenvalues of the covariance and correlation
matrices.

(Note in passing. Put Y = Sig * W' and Y1 = Sig1 * W1'.
Then Y and Y1 are n x n and V = Y' * Y, C = Y1' * Y1.)

For two dimensional plots, plot the n columns of
Y([1, 2], :) = diag( [Sig(1), Sig(2)] ) * W(:, [1, 2])'
or
Y1([1, 2], :) = diag( [Sig1(1), Sig1(2)] ) * W1(:, [1, 2])',
point by point. These points represent your data. The horizontal, 1-axis
is the most significant here. It is the axis of maximum variance
(correlation?) in the data.

Regards,

Vic

At 9:58 AM +0100 5/11/04, Ted Harding wrote:
> On 11-May-04 rino mailing wrote:
> > I'd like to do Principal Components Analysis with octave
> >
> > What are the command I ave to write?
> >
> > How to plot the result?
>
> You find the principal components by first finding the eigenvalues
> and eigenvectors of the covariance or correlation matrix (depending
> on whether you want the results to be independent of the units in
> which the variables are measured).
>
> So let C be (say) the covariance matrix. Then
>
> [V,D] = eig(C)
>
> gives you the eigenvectors as columns of V, and the eigenvalues
> as the diagonal elements of D, in increasing order of magnitude.
>
> So, therefore, ifr X is the N by k matrix of data, C=cov(X),
> [V,D]=eig(C), then X*V(:,k) gives you the first principal component,
> X*V(:,k-1) the second, and so on.
>
> I'm not sure what you mean by "plot the result". Some people want
> to see a graph which shows the proportions of variance accounted
> for by the successive principal components; for this you can plot
> the diagonal elements diag(D) of D in reverse order, perhaps in the
> form 100*diag(D)/sum(diag(D)). Or the amount of variance accounted
> for by the first, first+second, first+second+third, etc; for this
> you can plot the (reverse-order) cumsum of diag(D).
>
> Or you may want to say plot the first principal component against
> the second, so: PC1=X*V[:,k];PC2=X*V[:,k-1];plot(PC1,PC2)
>
> This may be useful in exploring possible clustering; or you may
> already have an index say Grp which identifies each row of X
> as belonging to a group by Grp==1, Grp==2, Grp==3, ... , in which
> case you can plot these subsets of (PC1,PC2) in different colours,
> to see where the groups fall in the plot.
>
> And so on ...
>
> Hoping this helps,
> Ted.
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <***@nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 167 1972
> Date: 11-May-04 Time: 09:58:49

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------