PCA is for:
Determining meaningful differences between sets
Evaluating covariance between sets of data
Reducing data to its most important axes (principal components)
Allowing simplified reconstitution of the data
Principal Component Analysis starts by subtracting each set's own mean, so every value becomes a deviation from that mean (this centres the data around zero; it does not squash everything into a fixed range like -1 to 1). Then it compares one set against another to see whether they vary together (covary) with the same sign and proportion. If I deviate +0.5 from my mean while you deviate -0.3 from yours, the product of our deviations is -0.15, a small negative contribution to the covariance. And so it continues across all the samples.
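A minimal sketch of that calculation, using two made-up variables (the sample values here are hypothetical, chosen so the first pair of deviations is the +0.5 and -0.3 from the example):

```python
import numpy as np

# Hypothetical samples for two variables, "me" and "you".
me = np.array([1.5, 0.5, 1.0])
you = np.array([0.7, 1.3, 1.0])

# Mean-centre each variable: subtract its own mean.
me_c = me - me.mean()    # deviations of "me" from its mean: [0.5, -0.5, 0.0]
you_c = you - you.mean() # deviations of "you" from its mean: [-0.3, 0.3, 0.0]

# Covariance: average of the products of paired deviations
# (using N-1 in the denominator, as np.cov does by default).
cov = (me_c * you_c).sum() / (len(me) - 1)
print(cov)                      # -0.15, matching the worked example
print(np.cov(me, you)[0, 1])    # numpy agrees
```

The first sample contributes exactly 0.5 x -0.3 = -0.15; the hand-rolled sum matches `np.cov` because both use the N-1 denominator.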
If cov(A,B) is 0.8 and cov(B,C) is 0.01 and cov(A,C) is 0.1, I can focus on A and B and essentially ignore C.
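A toy covariance matrix makes this concrete. This is a sketch with made-up variables A, B, and C (B is deliberately built from A, and C is independent noise), not data from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical variables: A and B strongly coupled, C unrelated noise.
A = rng.normal(size=500)
B = 0.8 * A + 0.2 * rng.normal(size=500)
C = rng.normal(size=500)

# np.cov with one row per variable gives the full 3x3 covariance matrix.
cov = np.cov(np.vstack([A, B, C]))
print(cov.round(2))
# The off-diagonal entry cov[0, 1] (A vs B) is large, while cov[0, 2]
# and cov[1, 2] are near zero -- so C can essentially be ignored.
```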
When you plot x,y data, the first principal axis aligns with the direction of greatest variance, which is essentially the line that best fits the set; the second axis lies orthogonal to it and describes the deviation (spread) of the points away from that line.
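Those two axes fall out as the eigenvectors of the covariance matrix. A sketch with a synthetic point cloud stretched along the line y = x (all data here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical x,y cloud stretched along y = x, with a little noise.
t = rng.normal(size=300)
pts = np.column_stack([t + 0.1 * rng.normal(size=300),
                       t + 0.1 * rng.normal(size=300)])

centred = pts - pts.mean(axis=0)       # subtract the mean of each column
cov = np.cov(centred.T)                # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(cov)       # eigh sorts eigenvalues ascending

major = vecs[:, -1]  # direction of greatest variance: the "best fit" axis
minor = vecs[:, 0]   # orthogonal axis: the spread away from that line
print(major)                 # up to sign, roughly [0.707, 0.707]
print(vals[-1] / vals.sum()) # fraction of variance on the major axis
```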
This means that when you have lots of training images, you treat each image as a data set: flatten it into a vector and make it one column of your matrix, so each image adds a dimension. So 700 images = 700 dimensions. Then you reduce all this data by finding the mean and the covariance and keeping only the strongest principal components. This lets you match images whose projections onto those components are similar!
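A sketch of that pipeline in the eigenfaces style. The image data here is random noise standing in for a real training set, and the image size (32x32) and number of kept components (50) are arbitrary choices, not from the source:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-in for a training set: 700 images, each flattened
# to a vector of 32*32 = 1024 pixels, one image per column.
n_pixels, n_images = 1024, 700
images = rng.normal(size=(n_pixels, n_images))

mean_image = images.mean(axis=1, keepdims=True)
centred = images - mean_image

# SVD of the centred data gives the principal components directly
# (columns of U), without forming the 1024x1024 covariance matrix.
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
top = U[:, :50]                 # keep the 50 strongest components

# Each image is now summarised by 50 numbers instead of 1024.
weights = top.T @ centred
# To match a new image, project it the same way and compare its weights
# against these, e.g. by nearest neighbour in the 50-d space.
```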
I used this primarily to help me label important components of any Cymbella diatom, by treating it as a blob and finding its axes. Diatoms are symmetric across their axes, so this is a great use of PCA. I based my code on the pen orientation example code.
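A minimal sketch of the blob-axes idea (this is not the original pen-orientation code; the blob below is a hypothetical elongated shape standing in for a diatom): treat the blob as a cloud of pixel coordinates and take the eigenvectors of their covariance.

```python
import numpy as np

# Hypothetical binary image containing one elongated horizontal blob.
blob = np.zeros((40, 40), dtype=bool)
blob[18:22, 5:35] = True

ys, xs = np.nonzero(blob)            # coordinates of the blob's pixels
coords = np.column_stack([xs, ys]).astype(float)
coords -= coords.mean(axis=0)        # centre on the blob's centroid

vals, vecs = np.linalg.eigh(np.cov(coords.T))
long_axis = vecs[:, -1]              # axis of greatest extent
short_axis = vecs[:, 0]              # orthogonal (shorter) axis
angle = np.degrees(np.arctan2(long_axis[1], long_axis[0]))
print(angle)  # 0 or 180 here, since this test blob lies horizontally
```

Because the blob is symmetric across both axes, these two eigenvectors line up with exactly the axes you would want to label.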
I spent a long time learning about linear algebra, matrix multiplication, eigenvectors and eigenvalues. I read through Lindsay Smith's PCA tutorial. For excellent visual representations I particularly enjoyed http://betterexplained.com/articles/linear-algebra-guide/ and http://www.ams.org/samplings/feature-column/fcarc-svd. The identity and transformation matrices are critical to PCA, and these guides help you really picture how a transformation works.