Is Online PCA Information-Preserving?

Project Type: Student Project, Master/Diploma Thesis
Student: Simon Wasserfall

Principle Components Analysis (PCA) is an invertible rotation. That, at least, is the case if the covariance matrix of the data vector is known: Given the statistics, the eigenvalue decomposition yields the rotation matrix, and the data vector can thus be rotated into its eigenspace.

More typically, the covariance matrix is not known, but needs to be estimated from multiple independent realizations of the data vector. From this so-called sample covariance matrix, a rotation matrix can be derived, which can then be used to rotate the matrix of data vectors into its eigenspace. PCA performed this way is a non-linear operation, since the rotation matrix depends on the data that is rotated. Moreover, if we treat this PCA as a black box, then information is lost: The rotated vectors have lost all information about the rotation. We made this more clear in one of our previous works.

But what happens if we look at online PCA? In other words, we observe independent realizations of a data vector, but we learn the covariance structure over time. PCA is performed in each time step, based on the current estimate of the covariance matrix. Our preliminary experiments suggest that online PCA can preserve information, since the information about the rotation is stored in the "transients": The first few data vectors will be rotated differently than the later ones. Surprisingly, online PCA can preserve information only if the expected value is unknown. If the expected value is known, information is lost -- but it is presently unclear how much.

We believe that a similar situation occurs during whitening stochastic processes: If the power spectral density of the process is known, then the whitening filter is determined and this filter does not destroy information. If the power spectral density is not known, but estimated from multiple realizations of the process (e.g., from multiple sequences of given length), then we expect information to be lost. Finally, if we train a whitening filter online, e.g., using the LMS algorithm, then we again may have an information-preserving system that stores information about the power spectral density in the transients.

Your task is to critically evaluate these claims: Is online PCA information-preserving? If not, how much information do we lose? Is there a similar situation in whitening filters?

To solve these tasks, you should have at least some experience in linear algebra. Some knowledge in adaptive filtering, random processes, and information theory may also be helpful.