Abstract
Magnetic resonance imaging (MRI) biomarkers are used alongside traditional clinical markers to improve the accuracy of Alzheimer's disease diagnosis and to support the development of new pharmacotherapeutic trials. Longitudinal analysis of whole brain atrophy has been shown to discriminate between Alzheimer's disease patients and elderly normal controls. This work examines the effect of including intermediate atrophy rates, and of replacing these features with their uncorrelated principal components, on the discrimination of normal controls from Alzheimer's disease subjects. Linear discriminant analysis of atrophy rates is used to classify subjects as Alzheimer's disease or control. Leave-one-out cross-validation is adopted to evaluate the generalization rate of the classifier along with its memorization. Results show that the uncorrelated version of the intermediate features yields the same memorization performance as the original features but a higher generalization rate. In conclusion, in a longitudinal study, using intermediate MRI scans and transforming their atrophy rates to an uncorrelated feature space can improve diagnostic accuracy.
Keywords:
Alzheimer's disease; diagnostic; discriminant analysis; neuroimaging; whole brain atrophy; principal component analysis
1. Introduction
Alzheimer's disease (AD) is the most prevalent type of dementia in elderly subjects, affecting about 26 million people worldwide [1,2]. Disease onset starts with abnormal excessive accumulation of amyloid β (Aβ) protein and then hyperphosphorylated tau in the brain [1]. This causes deterioration of the synapses and axons of neurons. Brain degeneration gradually impairs memory and culminates in functional and language decline. These changes always occur in the same order, but they may overlap one another across the clinical stages of the disease [2]. The order and overlaps are illustrated in Figure 1.
Figure 1. Various biomarkers of Alzheimer's disease and the disease stages at which they are effective. The first three biomarkers can be used for prognosis of Alzheimer's disease prior to dementia diagnosis.
Clinical diagnosis of AD is traditionally based on the last two biomarkers, using standard measures such as the Mini-Mental State Examination (MMSE), Clinical Dementia Rating (CDR), Functional Assessment Staging Scale (FAST), Global Deterioration Scale (GDS) or Alzheimer's Disease Assessment Scale (ADAS). These measures are useful only in the second and third stages of the disease and cannot be used in the first stage, where there is no manifest behavioral or memory impairment [3,4]. Furthermore, these scores alone are not accurate enough, and complementary biomarkers are needed for accurate diagnosis of AD [4,5]. The need to monitor disease progression when designing new therapeutic trials also encourages researchers to find noninvasive, accurate biomarkers of AD [6,7]. Because of their high resolution and noninvasive nature, MR images are good candidates for revealing the degeneration of brain structures and relating it to disease progression [6]. Various anatomical structures of the brain, such as the entorhinal cortex [7-9], hippocampus [10,11] and cerebral cortex [12-14], are affected by AD, and their atrophic characteristics, such as volume, shape and thickness, can be used as biomarkers of AD [6,12,15,16]. Concentrating on the atrophic characteristics of individual anatomical structures has a limitation, however: disease-related atrophy does not necessarily follow the anatomical boundaries of structures, and any part of the brain can change under the influence of the disease.
The rate of whole brain volume change is almost constant in the third stage of the disease, which makes it useful for monitoring pharmacotherapeutic trials [12,17-19]. Figure 2 shows the profile of structural changes in AD. Amyloid markers change at early stages of the disease, even decades before AD is diagnosed. Degeneration of anatomical structures starts somewhat later, around 10 years before the clinical diagnosis of AD, but is still useful for AD prognosis.
Figure 2. Natural progression of cognitive and biological markers of Alzheimer disease: a theoretical model.
Several methods for measuring brain atrophy exist in the literature, but only three of them are validated. Boundary Shift Integral (BSI) [20,21], Structural Image Evaluation Using Normalization of Atrophy (SIENA) [22] and its cross-sectional counterpart (SIENAX) [18] are the most accurate and broadly accepted methods for evaluating the atrophy rate of the brain. Research shows that SIENA has the same accuracy as BSI, so either method is a fair choice for measuring the whole brain atrophy rate in a two-year longitudinal study. That is, the differences between the two measures have no effect on the pathological discrimination power of the method.
To measure the whole brain atrophy rate, the pipeline of Smith et al. is used in this paper [18,23-28]. The first step in this pipeline is brain surface extraction, which separates the brain from non-brain parts such as the skull or scalp in both images of the longitudinal study. To do so, a deformable tessellated mesh is used, which deforms under the control of local parameters until it matches the brain surface [27]. Afterward, baseline images must be registered to their follow-up counterparts. In this step it is necessary to avoid rescaling artifacts that could change the apparent atrophy. With this in mind, the size of the skull is assumed to be constant and is used as the normalization factor in the scaling process. Nonlinear registration matches images as closely as possible and would therefore eliminate the atrophic differences between them; to avoid this, linear registration is preferred in this study [26].
The next step is to measure the differences between the images. Brain images are segmented into their three major tissue classes: gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) [29]. Boundary points of these tissues are used to measure the difference between images. A 3 by 3 gradient operator is used to find the gradients at these points. In a point-to-point comparison of 3 mm intensity profiles along these gradients, the shift distance that maximizes the correlation between the profiles is taken as the difference measure. The normalized sum of these measures over all boundary points indicates the overall difference between brain volumes and is called the Percentage of Brain Volume Change (PBVC) [22].
Magnetic resonance images (MRI) from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database are used in this study [30]. The percentage of brain volume change is evaluated pairwise between the baseline, 6th-month and 24th-month scans. These three atrophy rates are used as features in discriminant analysis (DA). Because the features are highly correlated, principal component analysis (PCA) is used to transform the feature space into an uncorrelated one and, at the same time, to reduce its dimension. The discriminative power of these features is compared with that of the original ones.
2. Materials and methods
2.1. Subjects
A total of 30 AD patients (46.7% female; mean age 75 years, standard deviation 7) and 30 age-matched healthy normal controls (NC) (50% female; mean age 77 years, standard deviation 5) are selected from the ADNI public database http://www.loni.ucla.edu/ADNI/Data/. ADNI is a large five-year study launched in 2004 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and nonprofit organizations, as a $60 million public-private partnership. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments acquired at multiple sites (as in a typical clinical trial) can replicate results from smaller single-site studies measuring the progression of MCI and early AD. Determination of sensitive and definite markers of very early AD progression is intended to help researchers and clinicians monitor the effectiveness of new treatments and reduce the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, M.D., VA Medical Center and University of California, San Francisco.
All the AD and NC subjects in this study had successfully undergone MRI scanning, cognitive tests and clinical evaluation at baseline and at the 6th-month and 2nd-year follow-ups.
2.2. Statistical analysis
Demographic parameters such as age, sex and years of education have a notable impact on brain atrophy measures; to avoid their influence on the study, the two groups must be matched on them. The difference in gender between the two groups is tested with the Chi-square test and found not significant (p = 0.796). An independent two-sample Student's t-test is used to test intergroup differences in age and years of education. As there are no significant differences in age (p = 0.188) or years of education (p = 0.554) between the two groups, these variables are ignored in diagnosing AD in this study. Baseline MMSE and PBVC in all three time intervals, baseline to the 6th-month follow-up (PbvcSc6), 6th-month to 24th-month follow-up (Pbvc624) and baseline to 24th-month follow-up (PbvcSc24), differ significantly between the two groups (Table 1).
Table 1. Demographic and clinical variables by diagnostic group
These results confirm that the two groups differ in longitudinal volume change, but they do not specify how an individual subject should be classified into one of the groups on the basis of the above features.
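As an illustration of the gender-matching check, the sketch below runs a Pearson Chi-square test on a 2 by 2 table. The counts (14 of 30 and 15 of 30 females) are inferred here from the reported percentages of 46.7% and 50%, and the absence of a continuity correction is an assumption; the function name is hypothetical and this is not the authors' statistical software.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: AD, NC; columns: female, male.
# Counts inferred from the reported percentages (assumption).
gender_counts = [[14, 16], [15, 15]]
chi2, p = chi_square_2x2(gender_counts)
print(f"chi2 = {chi2:.4f}, p = {p:.3f}")
```

Under these assumptions the p-value comes out at about 0.796, in line with the value quoted for the gender comparison.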
DA is a statistical technique used to differentiate groups when the underlying features are quantitative and normally distributed [31]. It is an appropriate method for classifying patterns of subjects into the two groups of interest, AD and NC.
2.3. Discriminant analysis
The aim of DA is to analyze the group separation power of a set of normally distributed features or a pattern of features. Tests of normality for all three atrophy measures indicate that they are normally distributed in both groups (Table 2).
Table 2. Normality test of atrophy rates using the Kolmogorov-Smirnov method
The simplest way to classify subjects is to use the total mean of each feature as a threshold value. Patterns with feature values above the threshold are assigned to one group and those below it to the other.
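A minimal sketch of this total-mean thresholding is given below. The atrophy values are hypothetical toy numbers, not ADNI data, and the direction of the rule (AD assumed to have the more negative volume change) is an assumption made for illustration.

```python
def mean_threshold_classify(ad_values, nc_values):
    """Classify by comparing each value with the pooled (total) mean."""
    pooled = ad_values + nc_values
    threshold = sum(pooled) / len(pooled)
    # Assumption: atrophy is a negative volume change, so AD lies below the threshold.
    ad_correct = sum(1 for v in ad_values if v < threshold)
    nc_correct = sum(1 for v in nc_values if v >= threshold)
    accuracy = (ad_correct + nc_correct) / len(pooled)
    return threshold, accuracy

# Hypothetical PBVC values (percent), for illustration only.
ad_pbvc = [-4.1, -3.2, -2.8, -1.0]
nc_pbvc = [-1.5, -0.9, -0.6, -0.2]
threshold, accuracy = mean_threshold_classify(ad_pbvc, nc_pbvc)
print(f"threshold = {threshold:.4f}, accuracy = {accuracy:.3f}")
```

The pooled mean is simple to compute but is not tuned to the class distributions, which is why an ROC analysis is a natural next step.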
Using the total means in Table 1 as thresholds gives the classification results shown in Table 3. Long-term atrophy rates clearly yield higher accuracy.
Table 3. Classification based on total mean thresholding
These values may not be the optimal threshold values and for comprehensive evaluation, Receiver Operating Characteristic (ROC) curve analysis is carried out. ROC curve plots for all of the three features and associated parameters are shown in Figure 3.
Figure 3. Receiver Operating Characteristic curve plot for (a) baseline to 6th-month atrophy rate, (b) 6th-month to 24th-month atrophy rate, (c) baseline to 24th-month atrophy rate. Using long-term atrophy rates for diagnosis clearly leads to higher accuracy.
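The threshold sweep behind an ROC analysis can be sketched as follows. The scores are hypothetical toy values, the function names are illustrative, and the rule "score at or below the cut-off means AD" is an assumption; the area under the curve is computed with the Mann-Whitney pair-counting identity rather than by integrating a plotted curve.

```python
def roc_points(scores_pos, scores_neg):
    """Return (threshold, sensitivity, specificity) for every candidate cut-off.

    A subject is called positive (AD) when its score is <= threshold,
    assuming a lower (more negative) PBVC means more atrophy.
    """
    points = []
    for t in sorted(set(scores_pos + scores_neg)):
        sens = sum(s <= t for s in scores_pos) / len(scores_pos)
        spec = sum(s > t for s in scores_neg) / len(scores_neg)
        points.append((t, sens, spec))
    return points

def best_accuracy(scores_pos, scores_neg):
    """Cut-off with the highest overall accuracy (first one wins on ties)."""
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    t, sens, spec = max(roc_points(scores_pos, scores_neg),
                        key=lambda p: p[1] * n_pos + p[2] * n_neg)
    return t, (sens * n_pos + spec * n_neg) / (n_pos + n_neg)

def auc(scores_pos, scores_neg):
    """Area under the ROC curve via Mann-Whitney pair counting."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical PBVC values (percent), for illustration only.
ad_pbvc = [-4.1, -3.2, -2.8, -1.0]
nc_pbvc = [-1.5, -0.9, -0.6, -0.2]
print("best cut-off and accuracy:", best_accuracy(ad_pbvc, nc_pbvc))
print("AUC:", auc(ad_pbvc, nc_pbvc))
```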
The highest diagnostic accuracy, 90%, is achieved using PbvcSc24 with a specific threshold value. To evaluate the generalization capacity of this feature, leave-one-out cross-validation is conducted. The results show that generalization accuracy is lower than memorization accuracy (Table 4).
Table 4. Cross-validation results
Next, the two other features are included in DA to see whether accuracy improves. A key assumption of DA is that the features should not be highly correlated, but these three features are highly correlated (Table 5).
Table 5. Correlation coefficients
PbvcSc24 is clearly highly correlated with PbvcSc6 and Pbvc624, which violates the assumptions of the analysis. To overcome this, principal component analysis (PCA) is used to convert the features into uncorrelated ones. There are two main steps in conducting PCA:
• Step 1: Assessment of data suitability
Sample size (factorability of the data) and the strength of the relationship among the features are the two main issues to consider in determining whether a particular data set is suitable for PCA. A ratio of sample size to feature space dimension of 10 to 1 has been recommended [32]; in other words, at least 10 samples per feature are needed for PCA. This criterion is met in the study. Two statistical measures are also available for assessing the suitability of the sample: Bartlett's test of sphericity [33] and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy [34]. Bartlett's test of sphericity should be significant (p < 0.05), and the KMO index, which ranges from 0 to 1, should be greater than 0.6 for PCA to be considered appropriate. These two measures for our dataset are shown in Table 6.
Table 6. KMO and Bartlett's Test
These measures also confirm the factorability of the data. For the relationship among features to be considered strong, the correlation between features should be at least 0.3, which is the case here (Table 5).
• Step 2: Feature extraction
In this step, the number of features used to discriminate the groups is specified. This involves balancing two conflicting needs: finding a simple solution with as few factors as possible, and explaining as much of the variance in the original data set as possible. Several techniques can be used to decide how many features to keep. One of them is Kaiser's criterion [35], according to which only features with an eigenvalue of 1.0 or more are retained. The eigenvalue of a feature represents the amount of the total variance explained by that feature. Extracting features by this method leads to selecting only one feature (Table 7).
Table 7. Parallel analysis
The next test is known as the Scree test [36]. Each eigenvalue is plotted, and the plot is inspected for the point at which the curve changes direction toward the horizontal, the so-called elbow. Keeping all factors above the elbow is recommended, as these features contribute the most to explaining the variance in the data set. In this study, two of the features lie above the elbow and can be kept (Figure 4).
Figure 4. Scree plot. The break (elbow) occurs at feature 2.
Another method for determining the number of features is parallel analysis [37]. Parallel analysis compares the eigenvalues with those obtained from a randomly generated data set of the same size. Only eigenvalues that exceed the corresponding values from the random data set are kept. According to this analysis, only one of the features can be kept (Table 8).
Table 8. Total Variance Explained
Taking the three methods above together (two of which retain a single feature), only one feature would be selected for discriminating subjects. According to Table 7, this feature carries 79.371% of the total variance in the data, which does not seem satisfactory. PCA is, however, a data exploration technique, so its interpretation and use are a matter of judgment rather than of hard and fast statistical rules. In this article, components with an eigenvalue of 0.6 or more are retained. This yields two uncorrelated features that together carry 99.863% of the total variance in the data, which is highly satisfactory.
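The effect of the eigenvalue cut-off on the number of retained components can be sketched as below. The 3 by 3 correlation matrix is hypothetical (it is not Table 5), and the cyclic Jacobi rotation routine is one simple standard-library way of obtaining eigenvalues of a small symmetric matrix, not the procedure used by the statistical package in this study.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def n_retained(eigenvalues, cutoff):
    """Number of components whose eigenvalue reaches the cut-off."""
    return sum(1 for ev in eigenvalues if ev >= cutoff)

# Hypothetical correlation matrix of three features, for illustration only.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.6],
     [0.3, 0.6, 1.0]]
eigs = jacobi_eigenvalues(R)
explained = [ev / sum(eigs) for ev in eigs]
print("eigenvalues:", [round(ev, 3) for ev in eigs])
print("retained with Kaiser's criterion (>= 1.0):", n_retained(eigs, 1.0))
print("retained with a 0.6 cut-off:", n_retained(eigs, 0.6))
print("variance carried by the first two:", round(sum(explained[:2]), 3))
```

For this toy matrix Kaiser's criterion keeps one component while the lower cut-off keeps two, which mirrors the trade-off between simplicity and explained variance discussed above.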
Table 9 shows the contribution of the initial features to the newly extracted ones. Most of the features load quite strongly (above 0.4) on the extracted components, the exception being PbvcSc6 on PC2.
Table 9. Component matrix
As expected, the newly extracted features are uncorrelated (Table 10).
Table 10. Within-group correlation matrix
DA can now be carried out with these two newly extracted uncorrelated features.
The calculated unstandardized canonical discriminant function is:
With ds denoting the discriminant score, Table 11 shows the mean ds for the two groups of subjects, which are conspicuously far apart.
Table 11. Discriminant function at group centroids
To measure the association between the ds and the groups, the canonical correlation should be considered (Table 12). A high value (near 1) shows that the function discriminates well.
Table 12. Eigenvalues
With a canonical correlation of 0.671 in this study, the discriminating power of the extracted features is considered moderate. Wilks' Lambda gives the proportion of the total variance in the ds (55%) that is not explained by differences among groups (Table 13). A small Lambda value (near 0) indicates that the groups' mean ds differ. The Sig. value (p < 0.001) is for the Chi-square test and indicates a highly significant difference between the groups' centroids.
Table 13. Wilks' Lambda
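With two groups there is a single discriminant function, and in that case Wilks' Lambda follows directly from the canonical correlation. As a consistency check on the values quoted in the text (the symbol r_c for the canonical correlation is introduced here for illustration):

```latex
\Lambda = 1 - r_c^{2} = 1 - (0.671)^{2} \approx 0.550
```

This agrees with the 55% of variance in the ds left unexplained by group differences.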
To investigate the impact of each extracted feature on the discriminant function, the correlation of each feature with the ds is calculated and listed in order of importance (Table 14). PC1 has the highest impact on the discrimination process.
Table 14. Structure Matrix
3. Results and discussion
As the final stage in DA, the classification results are summarized in Table 15.
Table 15. Classification results
Results show that the model with the two extracted features (PC1 and PC2) brings no improvement in accuracy over PbvcSc24 alone (88.33%). To show that the discriminatory power of the classification is statistically better than chance (50%), Press's Q statistic is compared with the critical value (6.63) from the Chi-square distribution:
$$Q = \frac{[N - (n \cdot k)]^{2}}{N(k - 1)}$$
where N is the total sample size, n is the number of correctly classified patterns and k is the number of groups. With N = 60, k = 2 and n = 53 (88.33% of 60), it evaluates to 35.27, which is greater than the critical value of 6.63:
$$Q = \frac{[60 - (53 \times 2)]^{2}}{60(2 - 1)} = \frac{2116}{60} \approx 35.27 > 6.63$$
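The Press's Q check can be reproduced in a few lines. The count of 53 correctly classified subjects is inferred from the reported 88.33% accuracy on 60 subjects, and the function name is illustrative.

```python
def press_q(n_total, n_correct, n_groups):
    """Press's Q statistic: [N - n*k]^2 / [N*(k - 1)]."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))

# 60 subjects in 2 groups; 53 correct corresponds to the reported 88.33% accuracy.
q = press_q(60, 53, 2)
CRITICAL_VALUE = 6.63  # Chi-square critical value quoted in the text (1 df, alpha = 0.01)
print(f"Q = {q:.2f}, better than chance: {q > CRITICAL_VALUE}")
```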
The classifier therefore performs better than chance. To evaluate its generalization capacity, leave-one-out cross-validation is applied. Results are shown in Table 16.
Table 16. Cross-validation results
Compared with the generalization results for the initially selected feature in Table 4, the diagnostic accuracy using the two extracted uncorrelated features (PC1 and PC2) improves by about 3.33% over PbvcSc24 alone, as shown in Table 17.
4. Conclusion
The findings of this study show that, in longitudinal analysis of brain atrophy rate for diagnosing AD, incorporating intermediate MRI scans (between baseline and follow-up) and using the corresponding atrophy rates in uncorrelated form, that is, as principal components, can improve diagnostic accuracy, especially in terms of generalization.
In spite of this improvement, linear classifiers cannot discriminate subjects with the highest accuracy suggested by the ROC curve. Consequently, nonlinear classifiers such as the kernel support vector machine (SVM) are needed to achieve higher diagnostic accuracy. This is mainly because of the nonlinear nature of the atrophy rate across subjects.
Appendix
Cross validation
In k-fold cross-validation, the initial data set is randomly partitioned into k non-overlapping subsets or "folds" (D_{1}, D_{2}, ..., D_{k}), each of approximately equal size. Training and testing are performed k times. In iteration i, subset D_{i} is reserved as the test set, and the remaining subsets are collectively used to train the model. Put simply, in the first iteration, subsets D_{2}, ..., D_{k} are used as the training set to obtain a first model, which is tested on D_{1}; the second iteration is trained on subsets D_{1}, D_{3}, ..., D_{k} and tested on D_{2}, and so on. For classification, the accuracy estimate is the overall number of correct classifications from the k iterations, divided by the total number of tuples in the initial data.
Leave-one-out is a special case of k-fold cross-validation where k is set to the number of initial tuples. That is, only one sample is left out at a time for the test set.
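The leave-one-out procedure can be sketched as follows. The classifier used here, a threshold at the midpoint of the two class means of a single feature, is a deliberately simple stand-in chosen for illustration and is not the discriminant function of this study; the data are hypothetical toy values with label 1 for AD and 0 for NC.

```python
def fit_midpoint(train):
    """Fit a 1-D two-class rule: threshold at the midpoint of the class means."""
    mean = {}
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        mean[label] = sum(values) / len(values)
    threshold = (mean[0] + mean[1]) / 2.0
    low_label = 0 if mean[0] < mean[1] else 1  # class lying below the threshold
    return threshold, low_label

def predict(model, x):
    threshold, low_label = model
    return low_label if x < threshold else 1 - low_label

def leave_one_out_accuracy(data):
    """Train on all samples but one, test on the held-out sample, and repeat."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        model = fit_midpoint(train)
        correct += predict(model, x) == y
    return correct / len(data)

# Hypothetical (PBVC, label) pairs, for illustration only.
samples = [(-4.1, 1), (-3.2, 1), (-2.8, 1), (-1.0, 1),
           (-1.5, 0), (-0.9, 0), (-0.6, 0), (-0.2, 0)]
print("leave-one-out accuracy:", leave_one_out_accuracy(samples))
```

Because each model is fitted without the sample it is tested on, the resulting accuracy estimates generalization rather than memorization.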
Principal Component Analysis (PCA)
PCA is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences [38]. Its other main advantage is that, once these patterns have been found, the data can be compressed by reducing the number of dimensions without much loss of information. The technique is used in feature extraction to reduce the dimension of the feature space and make features more discriminative.
PCA involves the eigenvalue decomposition of the data covariance matrix to generate features that are optimally uncorrelated:
where P is the original pattern of features and I is the pattern of uncorrelated features. A is obtained from the eigenvalue decomposition of the covariance matrix.
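The decorrelating effect of PCA can be sketched for the two-feature case, where the eigen-decomposition of the 2 by 2 covariance matrix has a closed form. The data are hypothetical, the function name is illustrative, and the rotation built from the eigenvectors is assumed to play the role of the transformation described above.

```python
import math

def pca_2d(xs, ys):
    """PCA of two features via the closed-form eigen-decomposition
    of their 2x2 sample covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxx = sum(d * d for d in dx) / (n - 1)
    syy = sum(d * d for d in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Direction (cos, sin) of the leading eigenvector.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(angle), math.sin(angle)
    # Project the centered data onto the two orthogonal eigenvectors.
    pc1 = [c * a + s * b for a, b in zip(dx, dy)]
    pc2 = [-s * a + c * b for a, b in zip(dx, dy)]
    return eigenvalues, pc1, pc2

# Two hypothetical, strongly correlated features.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
eigenvalues, pc1, pc2 = pca_2d(xs, ys)
cov_pc = sum(a * b for a, b in zip(pc1, pc2)) / (len(xs) - 1)
print("eigenvalues:", eigenvalues)
print("covariance between PC1 and PC2:", cov_pc)
```

The covariance between the two projected features is zero up to rounding error, and the share of each eigenvalue in their sum gives the proportion of variance carried by each component.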
References

1. Suda S, Ueda M, Sakurazawa M, Nishiyama Y, Komaba Y, Katsura KI, et al.: Clinical and neuroradiological progression in diffuse neurofibrillary tangles with calcification. Journal of Clinical Neuroscience 2009, 16(8):1112-4.
2. Frisoni G, Fox N, Jack C, Scheltens P, Thompson P: The clinical use of structural MRI in Alzheimer disease. Nature Reviews Neurology 2010, 6(2):67-77.
3. Ridha B, Anderson V, Barnes J, Boyes R, Price S, Rossor M, et al.: Volumetric MRI and cognitive measures in Alzheimer disease. Journal of Neurology 2008, 255(4):567-74.
4. Fox N, Crum W, Scahill R, Stevens J, Janssen J, Rossor M: Imaging of onset and progression of Alzheimer's disease with voxel-compression mapping of serial magnetic resonance images. The Lancet 2001, 358(9277):201-5.
5. Hua X, Lee S, Yanovsky I, Leow AD, Chou YY, Ho AJ, et al.: Optimizing power to track brain degeneration in Alzheimer's disease and mild cognitive impairment with tensor-based morphometry: An ADNI study of 515 subjects. NeuroImage 2009, 48(4):668-81.
6. Wang L, Miller JP, Gado MH, McKeel DW, Rothermich M, Miller MI, et al.: Abnormalities of hippocampal surface structure in very mild dementia of the Alzheimer type. NeuroImage 2006, 30(1):52-60.
7. Liu Y, Paajanen T, Zhang Y, Westman E, Wahlund LO, Simmons A, et al.: Combination analysis of neuropsychological tests and structural MRI measures in differentiating AD, MCI and control groups - The AddNeuroMed study. Neurobiology of Aging 2009, in press.
8. Chetelat G, Desgranges B, Landeau B, Mezenge F, Poline J, De la Sayette V, et al.: Direct voxel-based comparison between grey matter hypometabolism and atrophy in Alzheimer's disease. Brain 2007.
9. Di Paola M, Macaluso E, Carlesimo G, Tomaiuolo F, Worsley K, Fadda L, et al.: Episodic memory impairment in patients with Alzheimer's disease is correlated with entorhinal cortex atrophy. Journal of Neurology 2007, 254(6):774-81.
10. Morra J, Tu Z, Apostolova L, Green A, Avedissian C, Madsen S, et al.: Validation of a fully automated 3D hippocampal segmentation method using subjects with Alzheimer's disease, mild cognitive impairment, and elderly controls. NeuroImage 2008, 43(1):59-68.
11. Apostolova LG, Mosconi L, Thompson PM, Green AE, Hwang KS, Ramirez A, et al.: Subregional hippocampal atrophy predicts Alzheimer's dementia in the cognitively normal. Neurobiology of Aging 2010, 31(7):1077-88.
12. Plant C, Teipel SJ, Oswald A, Böhm C, Meindl T, Mourao-Miranda J, et al.: Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease. NeuroImage 2010, 50(1):162-74.
13. Fan Y, Batmanghelich N, Clark CM, Davatzikos C: Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline. NeuroImage 2008, 39(4):1731-43.
14. Vemuri P, Gunter J, Senjem M, Whitwell J, Kantarci K, Knopman D, et al.: Alzheimer's disease diagnosis in individual subjects using structural MR images: validation studies. NeuroImage 2008, 39(3):1186-97.
15. Teipel SJ, Born C, Ewers M, Bokde ALW, Reiser MF, Möller HJ, et al.: Multivariate deformation-based analysis of brain atrophy to predict Alzheimer's disease in mild cognitive impairment. NeuroImage 2007, 38(1):13-24.
16. Teipel SJ, Ewers M, Wolf S, Jessen F, Kölsch H, Arlt S, et al.: Multicentre variability of MRI-based medial temporal lobe volumetry in Alzheimer's disease. Psychiatry Research: Neuroimaging 2010, 182(3):244-50.
17. Sluimer JD, Bouwman FH, Vrenken H, Blankenstein MA, Barkhof F, van der Flier WM, et al.: Whole-brain atrophy rate and CSF biomarker levels in MCI and AD: A longitudinal study. Neurobiology of Aging 2010, 31(5):758-64.
18. Smith SM, Zhang Y, Jenkinson M, Chen J, Matthews PM, Federico A, et al.: Accurate, Robust, and Automated Longitudinal and Cross-Sectional Brain Change Analysis. NeuroImage 2002, 17(1):479-89.
19. Boundy KL, Barnden LR, Katsifis AG, Rowe CC: Reduced posterior cingulate binding of I-123 iododexetimide to muscarinic receptors in mild Alzheimer's disease. Journal of Clinical Neuroscience 2005, 12(4):421-5.
20. Freeborough PA, Woods RP, Fox NC: Accurate Registration of Serial 3D MR Brain Images and Its Application to Visualizing Change in Neurodegenerative Disorders. Journal of Computer Assisted Tomography 1996, 20(6):1012-22.
21. Fox N, Freeborough P: Brain atrophy progression measured from registered serial MRI: validation and application to Alzheimer's disease. Journal of Magnetic Resonance Imaging 1997, 7(6):1069-75.
22. Smith S, De Stefano N, Jenkinson M, Matthews P: SIENA - Normalised accurate measurement of longitudinal brain change.
23. Smith SM, De Stefano N, Jenkinson M, Matthews PM: Normalized Accurate Measurement of Longitudinal Brain Change. Journal of Computer Assisted Tomography 2001, 25(3):466-75.
24. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TEJ, Johansen-Berg H, et al.: Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage 2004, 23(Supplement 1):S208-S19.
25. Jenkinson M, Bannister P, Brady M, Smith S: Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images. NeuroImage 2002, 17(2):825-41.
26. Jenkinson M, Smith S: A global optimisation method for robust affine registration of brain images. Medical Image Analysis 2001, 5(2):143-56.
27. Smith S: Fast robust automated brain extraction. Human Brain Mapping 2002, 17(3):143-55.
28. Zhang Y, Brady M, Smith S: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 2001, 20(1):45-57.
29. Zhang Y, Brady M, Smith S: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm.
30. Jack CR, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, et al.: The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging 2008, 27(4):685-91.
31. Han J, Kamber M: Data mining: concepts and techniques. Morgan Kaufmann; 2006.
32. Osborne J, Costello A: Sample size and subject to item ratio in principal components analysis. Practical Assessment, Research & Evaluation 2004, 9(11):8.
33. Gleser L: A note on the sphericity test. The Annals of Mathematical Statistics 1966, 37(2):464-7.
34. Kaiser H: A second generation little jiffy. Psychometrika 1970, 35(4):401-15.
35. Mykola P, editor: PCA-based Feature Transformation for Classification.
36. Guo Q, Wu W, Massart D, Boucon C, De Jong S: Feature selection in principal component analysis of analytical data. Chemometrics and Intelligent Laboratory Systems 2002, 61(1-2):123-32.
37. Jain A, Zongker D: Feature selection: Evaluation, application, and small sample performance. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2002, 19(2):153-8.
38. Jolliffe I: Principal component analysis. Springer Verlag; 2002.