How to do PCA in SPSS

Principal Components Analysis (PCA) using SPSS Statistics

First, take a look through these seven steps. The first two are:
Step #1: Interpret the results from your assumption tests to make sure that you can use PCA to analyse your data.
Step #2: Inspect the initial extraction of components. At this point, there will be as many components as there are variables.

The steps for conducting a Principal Components Analysis (PCA) in SPSS are:
1. Enter the data in a within-subjects fashion.
2. Click Analyze.
3. Drag the cursor over the Dimension Reduction drop-down menu.
4. Click Factor.
5. Click on the first ordinal or continuous variable, observation, or item.

This page shows an example of a principal components analysis with footnotes explaining the output. The data used in this example were collected by Professor James Sidanius, who has generously shared them with us. Principal components analysis is a method of data reduction.

Suppose that you have a dozen variables that are correlated. You might use principal components analysis to reduce your 12 measures to a few principal components.

Unlike factor analysis, principal components analysis is not usually used to identify underlying latent variables.

Hence, the loadings onto the components are not interpreted the way factors in a factor analysis would be. Principal components analysis, like factor analysis, can be performed on raw data, as shown in this example, or on a correlation or a covariance matrix.

If raw data are used, the procedure will create the original correlation matrix or covariance matrix, as specified by the user. If the correlation matrix is used, the variables are standardized and the total variance will equal the number of variables used in the analysis, because each standardized variable has a variance equal to 1. If the covariance matrix is used, the variables will remain in their original metric.

However, one must take care to use variables whose variances and scales are similar. Unlike factor analysis, which analyzes the common variance, principal components analysis analyzes the total variance in the original matrix. Also, principal components analysis assumes that each original measure is collected without measurement error. Principal components analysis is a technique that requires a large sample size.
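The statement that analyzing the correlation matrix makes the total variance equal to the number of variables can be checked with a minimal pure-Python sketch. The three short columns below are made-up illustration data (not the Sidanius data set), and `correlation_matrix` is a hypothetical helper written only for this example: it standardizes each variable, forms the Pearson correlation matrix, and sums its diagonal, which is the total variance being analyzed.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    # Convert each column to z-scores. Using the population SD (divide by n)
    # together with dividing the cross-products by n gives the usual Pearson r.
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(x - m) / s for x in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


# Hypothetical raw data: three variables measured on five cases.
x1 = [2, 4, 4, 5, 7]
x2 = [1, 3, 2, 5, 6]
x3 = [9, 7, 8, 4, 3]

R = correlation_matrix([x1, x2, x3])
total_variance = sum(R[i][i] for i in range(len(R)))  # trace of R
print(round(total_variance, 6))  # 3.0: one unit of variance per standardized variable
```

Because every diagonal entry is the variance of a standardized variable, the trace is 3 here and would be 12 for a 12-variable analysis.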

Principal components analysis is based on the correlation matrix of the variables involved, and correlations usually need a large sample size before they stabilize. As a rule of thumb, a bare minimum of 10 observations per variable is necessary to avoid computational difficulties. In this example we have included many options, including the original and reproduced correlation matrices and the scree plot. While you may not wish to use all of these options, we have included them here to aid in the explanation of the analysis.

We have also created a page of annotated output for a factor analysis that parallels this analysis. For general information regarding the similarities and differences between principal components analysis and factor analysis, see Tabachnick and Fidell, for example.

The number of cases used in the analysis will be less than the total number of cases in the data file if there are missing values on any of the variables used in the principal components analysis, because, by default, SPSS does a listwise deletion of incomplete cases.

Std. Deviation — These are the standard deviations of the variables used in the factor analysis.

Before conducting a principal components analysis, you want to check the correlations between the variables.

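If any of the correlations are too high, you may need to remove one of the variables from the analysis, as the two variables seem to be measuring the same thing. Another alternative would be to combine the variables in some way (perhaps by taking the average). If the correlations are too low, one or more of the variables might load onto only one principal component, in effect forming its own component. This is not helpful, as the whole point of the analysis is to reduce the number of items (variables).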

Kaiser-Meyer-Olkin Measure of Sampling Adequacy — This measure varies between 0 and 1, and values closer to 1 are better. Bartlett's Test of Sphericity tests the null hypothesis that the correlation matrix is an identity matrix. An identity matrix is a matrix in which all of the diagonal elements are 1 and all off-diagonal elements are 0. You want to reject this null hypothesis. Taken together, these tests provide a minimum standard which should be passed before a principal components analysis or a factor analysis is conducted.
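Both checks can be computed directly from the correlation matrix. The sketch below is a minimal pure-Python illustration using the standard textbook formulas: Bartlett's statistic is χ² = −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO is Σr² / (Σr² + Σa²) over the off-diagonal elements, where a are the partial correlations obtained from the inverse of R. The correlation matrix and sample size are hypothetical, the helper names are made up for this example, and the sketch stops at the test statistic rather than computing the p-value that SPSS also reports.

```python
import math


def invert_and_logdet(matrix):
    """Gauss-Jordan inverse with partial pivoting; also returns ln|det(matrix)|."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        log_det += math.log(abs(p))  # |det| is the product of the pivots
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a], log_det


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and df for H0: R is an identity matrix."""
    p = len(R)
    _, log_det = invert_and_logdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * log_det
    return chi2, p * (p - 1) // 2


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure from correlations and partial correlations."""
    inv, _ = invert_and_logdet(R)
    p = len(R)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                r2 += R[i][j] ** 2
                # Squared partial correlation of i and j, controlling for the others.
                a2 += (inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])) ** 2
    return r2 / (r2 + a2)


# Hypothetical correlation matrix for three variables, from a hypothetical n = 100.
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
chi2, df = bartlett_sphericity(R, n=100)
print(f"Bartlett chi-square = {chi2:.2f}, df = {df}")
print(f"KMO = {kmo(R):.3f}")
```

An identity matrix has ln|R| = 0, so Bartlett's statistic is 0 exactly when the variables are uncorrelated; the more the variables correlate, the smaller |R| becomes and the larger the statistic grows.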

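Communalities are the proportion of each variable's variance that can be explained by the principal components. The communality is also noted as h² and can be defined as the sum of squared factor loadings. Initial — By definition, the initial value of the communality in a principal components analysis is 1.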

In the Extraction column, the values indicate the proportion of each variable's variance that can be explained by the retained components. Variables with high values are well represented in the common factor space, while variables with low values are not well represented. These values are the reproduced variances from the components that you have retained.

You can find these values on the diagonal of the reproduced correlation matrix.

Component — There are as many components extracted during a principal components analysis as there are variables that are put into it. In our example, we used 12 variables (item13 through item24), so we have 12 components.

Initial Eigenvalues — Eigenvalues are the variances of the principal components. Because we conducted our principal components analysis on the correlation matrix, the variables are standardized, which means that each variable has a variance of 1, and the total variance is equal to the number of variables used in the analysis, in this case, 12.

Total — This column contains the eigenvalues.

The first component will always account for the most variance (and hence have the highest eigenvalue), and the next component will account for as much of the leftover variance as it can, and so on. Hence, each successive component will account for less and less variance. The Cumulative % column accumulates these percentages row by row: for example, the value in the third row is the percentage of the total variance that the first three components account for together. Remember that because this is principal components analysis, all variance is considered to be true and common variance.

In other words, the variables are assumed to be measured without error, so there is no error variance. Extraction Sums of Squared Loadings — The three columns of this half of the table exactly reproduce the values given on the same row on the left side of the table. The number of rows reproduced on the right side of the table is determined by the number of principal components whose eigenvalues are 1 or greater.
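The Total, % of Variance, and Cumulative % columns of the Total Variance Explained table follow directly from the eigenvalues. The sketch below is an illustration under assumptions: the four eigenvalues are made up (they sum to 4, as they would for four standardized variables), and `variance_table` is a hypothetical helper, not an SPSS function.

```python
def variance_table(eigenvalues):
    """Rows of (eigenvalue, % of variance, cumulative %) for sorted eigenvalues."""
    total = sum(eigenvalues)  # equals the number of variables when R is analyzed
    rows, cumulative = [], 0.0
    for ev in eigenvalues:
        pct = 100.0 * ev / total  # share of the total variance for this component
        cumulative += pct         # running total over this and all earlier components
        rows.append((ev, pct, cumulative))
    return rows


# Hypothetical eigenvalues for four standardized variables.
for ev, pct, cum in variance_table([2.0, 1.2, 0.5, 0.3]):
    print(f"{ev:5.2f} {pct:7.2f} {cum:7.2f}")
```

Because each row adds the next component's share, the last cumulative value is always 100%.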

The scree plot graphs the eigenvalue against the component number. You can see these values in the first two columns of the table immediately above. From the third component on, you can see that the line is almost flat, meaning that each successive component is accounting for smaller and smaller amounts of the total variance.

In general, we are interested in keeping only those principal components whose eigenvalues are greater than 1. Components with an eigenvalue of less than 1 account for less variance than did the original variable (which had a variance of 1), and so are of little use. Hence, you can see that the point of principal components analysis is to redistribute the variance in the correlation matrix (using the method of eigenvalue decomposition) to the first components extracted.
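The eigenvalue-greater-than-1 rule and the scree plot can both be sketched in a few lines. This is a minimal illustration under assumptions: the eigenvalues are hypothetical and already sorted from largest to smallest, and `kaiser_components` and `text_scree` are made-up helper names; the text "plot" is only a rough stand-in for the chart SPSS draws.

```python
def kaiser_components(eigenvalues, threshold=1.0):
    """Component numbers (1-based) whose eigenvalue exceeds the threshold."""
    return [number for number, ev in enumerate(eigenvalues, start=1) if ev > threshold]


def text_scree(eigenvalues, scale=10):
    """Crude text scree plot: one row per component, bar length proportional to the eigenvalue."""
    return [f"{number:2d} | {'*' * round(ev * scale)} {ev:.2f}"
            for number, ev in enumerate(eigenvalues, start=1)]


# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [2.0, 1.2, 0.5, 0.3]
print(kaiser_components(eigenvalues))  # [1, 2]
for line in text_scree(eigenvalues):
    print(line)
```

Reading down the bars, the drop from component 2 to component 3 is where the line starts to flatten, which agrees with the two components kept by the eigenvalue rule.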

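Component Matrix — This table contains component loadings, which are the correlations between the variable and the component. Because these are correlations, possible values range from -1 to +1. Suppressing small coefficients (an option under Options in the Factor dialog) makes the output easier to read by removing the clutter of low correlations that are probably not meaningful anyway.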

Component — The columns under this heading are the principal components that have been extracted. The footnote SPSS prints beneath the table (marked a) reports how many components were extracted. You usually do not try to interpret the components the way that you would factors that have been extracted from a factor analysis. Rather, most people are interested in the component scores, which are used for data reduction, as opposed to factor analysis, where you are looking for underlying latent continua. Reproduced Correlations — This table contains two tables: the reproduced correlations in the top part of the table, and the residuals in the bottom part of the table.

Reproduced Correlation — The reproduced correlation matrix is the correlation matrix based on the extracted components. You want the values in the reproduced matrix to be as close to the values in the original correlation matrix as possible. This means that you want the residual matrix, which contains the differences between the original and the reproduced matrix, to be close to zero.

If the reproduced matrix is very similar to the original correlation matrix, then you know that the components that were extracted accounted for a great deal of the variance in the original correlation matrix, and these few components do a good job of representing the original data. The numbers on the diagonal of the reproduced correlation matrix are presented in the Communalities table in the column labeled Extraction.
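The reproduced correlation between two items is the sum, over the retained components, of the products of their loadings, and the residual is the observed correlation minus that value. The sketch below illustrates this with a made-up correlation matrix and made-up two-component loadings (they were not produced by an actual PCA of that matrix); `reproduced_and_residuals` is a hypothetical helper.

```python
def reproduced_and_residuals(R, loadings):
    """Reproduced correlations (L times L-transpose) and residuals (R minus reproduced)."""
    p = len(R)
    reproduced = [
        [sum(li * lj for li, lj in zip(loadings[i], loadings[j])) for j in range(p)]
        for i in range(p)
    ]
    residuals = [[R[i][j] - reproduced[i][j] for j in range(p)] for i in range(p)]
    return reproduced, residuals


# Hypothetical correlations and two-component loadings (rows = items) for three items.
R = [
    [1.00, 0.45, 0.50],
    [0.45, 1.00, 0.20],
    [0.50, 0.20, 1.00],
]
loadings = [
    [0.80, 0.30],
    [0.70, -0.40],
    [0.60, 0.50],
]
reproduced, residuals = reproduced_and_residuals(R, loadings)
# The diagonal of the reproduced matrix holds the Extraction communalities.
communalities = [reproduced[i][i] for i in range(len(R))]
for row in residuals:
    print([round(x, 2) for x in row])
```

Small off-diagonal residuals indicate that the retained components reproduce the observed correlations well.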

Mean — These are the means of the variables used in the factor analysis. Analysis N — This is the number of cases used in the factor analysis.
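Analysis N reflects the listwise deletion that SPSS applies by default. The sketch below mimics it under an assumption: missing values are coded as Python `None` (in SPSS they would be system-missing or user-missing values), and `listwise_delete` is a hypothetical helper.

```python
def listwise_delete(cases):
    """Drop every case that has a missing value (None) on any analysis variable."""
    return [case for case in cases if all(value is not None for value in case)]


# Hypothetical data: each row is a case, each column an item; None marks a missing response.
cases = [
    [3, 4, 2],
    [5, None, 4],
    [2, 2, 3],
    [None, 1, 1],
    [4, 5, 5],
]

complete = listwise_delete(cases)
analysis_n = len(complete)
print(analysis_n)  # 3
```

A case missing even one item is excluded from every statistic in the analysis, which is why Analysis N can be much smaller than the file's row count.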

Introduction

Principal components analysis is based on the correlation matrix of the variables involved, and correlations usually need a large sample size before they stabilize. Tabachnick and Fidell cite Comrey and Lee's advice regarding sample size, which rates 50 cases as very poor and progressively larger samples as poor, fair, good, and very good.

For the PCA portion of the seminar, we will introduce topics such as eigenvalues and eigenvectors, communalities, sum of squared loadings, total variance explained, and choosing the number of components to extract. For the EFA portion, we will discuss factor extraction, estimation methods, factor rotation, and generating factor scores for subsequent analyses. The basic assumption of factor analysis is that for a collection of observed variables there is a set of underlying or latent variables called factors (smaller in number than the observed variables) that can explain the interrelationships among those variables.

The examples that follow use the SAQ-8, an eight-item questionnaire on SPSS anxiety.

Due to relatively high correlations among items, this would be a good candidate for factor analysis. Recall that the goal of factor analysis is to model the interrelationships between items with fewer latent variables. These interrelationships can be broken up into multiple components. Since the goal of factor analysis is to model the interrelationships among items, we focus primarily on the variance and covariance rather than the mean.

Factor analysis assumes that variance can be partitioned into two types of variance, common and unique. The total variance is made up of common variance and unique variance, and unique variance is composed of specific and error variance. Here you see that SPSS Anxiety makes up the common variance for all eight items, but within each item there is specific variance and error variance.

Now that we understand partitioning of variance, we can move on to performing our first factor analysis. In fact, the assumptions we make about variance partitioning affect which analysis we run. As a data analyst, the goal of a factor analysis is to reduce the number of variables to explain and to interpret the results. This can be accomplished in two steps: factor extraction and factor rotation. Factor extraction involves making a choice about the type of model as well as the number of factors to extract. Factor rotation comes after the factors are extracted, with the goal of achieving simple structure in order to improve interpretability.

There are two approaches to factor extraction, which stem from different approaches to variance partitioning: (a) principal components analysis and (b) common factor analysis. Unlike factor analysis, principal components analysis (PCA) makes the assumption that there is no unique variance: the total variance is equal to the common variance. Recall that variance can be partitioned into common and unique variance. If there is no unique variance, then common variance takes up total variance (see figure below).

Additionally, if the total variance is 1, then the common variance is equal to the communality. The goal of a PCA is to replicate the correlation matrix using a set of components that are fewer in number and linear combinations of the original set of items.

Although the following analysis defeats the purpose of doing a PCA, we will begin by extracting as many components as possible as a teaching exercise and so that we can decide on the optimal number of components to extract later. First go to Analyze — Dimension Reduction — Factor.

Move all the observed variables over to the Variables: box to be analyzed. Under Extraction — Method, pick Principal components and make sure to Analyze the Correlation matrix. We also request the Unrotated factor solution and the Scree plot. Under Extract, choose Fixed number of factors, and under Factors to extract enter 8.

We also increased the Maximum Iterations for Convergence.

Eigenvalues represent the total amount of variance that can be explained by a given principal component. The eigenvalues of a general matrix can be positive or negative, but those of a correlation matrix are never negative, because they are variances. Eigenvalues are also the sum of squared component loadings across all items for each component, where each squared loading represents the amount of variance in an item that can be explained by the principal component.

Eigenvectors supply the weights: there is one eigenvector for each eigenvalue, and it contains one weight per item. The eigenvector times the square root of the eigenvalue gives the component loadings, which can be interpreted as the correlation of each item with the principal component.

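We can calculate an item's loading on the first component as its weight in the first eigenvector times the square root of the first eigenvalue. The component loadings can be interpreted as the correlation of each item with the component. Each item has a loading corresponding to each of the 8 components. The square of each loading is the proportion of the item's variance explained by that component, and summing the squared loadings across all 8 components gives the total variance of the item explained by all components. This is also known as the communality, and in a PCA the communality for each item is equal to the total variance.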

Summing the squared component loadings across the components (columns) gives you the communality estimates for each item, and summing each squared loading down the items (rows) gives you the eigenvalue for each component. For example, to obtain the first eigenvalue, we square each item's loading on the first component and sum these squares over all eight items. Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained from partialling out the previous component.

Therefore the first component explains the most variance, and the last component explains the least. Looking at the Total Variance Explained table, you will get the total variance explained by each component.
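The relationships described above (loadings as eigenvector weights times the square root of the eigenvalue, column sums of squared loadings equal to eigenvalues, row sums equal to communalities) can be verified numerically. To keep the example dependency-free, this sketch swaps in the cyclic Jacobi eigenvalue algorithm; in practice a library routine such as `numpy.linalg.eigh` would do this step. The 3 × 3 correlation matrix is hypothetical, the helper names are made up, and eigenvector signs are arbitrary, so individual loadings may have the opposite sign from what SPSS prints.

```python
import math


def jacobi_eigen(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]  # working copy, rotated towards diagonal form
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Choose the rotation that makes element (p, q) zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J, so the columns of V become eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    # Sort from the largest to the smallest eigenvalue, the order SPSS reports.
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]  # vectors[c] = eigenvector c
    return values, vectors


def pca_loadings(R):
    """Loadings = eigenvector weight times the square root of its eigenvalue."""
    values, vectors = jacobi_eigen(R)
    p = len(R)
    loadings = [[vectors[c][i] * math.sqrt(max(values[c], 0.0)) for c in range(p)]
                for i in range(p)]  # rows = items, columns = components
    return values, loadings


# Hypothetical correlation matrix for three items.
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
eigenvalues, loadings = pca_loadings(R)
# Down the items (rows): squared loadings sum to each component's eigenvalue.
column_sums = [sum(row[c] ** 2 for row in loadings) for c in range(len(R))]
# Across all components (columns): squared loadings sum to each item's communality.
communalities = [sum(l ** 2 for l in row) for row in loadings]
print([round(ev, 3) for ev in eigenvalues])
print([round(h2, 3) for h2 in communalities])
```

With every component retained, each communality comes out as 1 and the eigenvalues add up to 3, the number of items, which matches the Initial column of a PCA.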

Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column. Since the goal of running a PCA is to reduce our set of variables, it would be useful to have a criterion for selecting the optimal number of components, which is, of course, smaller than the total number of items.

One criterion is to choose components that have eigenvalues greater than 1. Under the Total Variance Explained table, we see the first two components have an eigenvalue greater than 1. This can be confirmed by the Scree Plot, which plots the eigenvalue (total variance explained) against the component number. Recall that we checked the Scree Plot option under Extraction — Display, so the scree plot should be produced automatically.

The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? Using the scree plot, we pick two components. Picking the number of components is a bit of an art and requires input from the whole research team. Running the two-component PCA is just as easy as running the 8-component solution. The only difference is that under Fixed number of factors — Factors to extract, you enter 2.

We will focus on the differences in the output between the eight- and two-component solutions. Again, we interpret each loading, such as Item 1's loading on Component 1, as the correlation between the item and the component. From glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest. Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest. The communality is the sum of the squared component loadings up to the number of components you extract.
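Summing squared loadings only over the extracted components is what makes the Extraction communalities smaller than the Initial value of 1. A minimal sketch with hypothetical loadings for three items (not the SAQ-8 output), using a made-up helper `communalities`:

```python
def communalities(loadings, n_components):
    """Communality of each item: sum of its squared loadings on the first n_components components."""
    return [sum(l ** 2 for l in row[:n_components]) for row in loadings]


# Hypothetical loadings (rows = items, columns = components 1 and 2).
loadings = [
    [0.80, 0.30],
    [0.70, -0.40],
    [0.60, 0.50],
]
h2_one = communalities(loadings, 1)  # one-component solution
h2_two = communalities(loadings, 2)  # two-component solution
print([round(h, 2) for h in h2_one])  # [0.64, 0.49, 0.36]
print([round(h, 2) for h in h2_two])  # [0.73, 0.65, 0.61]
```

Each additional component can only add a non-negative squared loading, so communalities never shrink as more components are extracted.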

In the SPSS output you will see a table of communalities. PCA starts with 1 as the initial estimate of the communality (since this is the total variance across all 8 components), and the Extraction column then shows the communality reproduced by the components that were extracted.

Notice that the Extraction column is smaller than the Initial column because we only extracted two components. Recall that squaring the loadings and summing across the components (columns) gives us the communality.

In an 8-component PCA, how many components must you extract so that the communality in the Initial column is equal to the Extraction column? All 8, because only then do the components account for each item's total variance. Note also that the eigenvalue is not the communality of a single item: it is the total communality across all items for a single component. Similarly, you can only sum communalities across items and sum eigenvalues across components, but if you do that, the two totals are equal. The partitioning of variance differentiates a principal components analysis from what we call common factor analysis.

Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance. It is usually more reasonable to assume that you have not measured your set of items perfectly.

The unobserved or latent variable that makes up common variance is called a factor, hence the name factor analysis. The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is simply to reduce your variable list to a smaller set of components that are linear combinations of the original variables, then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate.

In this case, we assume that there is a construct called SPSS Anxiety that explains why you see a correlation among all the items on the SAQ-8. We acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction.

Note that we continue to use the increased Maximum Iterations for Convergence, and we will see why later. The most striking difference between this communalities table and the one from the PCA is that the Initial values are no longer 1. Recall that for a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess. For common factor analysis, the initial communality of each item is instead the squared multiple correlation of that item with all the other items.

To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2 to 8 are independent variables. Go to Analyze — Regression — Linear and enter q01 under Dependent and q02 to q08 under Independent(s).

Note that the R² from this regression matches the Initial communality for Item 1. We could run a regression like this for each of the eight items to get all eight communality estimates, but SPSS already does that for us. Factor analysis uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column gives the total common variance shared among all items for a two-factor solution. The next table we will look at is Total Variance Explained. Its Initial Eigenvalues column is the same as in the PCA: SPSS simply borrows the information from the PCA analysis for use in the factor analysis, and the factors are actually components in the Initial Eigenvalues column.
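The regression route to the Initial communalities has a compact matrix equivalent: the squared multiple correlation of item i with all the other items equals 1 − 1/(R⁻¹)ᵢᵢ, a standard identity. The sketch below assumes a hypothetical 3 × 3 correlation matrix and made-up helper names; it shows the identity only, and the text does not say that SPSS computes the values this way internally.

```python
def invert(matrix):
    """Gauss-Jordan matrix inverse with partial pivoting (assumes a non-singular matrix)."""
    n = len(matrix)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def squared_multiple_correlations(R):
    """R-squared of each variable regressed on all the others: 1 - 1 / (R^-1)[i][i]."""
    inverse = invert(R)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(R))]


# Hypothetical correlation matrix for three items.
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
print([round(x, 3) for x in squared_multiple_correlations(R)])
```

With only two variables the identity reduces to r², the squared simple correlation, which is a quick way to sanity-check it.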

The main difference now is in the Extraction Sums of Squared Loadings. We notice that each corresponding row in the Extraction column is lower than the Initial column. This is expected because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower.

Just as in PCA, the more factors you extract, the less variance is explained by each successive factor. A subtle note that may be easily overlooked is that when SPSS produces the scree plot or applies the Eigenvalues greater than 1 criterion (Analyze — Dimension Reduction — Factor — Extraction), it bases them on the Initial and not the Extraction solution.

This is important because the criterion here assumes no unique variance, as in PCA, which means that this is the total variance explained, not accounting for specific or measurement error. Note that in the Extraction Sums of Squared Loadings column, the second factor has an eigenvalue that is less than 1 but is still retained because its Initial eigenvalue is greater than 1.