Principal Component Analysis

What kind of dataset should PCA be used on? PCA is an unsupervised learning algorithm, which means it does not require a specific outcome variable that you are trying to predict. Instead, PCA is used when you have a set of features and want to reduce the dimensionality of that feature set, that is, to condense as much of the information in your input features as possible into a smaller set of transformed features. In particular, PCA is intended for numeric features.
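As a minimal sketch of what "condensing" numeric features looks like in practice, the example below assumes scikit-learn's PCA class and uses synthetic data. The array X, the random seed, and the choice of two components are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic numeric data (an assumption for illustration): 200 samples, 5 features,
# where the last two features are noisy combinations of the first two.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
extra = base[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(200, 2))
X = np.hstack([base, extra])

# Condense the 5 input features into 2 transformed features.
pca_model = PCA(n_components=2)
X_reduced = pca_model.fit_transform(X)

print(X.shape)          # (200, 5)
print(X_reduced.shape)  # (200, 2)
```

No outcome variable appears anywhere in the call, which is what makes the method unsupervised.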
What are the main advantages and disadvantages of PCA? Here are some advantages and disadvantages you should keep in mind when deciding whether to use PCA.
When should you use principal component analysis rather than another dimensionality reduction technique? Here are some examples of situations where you should use principal component analysis.
When should you avoid using principal component analysis? Here are some examples of situations where you should avoid using principal component analysis.
1. Explained Variance (explained_variance_)
An array of shape (n_components,) that stores the amount of variance explained by each of the selected principal components (PCs).
from sklearn.decomposition import PCA
pca_model = PCA(...)  # Create your PCA model and fit it to your data
explained_variance = pca_model.explained_variance_  # available only after fitting
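As a runnable sketch of the snippet above, the following assumes scikit-learn's PCA and a small synthetic dataset (the array X and its column scales are hypothetical). It also checks one way to read the attribute: each entry should match the sample variance of the data projected onto that PC.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 4 features with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

pca_model = PCA(n_components=2).fit(X)
explained_variance = pca_model.explained_variance_

# Variance of the data projected onto each PC, using n_samples - 1 degrees of freedom.
projected_variance = pca_model.transform(X).var(axis=0, ddof=1)

print(explained_variance.shape)  # (2,)
print(np.allclose(explained_variance, projected_variance))  # True
```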
Each element of the explained_variance array represents the variance captured by the corresponding PC. Higher values indicate that the PC captures more variability in the original data.
2. Explained Variance Ratio (explained_variance_ratio_)
An array of shape (n_components,) that stores the fraction of the total variance explained by each of the selected PCs.
pca_model = PCA(...)  # Create your PCA model and fit it to your data
explained_variance_ratio = pca_model.explained_variance_ratio_  # available only after fitting
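A short sketch of how this attribute might be inspected, again assuming scikit-learn's PCA and hypothetical synthetic data. The 0.90 target used to pick a number of components is an arbitrary illustrative threshold, not a recommendation from the original text.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 4 features with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# n_components is not set, so all components are kept.
pca_model = PCA().fit(X)
ratio = pca_model.explained_variance_ratio_

# Running total of the fraction of variance explained by the first k PCs.
cumulative = np.cumsum(ratio)

# Smallest number of PCs whose cumulative ratio reaches the (arbitrary) 0.90 target.
n_needed = int(np.searchsorted(cumulative, 0.90) + 1)

print(ratio)
print(cumulative)
print(n_needed)
```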
Each element of the explained_variance_ratio array represents the fraction of the total variance captured by the corresponding PC. Each value lies between 0 and 1, and the ratios sum to 1 when all components are kept.
3. Connection to Compression Rate
Both explained_variance_ and explained_variance_ratio_ help you assess how much of the original variance survives the compression achieved by PCA. Here is how:
explained_variance_: By summing the values in the explained_variance_ array, you get the total variance captured by the selected PCs. Dividing this sum by the original data's total variance (estimated using n_samples - 1 degrees of freedom) gives you the proportion of variance retained after applying PCA.
explained_variance_ratio_: Directly sum the elements in the explained_variance_ratio_ array to get the cumulative fraction of variance explained by the chosen PCs. This value reflects the combined contribution of the retained PCs to the data's variability, and it equals the proportion obtained from explained_variance_.
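The two calculations above can be sketched side by side. This example assumes scikit-learn's PCA and hypothetical synthetic data. The variable names are illustrative, and total variance is taken here as the sum of per-feature sample variances with n_samples - 1 degrees of freedom.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 4 features with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# Total variance of the original data (n_samples - 1 degrees of freedom).
total_variance = X.var(axis=0, ddof=1).sum()

# Keep only 2 of the 4 possible components.
pca_model = PCA(n_components=2).fit(X)

# Route 1: sum of explained_variance_ divided by the total variance.
retained_from_variance = pca_model.explained_variance_.sum() / total_variance

# Route 2: direct sum of explained_variance_ratio_.
retained_from_ratio = pca_model.explained_variance_ratio_.sum()

print(retained_from_variance)
print(retained_from_ratio)
print(np.isclose(retained_from_variance, retained_from_ratio))  # True
```

Because only 2 of 4 components are kept, the retained proportion falls below 1; the gap is the variance discarded by the compression.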