How are important variables identified in the Partial Least Squares Regression function PLSREGRESS?
MathWorks Support Team
on 1 Feb 2019
Commented: Pat Williamson
on 15 May 2023
I am using the PLSREGRESS function in one of my applications to identify important variables in my data sets.
For another program, I need to know how this function identifies important variables.
Accepted Answer
MathWorks Support Team
on 24 Feb 2022
Edited: MathWorks Support Team
on 3 Mar 2022
In a Partial Least Squares (PLS) regression, the PLS projection finds the components that maximize the covariance between X and Y. For NCOMP components, PLSREGRESS first computes the covariance between X and Y, then decomposes that covariance, and finally uses the resulting matrices to project X and Y.
Let the singular value decomposition of the covariance result in
[U,S,V] = svd(cov)
where U is the matrix of left singular vectors and V is the matrix of right singular vectors. PLSREGRESS then carries out the following pseudocode iteratively:
for each of the NCOMP components
    project X onto the vector in U that corresponds to the largest singular value
    project Y onto the vector in V that corresponds to the largest singular value
end
select the NCOMP components from X and Y that maximize the covariance
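The loop above can be sketched in base MATLAB as follows. This is a simplified illustration of a SIMPLS-style iteration on synthetic data, not the code inside PLSREGRESS.m; the sizes, the variable names (X0, Y0, C, W, T, V), and the data are assumptions made for the example, and only the X side of the projection is shown.

```matlab
% Illustrative SIMPLS-style sketch on synthetic data (base MATLAB only).
% Simplified for exposition; this is NOT the source of plsregress.m.
rng(0);                                  % reproducible synthetic data
n = 50; p = 8; m = 2; ncomp = 3;         % hypothetical problem sizes
X = randn(n,p);
Y = X(:,1:2)*[1 0.5; -1 2] + 0.1*randn(n,m);

% Center predictors and responses
X0 = bsxfun(@minus, X, mean(X,1));
Y0 = bsxfun(@minus, Y, mean(Y,1));

C = X0'*Y0;                              % p-by-m cross-product (unscaled covariance)
W = zeros(p,ncomp);                      % X weights
T = zeros(n,ncomp);                      % X scores
V = zeros(p,ncomp);                      % orthonormal basis of the X loadings

for a = 1:ncomp
    [U,~,~] = svd(C,'econ');             % singular vectors of the current covariance
    r = U(:,1);                          % direction with the largest singular value
    t = X0*r;                            % project X onto that direction
    nt = norm(t);
    t = t/nt;                            % unit-length score
    v = X0'*t;                           % X loading for this component
    if a > 1
        Vprev = V(:,1:a-1);
        v = v - Vprev*(Vprev'*v);        % orthogonalize against earlier loadings
    end
    v = v/norm(v);
    C = C - v*(v'*C);                    % deflate the covariance before the next pass
    W(:,a) = r/nt;
    T(:,a) = t;
    V(:,a) = v;
end

disp(T'*T)                               % close to the identity: the scores are orthonormal
```

The deflation step is what makes each new weight vector orthogonal to the earlier loadings, so the scores of successive components are uncorrelated.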
There are some additional steps for orthogonalization and centering, but the main algorithm is the SIMPLS algorithm, as mentioned in the reference section of the PLSREGRESS documentation.
Please note that the implementation of the "simpls" function can be found inside PLSREGRESS.m.
As for your other program, you might be looking for the calculation of the "Variable Importance in Projection" (VIP) scores, which estimate the importance of each variable. They can be easily obtained from the outputs of PLSREGRESS as this example illustrates:
% Load data on near infrared (NIR) spectral intensities of 60 samples of gasoline at 401 wavelengths, and their octane ratings.
load spectra
X = NIR;
Y = octane;
% Perform PLS regression with ten components.
NCOMP = 10;
[XL,YL,XS,YS,beta,pctvar,mse,stats] = plsregress(X,Y,NCOMP);
% Calculate normalized PLS weights
W0 = bsxfun(@rdivide,stats.W,sqrt(sum(stats.W.^2,1)));
% Calculate the product of summed squares of XS and YL
sumSq = sum(XS.^2,1).*sum(YL.^2,1);
% Calculate VIP scores for NCOMP components
vipScores = sqrt(size(XL,1) * sum(bsxfun(@times,sumSq,W0.^2),2) ./ sum(sumSq,2));
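Written as a formula, the calculation above appears to correspond to the following expression. The symbols are introduced here for illustration only: p is the number of predictors (size(XL,1)), A is NCOMP, w_a is column a of stats.W, t_{ia} are the entries of XS, and q_{ka} are the entries of YL.

```latex
\mathrm{VIP}_j
  = \sqrt{\, p \;
      \frac{\sum_{a=1}^{A} s_a \left( \dfrac{w_{ja}}{\lVert w_a \rVert} \right)^{2}}
           {\sum_{a=1}^{A} s_a} \,},
\qquad
s_a = \Bigl( \sum_{i} t_{ia}^{2} \Bigr) \Bigl( \sum_{k} q_{ka}^{2} \Bigr)
```

Because each normalized weight vector has unit length, the squared scores sum to p, so their average over all variables is 1. That is why a commonly used rule of thumb, which is a convention rather than something PLSREGRESS enforces, treats variables with a VIP score of about 1 or more as important.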
1 Comment
Pat Williamson
on 15 May 2023
If you are still experiencing this issue, please consider submitting a Technical Support case. We will be happy to help you out. You can do so at the following location: