How are important variables identified in the Partial Least Squares Regression function PLSREGRESS?

I am using the PLSREGRESS function in one of my applications to identify important variables in my data sets.
For another program, I need to know how important variables are identified by this function.

Accepted Answer

MathWorks Support Team on 24 Feb 2022
Edited: MathWorks Support Team on 3 Mar 2022
Within your Partial Least Squares (PLS) Regression calculation, the PLS projection finds the components that maximize the covariance between X and Y. For NCOMP components, PLSREGRESS first forms the covariance matrix of X and Y, then computes a decomposition of that matrix, and finally uses the resulting factors to project X and Y.
Let the singular value decomposition of the covariance result in
[U,S,V] = svd(cov)
where U is the matrix of left singular vectors and V is the matrix of right singular vectors. The following pseudocode describes what PLSREGRESS performs iteratively:
for each of the NCOMP components
    project X onto the column space of the vector in U corresponding to the largest singular value
    project Y onto the column space of the vector in V corresponding to the largest singular value
    keep the resulting pair of components from X and Y, which maximizes the covariance
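The steps above can be sketched in a few lines of MATLAB. This is a simplified illustration of the idea, not the exact PLSREGRESS implementation (it omits the deflation and orthogonalization steps, and the example data here is made up):

```matlab
% Illustrative sketch: extract one PLS component pair from centered data
% via the SVD of the cross-covariance matrix (not the full SIMPLS code).
X = randn(20,5);               % example predictor data (hypothetical)
Y = randn(20,1);               % example response data (hypothetical)
X0 = X - mean(X,1);            % center the predictors
Y0 = Y - mean(Y,1);            % center the response
[U,S,V] = svd(X0'*Y0,'econ');  % decompose the cross-covariance matrix
r = U(:,1);                    % left singular vector for the largest singular value
c = V(:,1);                    % corresponding right singular vector
t = X0*r;                      % X score: projection of X onto r
u = Y0*c;                      % Y score: projection of Y onto c
% SIMPLS repeats this NCOMP times, deflating the covariance and
% orthogonalizing between iterations.
```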
There are some additional steps for orthogonalization and centering, but the main algorithm is the SIMPLS algorithm, as cited in the References section of the PLSREGRESS documentation: de Jong, S. "SIMPLS: An Alternative Approach to Partial Least Squares Regression." Chemometrics and Intelligent Laboratory Systems, Vol. 18, 1993, pp. 251-263.
Please note that the implementation of the "simpls" function can be found inside PLSREGRESS.m.
As for your other program, you might be looking for the calculation of the "Variable Importance in Projection" (VIP) scores, which estimate the importance of each variable. They can be easily obtained from the outputs of PLSREGRESS as this example illustrates:
% Load data on near infrared (NIR) spectral intensities of 60 samples of gasoline at 401 wavelengths, and their octane ratings.
load spectra
X = NIR;
Y = octane;
% Perform PLS regression with ten components.
NCOMP = 10;
[XL,YL,XS,YS,beta,pctvar,mse,stats] = plsregress(X,Y,NCOMP);
% Calculate normalized PLS weights
W0 = bsxfun(@rdivide,stats.W,sqrt(sum(stats.W.^2,1)));
% Calculate the product of summed squares of XS and YL
sumSq = sum(XS.^2,1).*sum(YL.^2,1);
% Calculate VIP scores for NCOMP components
vipScores = sqrt(size(XL,1) * sum(bsxfun(@times,sumSq,W0.^2),2) ./ sum(sumSq,2));
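Given the vipScores computed above, a common heuristic (a convention in the PLS literature, not something built into PLSREGRESS) is to treat variables with a VIP score greater than 1 as important:

```matlab
% Heuristic: variables with VIP > 1 are often considered important.
% The threshold of 1 is a rule of thumb, not part of PLSREGRESS itself.
importantIdx = find(vipScores > 1);   % indices of "important" predictors
fprintf('%d of %d variables have VIP > 1\n', ...
    numel(importantIdx), numel(vipScores));
```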
 

Release

R2017a
