lassoglm

Lasso or elastic net regularization for generalized linear models

Syntax

B = lassoglm(X,y)

B = lassoglm(X,y,distr)

B = lassoglm(X,y,distr,Name,Value)

[B,FitInfo]
= lassoglm(___)

Description

example

B = lassoglm(X,y) returns penalized, maximum-likelihood fitted coefficients for generalized linear models of the predictor data X and the response y, where the values in y are assumed to have a normal probability distribution. Each column of B corresponds to a particular regularization coefficient in Lambda. By default, lassoglm performs lasso regularization using a geometric sequence of Lambda values.

B = lassoglm(X,y,distr) performs lasso regularization to fit the models using the probability distribution distr for y.

B = lassoglm(X,y,distr,Name,Value) fits regularized generalized linear regressions with additional options specified by one or more name-value pair arguments. For example, 'Alpha',0.5 sets elastic net as the regularization method, with the parameter Alpha equal to 0.5.

example

[B,FitInfo] = lassoglm(___) also returns the structure FitInfo, which contains information about the fit of the models, using any of the input arguments in the previous syntaxes.

Examples

collapse all

Remove Redundant Predictors Using Lasso Regularization

Open Live Script

Construct a data set with redundant predictors and identify those predictors by using lassoglm.

Create a random matrix X with 100 observations and 10 predictors. Create the normally distributed response y using only four of the predictors and a small amount of noise.

rng default
X = randn(100,10);
weights = [0.6;0.5;0.7;0.4];
y = X(:,[2 4 5 7])*weights + randn(100,1)*0.1; % Small added noise

Perform lasso regularization.

B = lassoglm(X,y);

Find the coefficient vector for the 75th Lambda value in B.

B(:,75)

lassoglm identifies and removes the redundant predictors.

Cross-Validated Lasso Regularization of Generalized Linear Model

Open Live Script

Construct data from a Poisson model, and identify the important predictors by using lassoglm.

Create data with 20 predictors. Create a Poisson response variable using only three of the predictors plus a constant.

rng default % For reproducibility
X = randn(100,20);
weights = [.4;.2;.3];
mu = exp(X(:,[5 10 15])*weights + 1);
y = poissrnd(mu);

Construct a cross-validated lasso regularization of a Poisson regression model of the data.

[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

Examine the cross-validation plot to see the effect of the Lambda regularization parameter.

lassoPlot(B,FitInfo,'plottype','CV'); 
legend('show') % Show legend

Figure contains an axes. The axes with title Cross-Validated Deviance of Lasso Fit contains 5 objects of type errorbar, line. These objects represent Deviance with Error Bars, LambdaMinDeviance, Lambda1SE.

The green circle and dotted line locate the Lambda with minimum cross-validation error. The blue circle and dotted line locate the point with minimum cross-validation error plus one standard deviation.

Find the nonzero model coefficients corresponding to the two identified points.

idxLambdaMinDeviance = FitInfo.IndexMinDeviance;
mincoefs = find(B(:,idxLambdaMinDeviance))

mincoefs = 7×1

     3
     5
     6
    10
    11
    15
    16

idxLambda1SE = FitInfo.Index1SE;
min1coefs = find(B(:,idxLambda1SE))

min1coefs = 3×1

     5
    10
    15

The coefficients from the minimum-plus-one standard error point are exactly those coefficients used to create the data.

Predict Values Using Lasso Regularization

Open Live Script

Predict whether students got a B or above on their last exam by using lassoglm.

Load the examgrades data set. Convert the last exam grades to a logical vector, where 1 represents a grade of 80 or above and 0 represents a grade below 80.

load examgrades
X = grades(:,1:4);
y = grades(:,5);
yBinom = (y>=80);

Partition the data into training and test sets.

rng default    % Set the seed for reproducibility
c = cvpartition(yBinom,'HoldOut',0.3);
idxTrain = training(c,1);
idxTest = ~idxTrain;
XTrain = X(idxTrain,:);
yTrain = yBinom(idxTrain);
XTest = X(idxTest,:);
yTest = yBinom(idxTest);

Perform lasso regularization for generalized linear model regression with 3-fold cross-validation on the training data. Assume the values in y are binomially distributed. Choose model coefficients corresponding to the Lambda with minimum expected deviance.

[B,FitInfo] = lassoglm(XTrain,yTrain,'binomial','CV',3);
idxLambdaMinDeviance = FitInfo.IndexMinDeviance;
B0 = FitInfo.Intercept(idxLambdaMinDeviance);
coef = [B0; B(:,idxLambdaMinDeviance)]

Predict exam grades for the test data using the model coefficients found in the previous step. Specify the link function for a binomial response using 'logit'. Convert the prediction values to a logical vector.

yhat = glmval(coef,XTest,'logit');
yhatBinom = (yhat>=0.5);

Determine the accuracy of the predictions using a confusion matrix.

c = confusionchart(yTest,yhatBinom);

Figure contains an object of type ConfusionMatrixChart.

The function correctly predicts 31 exam grades. However, the function incorrectly predicts that 1 student receives a B or above and 4 students receive a grade below a B.

Input Arguments

collapse all

`X` — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row represents one observation, and each column represents one predictor variable.

Data Types: single | double

`y` — Response data
numeric vector | logical vector | categorical array | numeric matrix

Response data, specified as a numeric vector, logical vector, categorical array, or two-column numeric matrix.

When distr is not 'binomial', y is a numeric vector or categorical array of length n, where n is the number of rows in X. The response y(i) corresponds to row i in X.
When distr is 'binomial', y is one of the following:
- Numeric vector of length n, where each entry represents success (1) or failure (0)
- Logical vector of length n, where each entry represents success or failure
- Categorical array of length n, where each entry represents success or failure
- Two-column numeric matrix, where the first column contains the number of successes for each observation and the second column contains the total number of trials

Data Types: single | double | logical | categorical

`distr` — Distribution of response data
`'normal'` | `'binomial'` | `'poisson'` | `'gamma'` | `'inverse gaussian'`

Distribution of response data, specified as one of the following:

'normal'
'binomial'
'poisson'
'gamma'
'inverse gaussian'

lassoglm uses the default link function corresponding to distr. Specify another link function using the Link name-value pair argument.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: lassoglm(X,y,'poisson','Alpha',0.5) performs elastic net regularization assuming that the response values are Poisson distributed. The 'Alpha',0.5 name-value pair argument sets the parameter used in the elastic net optimization.

`'Alpha'` — Weight of lasso versus ridge optimization
`1` (default) | positive scalar

Weight of lasso (L¹) versus ridge (L²) optimization, specified as the comma-separated pair consisting of 'Alpha' and a positive scalar value in the interval (0,1]. The value Alpha = 1 represents lasso regression, Alpha close to 0 approaches ridge regression, and other values represent elastic net optimization. See Elastic Net.

Example: 'Alpha',0.75

Data Types: single | double

`'CV'` — Cross-validation specification for estimating deviance
`'resubstitution'` (default) | positive integer scalar | `cvpartition` object

Cross-validation specification for estimating the deviance, specified as the comma-separated pair consisting of 'CV' and one of the following:

'resubstitution' — lassoglm uses X and y to fit the model and to estimate the deviance without cross-validation.
Positive scalar integer K — lassoglm uses K-fold cross-validation.
cvpartition object cvp — lassoglm uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lassoglm.

Example: 'CV',10

`'DFmax'` — Maximum number of nonzero coefficients
`Inf` (default) | positive integer scalar

Maximum number of nonzero coefficients in the model, specified as the comma-separated pair consisting of 'DFmax' and a positive integer scalar. lassoglm returns results only for Lambda values that satisfy this criterion.

Example: 'DFmax',25

Data Types: single | double

`'Lambda'` — Regularization coefficients
nonnegative vector

Regularization coefficients, specified as the comma-separated pair consisting of 'Lambda' and a vector of nonnegative values. See Lasso.

If you do not supply Lambda, then lassoglm estimates the largest value of Lambda that gives a nonnull model. In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector.
If you supply Lambda, then lassoglm ignores LambdaRatio and NumLambda.
If Standardize is true, then Lambda is the set of values used to fit the models with the X data standardized to have zero mean and a variance of one.

The default is a geometric sequence of NumLambda values, with only the largest value able to produce B = 0.

Data Types: single | double

`'LambdaRatio'` — Ratio of smallest to largest `Lambda` values
`1e–4` (default) | positive scalar

Ratio of the smallest to the largest Lambda values when you do not supply Lambda, specified as the comma-separated pair consisting of 'LambdaRatio' and a positive scalar.

If you set LambdaRatio = 0, then lassoglm generates a default sequence of Lambda values and replaces the smallest one with 0.

Example: 'LambdaRatio',1e–2

Data Types: single | double

`'Link'` — Mapping between mean of response and linear predictor
`'comploglog'` | `'identity'` | `'log'` | `'logit'` | `'loglog'` | ...

Mapping between the mean µ of the response and the linear predictor Xb, specified as the comma-separated pair consisting of 'Link' and one of the values in this table.

Value	Description
`'comploglog'`	log(–log((1 – µ))) = Xb
`'identity'`, default for the distribution `'normal'`	µ = Xb
`'log'`, default for the distribution `'poisson'`	log(µ) = Xb
`'logit'`, default for the distribution `'binomial'`	log(µ/(1 – µ)) = Xb
`'loglog'`	log(–log(µ)) = Xb
`'probit'`	Φ^–1(µ) = Xb, where Φ is the normal (Gaussian) cumulative distribution function
`'reciprocal'`, default for the distribution `'gamma'`	µ^–1 = Xb
`p` (a number), default for the distribution `'inverse gaussian'` (with p = –2)	µ^p = Xb
A cell array of the form `{FL FD FI}`, containing three function handles created using `@`, which define the link (`FL`), the derivative of the link (`FD`), and the inverse link (`FI`). Or, a structure of function handles with the field `Link` containing `FL`, the field `Derivative` containing `FD`, and the field `Inverse` containing `FI`.	User-specified link function (see Custom Link Function)

Example: 'Link','probit'

Data Types: char | string | single | double | cell

`'MaxIter'` — Maximum number of iterations allowed
`1e4` (default) | positive integer scalar

Maximum number of iterations allowed, specified as the comma-separated pair consisting of 'MaxIter' and a positive integer scalar.

If the algorithm executes MaxIter iterations before reaching the convergence tolerance RelTol, then the function stops iterating and returns a warning message.

The function can return more than one warning when NumLambda is greater than 1.

Example: 'MaxIter',1e3

Data Types: single | double

`'MCReps'` — Number of Monte Carlo repetitions for cross-validation
`1` (default) | positive integer scalar

Number of Monte Carlo repetitions for cross-validation, specified as the comma-separated pair consisting of 'MCReps' and a positive integer scalar.

If CV is 'resubstitution' or a cvpartition of type 'resubstitution', then MCReps must be 1.
If CV is a cvpartition of type 'holdout', then MCReps must be greater than 1.

Example: 'MCReps',2

Data Types: single | double

`'NumLambda'` — Number of `Lambda` values
`100` (default) | positive integer scalar

Number of Lambda values lassoglm uses when you do not supply Lambda, specified as the comma-separated pair consisting of 'NumLambda' and a positive integer scalar. lassoglm can return fewer than NumLambda fits if the deviance of the fits drops below a threshold fraction of the null deviance (deviance of the fit without any predictors X).

Example: 'NumLambda',150

Data Types: single | double

`'Offset'` — Additional predictor variable
numeric vector

Additional predictor variable, specified as the comma-separated pair consisting of 'Offset' and a numeric vector with the same number of rows as X. The lassoglm function keeps the coefficient value of Offset fixed at 1.0.

Data Types: single | double

`'Options'` — Option to cross-validate in parallel and specify random streams
structure

Option to cross-validate in parallel and specify the random streams, specified as the comma-separated pair consisting of 'Options' and a structure. This option requires Parallel Computing Toolbox™.

Create the Options structure with statset. The option fields are:

UseParallel — Set to true to compute in parallel. The default is false.
UseSubstreams — Set to true to compute in parallel in a reproducible fashion. For reproducibility, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. The default is false.
Streams — A RandStream object or cell array consisting of one such object. If you do not specify Streams, then lassoglm uses the default stream.

Example: 'Options',statset('UseParallel',true)

Data Types: struct

`'PredictorNames'` — Names of predictor variables
`{}` (default) | string array | cell array of character vectors

Names of the predictor variables, in the order in which they appear in X, specified as the comma-separated pair consisting of 'PredictorNames' and a string array or cell array of character vectors.

Example: 'PredictorNames',{'Height','Weight','Age'}

Data Types: string | cell

`'RelTol'` — Convergence threshold for coordinate descent algorithm
`1e–4` (default) | positive scalar

Convergence threshold for the coordinate descent algorithm [3], specified as the comma-separated pair consisting of 'RelTol' and a positive scalar. The algorithm terminates when successive estimates of the coefficient vector differ in the L² norm by a relative amount less than RelTol.

Example: 'RelTol',2e–3

Data Types: single | double

`'Standardize'` — Flag for standardizing predictor data before fitting models
`true` (default) | `false`

Flag for standardizing the predictor data X before fitting the models, specified as the comma-separated pair consisting of 'Standardize' and either true or false. If Standardize is true, then the X data is scaled to have zero mean and a variance of one. Standardize affects whether the regularization is applied to the coefficients on the standardized scale or the original scale. The results are always presented on the original data scale.

Example: 'Standardize',false

Data Types: logical

`'Weights'` — Observation weights
`1/n*ones(n,1)` (default) | nonnegative vector

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a nonnegative vector. Weights has length n, where n is the number of rows of X. At least two values must be positive.

Data Types: single | double

Output Arguments

collapse all

`B` — Fitted coefficients
numeric matrix

Fitted coefficients, returned as a numeric matrix. B is a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values. You can specify the number of Lambda values using the NumLambda name-value pair argument.

The coefficient corresponding to the intercept term is a field in FitInfo.

Data Types: single | double

`FitInfo` — Fit information of models
structure

Fit information of the generalized linear models, returned as a structure with the fields described in this table.

Field in `FitInfo`	Description
`Intercept`	Intercept term β₀ for each linear model, a `1`-by-L vector
`Lambda`	Lambda parameters in ascending order, a `1`-by-L vector
`Alpha`	Value of the `Alpha` parameter, a scalar
`DF`	Number of nonzero coefficients in `B` for each value of `Lambda`, a `1`-by-L vector
`Deviance`	Deviance of the fitted model for each value of `Lambda`, a `1`-by-L vector If the model is cross-validated, then the values for `Deviance` represent the estimated expected deviance of the model applied to new data, as calculated by cross-validation. Otherwise, `Deviance` is the deviance of the fitted model applied to the data used to perform the fit.
`PredictorNames`	Value of the `PredictorNames` parameter, stored as a cell array of character vectors

If you set the CV name-value pair argument to cross-validate, the FitInfo structure contains these additional fields.

Field in `FitInfo`	Description
`SE`	Standard error of `Deviance` for each `Lambda`, as calculated during cross-validation, a `1`-by-L vector
`LambdaMinDeviance`	`Lambda` value with minimum expected deviance, as calculated by cross-validation, a scalar
`Lambda1SE`	Largest `Lambda` value such that `Deviance` is within one standard error of the minimum, a scalar
`IndexMinDeviance`	Index of `Lambda` with the value `LambdaMinDeviance`, a scalar
`Index1SE`	Index of `Lambda` with the value `Lambda1SE`, a scalar

More About

collapse all

Link Function

A link function f(μ) maps a distribution with mean μ to a linear model with data X and coefficient vector b using the formula

f(μ) = Xb.

You can find the formulas for the link functions in the Link name-value pair argument description. This table lists the link functions that are typically used for each distribution.

Distribution Family	Default Link Function	Other Typical Link Functions
`'normal'`	`'identity'`
`'binomial'`	`'logit'`	`'comploglog'`, `'loglog'`, `'probit'`
`'poisson'`	`'log'`
`'gamma'`	`'reciprocal'`
`'inverse gaussian'`	`–2`

Lasso

For a nonnegative value of λ, lassoglm solves the problem

$\min_{β_{0}, β} (\frac{1}{N} Deviance (β_{0}, β) + λ \sum_{j = 1}^{p} | β_{j} |) .$

The function Deviance in this equation is the deviance of the model fit to the responses using the intercept β₀ and the predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized loglikelihood.
N is the number of observations.
λ is a nonnegative regularization parameter corresponding to one value of Lambda.
The parameters β₀ and β are a scalar and a vector of length p, respectively.

As λ increases, the number of nonzero components of β decreases.

The lasso problem involves the L¹ norm of β, as contrasted with the elastic net algorithm.

Elastic Net

For α strictly between 0 and 1, and nonnegative λ, elastic net solves the problem

$\min_{β_{0}, β} (\frac{1}{N} Deviance (β_{0}, β) + λ P_{α} (β)),$

where

$P_{α} (β) = \frac{(1 - α)}{2} {‖ β ‖}_{2}^{2} + α {‖ β ‖}_{1} = \sum_{j = 1}^{p} (\frac{(1 - α)}{2} β_{j}^{2} + α | β_{j} |) .$

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term P_α(β) interpolates between the L¹ norm of β and the squared L² norm of β. As α shrinks toward 0, elastic net approaches ridge regression.

References

[1] Tibshirani, R. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, 1996, pp. 267–288.

[2] Zou, H., and T. Hastie. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society. Series B, Vol. 67, No. 2, 2005, pp. 301–320.

[3] Friedman, J., R. Tibshirani, and T. Hastie. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software. Vol. 33, No. 1, 2010. https://www.jstatsoft.org/v33/i01

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. 2nd edition. New York: Springer, 2008.

[5] Dobson, A. J. An Introduction to Generalized Linear Models. 2nd edition. New York: Chapman & Hall/CRC Press, 2002.

[6] McCullagh, P., and J. A. Nelder. Generalized Linear Models. 2nd edition. New York: Chapman & Hall/CRC Press, 1989.

[7] Collett, D. Modelling Binary Data. 2nd edition. New York: Chapman & Hall/CRC Press, 2003.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, specify the 'Options' name-value argument in the call to this function and set the 'UseParallel' field of the options structure to true using statset.

For example: 'Options',statset('UseParallel',true)

For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

Documentation

lassoglm

Syntax

Description

Examples

Remove Redundant Predictors Using Lasso Regularization

Cross-Validated Lasso Regularization of Generalized Linear Model

Predict Values Using Lasso Regularization

Input Arguments

X — Predictor data numeric matrix

y — Response data numeric vector | logical vector | categorical array | numeric matrix

distr — Distribution of response data 'normal' | 'binomial' | 'poisson' | 'gamma' | 'inverse gaussian'

Name-Value Pair Arguments

'Alpha' — Weight of lasso versus ridge optimization 1 (default) | positive scalar

'CV' — Cross-validation specification for estimating deviance 'resubstitution' (default) | positive integer scalar | cvpartition object

'DFmax' — Maximum number of nonzero coefficients Inf (default) | positive integer scalar

'Lambda' — Regularization coefficients nonnegative vector

'LambdaRatio' — Ratio of smallest to largest Lambda values 1e–4 (default) | positive scalar

'Link' — Mapping between mean of response and linear predictor 'comploglog' | 'identity' | 'log' | 'logit' | 'loglog' | ...

'MaxIter' — Maximum number of iterations allowed 1e4 (default) | positive integer scalar

'MCReps' — Number of Monte Carlo repetitions for cross-validation 1 (default) | positive integer scalar

'NumLambda' — Number of Lambda values 100 (default) | positive integer scalar

'Offset' — Additional predictor variable numeric vector

'Options' — Option to cross-validate in parallel and specify random streams structure

'PredictorNames' — Names of predictor variables {} (default) | string array | cell array of character vectors

'RelTol' — Convergence threshold for coordinate descent algorithm 1e–4 (default) | positive scalar

'Standardize' — Flag for standardizing predictor data before fitting models true (default) | false

'Weights' — Observation weights 1/n*ones(n,1) (default) | nonnegative vector

Output Arguments

B — Fitted coefficients numeric matrix

FitInfo — Fit information of models structure

More About

Link Function

Lasso

Elastic Net

References

Extended Capabilities

Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

See Also

Topics

Statistics and Machine Learning Toolbox Documentation

Support

`X` — Predictor data
numeric matrix

`y` — Response data
numeric vector | logical vector | categorical array | numeric matrix

`distr` — Distribution of response data
`'normal'` | `'binomial'` | `'poisson'` | `'gamma'` | `'inverse gaussian'`

`'Alpha'` — Weight of lasso versus ridge optimization
`1` (default) | positive scalar

`'CV'` — Cross-validation specification for estimating deviance
`'resubstitution'` (default) | positive integer scalar | `cvpartition` object

`'DFmax'` — Maximum number of nonzero coefficients
`Inf` (default) | positive integer scalar

`'Lambda'` — Regularization coefficients
nonnegative vector

`'LambdaRatio'` — Ratio of smallest to largest `Lambda` values
`1e–4` (default) | positive scalar

`'Link'` — Mapping between mean of response and linear predictor
`'comploglog'` | `'identity'` | `'log'` | `'logit'` | `'loglog'` | ...

`'MaxIter'` — Maximum number of iterations allowed
`1e4` (default) | positive integer scalar

`'MCReps'` — Number of Monte Carlo repetitions for cross-validation
`1` (default) | positive integer scalar

`'NumLambda'` — Number of `Lambda` values
`100` (default) | positive integer scalar

`'Offset'` — Additional predictor variable
numeric vector

`'Options'` — Option to cross-validate in parallel and specify random streams
structure

`'PredictorNames'` — Names of predictor variables
`{}` (default) | string array | cell array of character vectors

`'RelTol'` — Convergence threshold for coordinate descent algorithm
`1e–4` (default) | positive scalar

`'Standardize'` — Flag for standardizing predictor data before fitting models
`true` (default) | `false`

`'Weights'` — Observation weights
`1/n*ones(n,1)` (default) | nonnegative vector

`B` — Fitted coefficients
numeric matrix

`FitInfo` — Fit information of models
structure

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.