Comparison of MSE values of OLS, Ridge, LASSO and Elastic Net regression

Hello everyone. I have MATLAB code with which I want to calculate MSE values for OLS, Ridge, LASSO and Elastic Net regression.
When I run it, I get the following error:
"Index in position 1 exceeds array bounds. Index must not exceed 1.
Error in untitled25 (line 4)
cv = cvpartition(size(X,1),'HoldOut',0.2);"
Is there something wrong with this code? Or do you have better code that you have written on this subject?
y is the dependent variable and X holds the independent variables. I have also attached the dataset. Thanks for your answers.
clc
y = [LifeExpectancyData.LifeExpectancy];
X = [LifeExpectancyData.AdultMortality, ...
     LifeExpectancyData.infantDeaths, ...
     LifeExpectancyData.Alcohol, ...
     LifeExpectancyData.percentageExpenditure, ...
     LifeExpectancyData.HepatitisB, ...
     LifeExpectancyData.Measles, ...
     LifeExpectancyData.BMI, ...
     LifeExpectancyData.underfiveDeaths, ...
     LifeExpectancyData.Polio, ...
     LifeExpectancyData.TotalExpenditure, ...
     LifeExpectancyData.Diphtheria, ...
     LifeExpectancyData.HIVAIDS, ...
     LifeExpectancyData.GDP, ...
     LifeExpectancyData.Population, ...
     LifeExpectancyData.thinness119Years, ...
     LifeExpectancyData.thinness59Years, ...
     LifeExpectancyData.IncomeCompositionOfResources, ...
     LifeExpectancyData.Schooling];
% Split data into training and test sets
cv = cvpartition(size(X,1),'HoldOut',0.2);
X_train = X(training(cv),:);
y_train = y(training(cv));
X_test = X(test(cv),:);
y_test = y(test(cv));
% Define lambda values for cross-validation
lambda_values = logspace(-4, 4, 100);
% Least Squares Regression
b_ls = X_train \ y_train;
y_pred_ls = X_test * b_ls;
mse_ls = mean((y_test - y_pred_ls).^2);
fprintf('Least Squares MSE: %.4f\n', mse_ls);
% Ridge Regression
mse_ridge = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    b_ridge = ridge(y_train, X_train, lambda_values(i), 0);
    y_pred_ridge = X_test * b_ridge(2:end) + b_ridge(1);
    mse_ridge(i) = mean((y_test - y_pred_ridge).^2);
end
[~,idx_min_ridge] = min(mse_ridge);
fprintf('Ridge Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_ridge), mse_ridge(idx_min_ridge));
% LASSO Regression
mse_lasso = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    [b_lasso, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i));
    y_pred_lasso = X_test * b_lasso + FitInfo.Intercept;
    mse_lasso(i) = mean((y_test - y_pred_lasso).^2);
end
[~,idx_min_lasso] = min(mse_lasso);
fprintf('LASSO Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_lasso), mse_lasso(idx_min_lasso));
% Elastic Net Regression
mse_enet = zeros(length(lambda_values),1);
alpha = 0.5; % Elastic net mixing parameter
for i = 1:length(lambda_values)
    [b_enet, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i), 'Alpha', alpha);
    y_pred_enet = X_test * b_enet + FitInfo.Intercept;
    mse_enet(i) = mean((y_test - y_pred_enet).^2);
end
[~,idx_min_enet] = min(mse_enet);
fprintf('Elastic Net Regression MSE (lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(idx_min_enet), alpha, mse_enet(idx_min_enet));
% Cross-validation
k = 10; % Number of folds
cvp = cvpartition(size(X,1),'KFold',k);
% Preallocate MSE arrays for cross-validation
mse_cv_ls = zeros(k,1);
mse_cv_ridge = zeros(k,length(lambda_values));
mse_cv_lasso = zeros(k,length(lambda_values));
mse_cv_enet = zeros(k,length(lambda_values));
for i = 1:k
    X_train_cv = X(training(cvp, i), :);
    y_train_cv = y(training(cvp, i));
    X_test_cv = X(test(cvp, i), :);
    y_test_cv = y(test(cvp, i));
    % Least Squares Regression
    b_ls_cv = X_train_cv \ y_train_cv;
    y_pred_ls_cv = X_test_cv * b_ls_cv;
    mse_cv_ls(i) = mean((y_test_cv - y_pred_ls_cv).^2);
    % Ridge Regression
    for j = 1:length(lambda_values)
        b_ridge_cv = ridge(y_train_cv, X_train_cv, lambda_values(j), 0);
        y_pred_ridge_cv = X_test_cv * b_ridge_cv(2:end) + b_ridge_cv(1);
        mse_cv_ridge(i,j) = mean((y_test_cv - y_pred_ridge_cv).^2);
    end
    % LASSO Regression
    for j = 1:length(lambda_values)
        [b_lasso_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j));
        y_pred_lasso_cv = X_test_cv * b_lasso_cv + FitInfo_cv.Intercept;
        mse_cv_lasso(i,j) = mean((y_test_cv - y_pred_lasso_cv).^2);
    end
    % Elastic Net Regression
    for j = 1:length(lambda_values)
        [b_enet_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j), 'Alpha', alpha);
        y_pred_enet_cv = X_test_cv * b_enet_cv + FitInfo_cv.Intercept;
        mse_cv_enet(i,j) = mean((y_test_cv - y_pred_enet_cv).^2);
    end
end
% Calculate mean cross-validated MSE for each model
mse_cv_ls_mean = mean(mse_cv_ls);
mse_cv_ridge_mean = mean(mse_cv_ridge);
mse_cv_lasso_mean = mean(mse_cv_lasso);
mse_cv_enet_mean = mean(mse_cv_enet);
% Find best lambda for Ridge, LASSO, and Elastic Net
[~, best_lambda_ridge] = min(mse_cv_ridge_mean);
[~, best_lambda_lasso] = min(mse_cv_lasso_mean);
[~, best_lambda_enet] = min(mse_cv_enet_mean);
fprintf('Cross-validated MSE for Least Squares: %.4f\n', mse_cv_ls_mean);
fprintf('Cross-validated MSE for Ridge Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_ridge), mse_cv_ridge_mean(best_lambda_ridge));
fprintf('Cross-validated MSE for LASSO Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_lasso), mse_cv_lasso_mean(best_lambda_lasso));
fprintf('Cross-validated MSE for Elastic Net Regression (best lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(best_lambda_enet), alpha, mse_cv_enet_mean(best_lambda_enet));
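On the request for more compact code: as a hedged sketch, the manual fold loops for LASSO and elastic net could be replaced by the 'CV' name-value argument of lasso, which cross-validates over the whole lambda grid in one call and reports the per-lambda MSE in FitInfo.MSE. The synthetic data, the seed, and the names n, p, alphas, labels, and best_mse below are illustrative assumptions, not part of the original post.

```matlab
% Sketch only: synthetic stand-in data, not the Life Expectancy dataset.
rng(0);                                   % reproducible folds and noise
n = 200; p = 5;
X = randn(n, p);
y = 3 + X * [2; -1; 0; 0; 0.5] + 0.5 * randn(n, 1);

lambda_values = logspace(-4, 4, 100);     % same grid as in the post
alphas = [1, 0.5];                        % 1 = LASSO, 0.5 = elastic net
labels = {'LASSO', 'Elastic Net'};
best_mse = zeros(size(alphas));

for a = 1:numel(alphas)
    % lasso runs 10-fold cross-validation over the whole grid in one call
    [~, FitInfo] = lasso(X, y, 'Lambda', lambda_values, ...
                         'Alpha', alphas(a), 'CV', 10);
    idx = FitInfo.IndexMinMSE;            % index of the lambda with minimum CV MSE
    best_mse(a) = FitInfo.MSE(idx);
    fprintf('%s: best lambda=%.4f, CV MSE=%.4f\n', ...
            labels{a}, FitInfo.Lambda(idx), best_mse(a));
end
```

Ridge is not covered by this sketch, because lasso requires Alpha to lie in (0, 1]; the ridge loop from the post would stay as it is.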

Answers (1)

the cyclist on 21 May 2024
Moved: the cyclist on 23 May 2024
Your code runs fine for me. You did not show how you imported that data into MATLAB. I used readtable, and had to edit some of your column names. But I do not get the error you got.
LifeExpectancyData = readtable("Life Expectancy Data.xlsx");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
y = [LifeExpectancyData.LifeExpectancy];
X = [LifeExpectancyData.AdultMortality, ...
     LifeExpectancyData.infantDeaths, ...
     LifeExpectancyData.Alcohol, ...
     LifeExpectancyData.percentageExpenditure, ...
     LifeExpectancyData.HepatitisB, ...
     LifeExpectancyData.Measles, ...
     LifeExpectancyData.BMI, ...
     LifeExpectancyData.under_fiveDeaths, ...
     LifeExpectancyData.Polio, ...
     LifeExpectancyData.TotalExpenditure, ...
     LifeExpectancyData.Diphtheria, ...
     LifeExpectancyData.HIV_AIDS, ...
     LifeExpectancyData.GDP, ...
     LifeExpectancyData.Population, ...
     LifeExpectancyData.thinness1_19Years, ...
     LifeExpectancyData.thinness5_9Years, ...
     LifeExpectancyData.IncomeCompositionOfResources, ...
     LifeExpectancyData.Schooling];
% Split data into training and test sets
cv = cvpartition(size(X,1),'HoldOut',0.2);
X_train = X(training(cv),:);
y_train = y(training(cv));
X_test = X(test(cv),:);
y_test = y(test(cv));
% Define lambda values for cross-validation
lambda_values = logspace(-4, 4, 100);
% Least Squares Regression
b_ls = X_train \ y_train;
y_pred_ls = X_test * b_ls;
mse_ls = mean((y_test - y_pred_ls).^2);
fprintf('Least Squares MSE: %.4f\n', mse_ls);
Least Squares MSE: 59.8879
% Ridge Regression
mse_ridge = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    b_ridge = ridge(y_train, X_train, lambda_values(i), 0);
    y_pred_ridge = X_test * b_ridge(2:end) + b_ridge(1);
    mse_ridge(i) = mean((y_test - y_pred_ridge).^2);
end
[~,idx_min_ridge] = min(mse_ridge);
fprintf('Ridge Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_ridge), mse_ridge(idx_min_ridge));
Ridge Regression MSE (lambda=1.9179): 14.5726
% LASSO Regression
mse_lasso = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    [b_lasso, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i));
    y_pred_lasso = X_test * b_lasso + FitInfo.Intercept;
    mse_lasso(i) = mean((y_test - y_pred_lasso).^2);
end
[~,idx_min_lasso] = min(mse_lasso);
fprintf('LASSO Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_lasso), mse_lasso(idx_min_lasso));
LASSO Regression MSE (lambda=0.0072): 14.5553
% Elastic Net Regression
mse_enet = zeros(length(lambda_values),1);
alpha = 0.5; % Elastic net mixing parameter
for i = 1:length(lambda_values)
    [b_enet, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i), 'Alpha', alpha);
    y_pred_enet = X_test * b_enet + FitInfo.Intercept;
    mse_enet(i) = mean((y_test - y_pred_enet).^2);
end
[~,idx_min_enet] = min(mse_enet);
fprintf('Elastic Net Regression MSE (lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(idx_min_enet), alpha, mse_enet(idx_min_enet));
Elastic Net Regression MSE (lambda=0.0016, alpha=0.50): 14.5699
% Cross-validation
k = 10; % Number of folds
cvp = cvpartition(size(X,1),'KFold',k);
% Preallocate MSE arrays for cross-validation
mse_cv_ls = zeros(k,1);
mse_cv_ridge = zeros(k,length(lambda_values));
mse_cv_lasso = zeros(k,length(lambda_values));
mse_cv_enet = zeros(k,length(lambda_values));
for i = 1:k
    X_train_cv = X(training(cvp, i), :);
    y_train_cv = y(training(cvp, i));
    X_test_cv = X(test(cvp, i), :);
    y_test_cv = y(test(cvp, i));
    % Least Squares Regression
    b_ls_cv = X_train_cv \ y_train_cv;
    y_pred_ls_cv = X_test_cv * b_ls_cv;
    mse_cv_ls(i) = mean((y_test_cv - y_pred_ls_cv).^2);
    % Ridge Regression
    for j = 1:length(lambda_values)
        b_ridge_cv = ridge(y_train_cv, X_train_cv, lambda_values(j), 0);
        y_pred_ridge_cv = X_test_cv * b_ridge_cv(2:end) + b_ridge_cv(1);
        mse_cv_ridge(i,j) = mean((y_test_cv - y_pred_ridge_cv).^2);
    end
    % LASSO Regression
    for j = 1:length(lambda_values)
        [b_lasso_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j));
        y_pred_lasso_cv = X_test_cv * b_lasso_cv + FitInfo_cv.Intercept;
        mse_cv_lasso(i,j) = mean((y_test_cv - y_pred_lasso_cv).^2);
    end
    % Elastic Net Regression
    for j = 1:length(lambda_values)
        [b_enet_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j), 'Alpha', alpha);
        y_pred_enet_cv = X_test_cv * b_enet_cv + FitInfo_cv.Intercept;
        mse_cv_enet(i,j) = mean((y_test_cv - y_pred_enet_cv).^2);
    end
end
% Calculate mean cross-validated MSE for each model
mse_cv_ls_mean = mean(mse_cv_ls);
mse_cv_ridge_mean = mean(mse_cv_ridge);
mse_cv_lasso_mean = mean(mse_cv_lasso);
mse_cv_enet_mean = mean(mse_cv_enet);
% Find best lambda for Ridge, LASSO, and Elastic Net
[~, best_lambda_ridge] = min(mse_cv_ridge_mean);
[~, best_lambda_lasso] = min(mse_cv_lasso_mean);
[~, best_lambda_enet] = min(mse_cv_enet_mean);
fprintf('Cross-validated MSE for Least Squares: %.4f\n', mse_cv_ls_mean);
Cross-validated MSE for Least Squares: 65.3909
fprintf('Cross-validated MSE for Ridge Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_ridge), mse_cv_ridge_mean(best_lambda_ridge));
Cross-validated MSE for Ridge Regression (best lambda=0.3594): 14.6746
fprintf('Cross-validated MSE for LASSO Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_lasso), mse_cv_lasso_mean(best_lambda_lasso));
Cross-validated MSE for LASSO Regression (best lambda=0.0011): 14.6746
fprintf('Cross-validated MSE for Elastic Net Regression (best lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(best_lambda_enet), alpha, mse_cv_enet_mean(best_lambda_enet));
Cross-validated MSE for Elastic Net Regression (best lambda=0.0003, alpha=0.50): 14.6746
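One possible reason for the large gap between the least-squares MSE (about 60) and the regularized fits (about 14.6) in the output above: X_train \ y_train fits no intercept, whereas ridge(..., 0) and lasso both estimate one. This has not been verified on the Life Expectancy data. A minimal sketch on synthetic data (all values and names below are illustrative assumptions) shows how prepending a column of ones changes the hold-out MSE:

```matlab
% Sketch only: synthetic data with a large intercept, not the thread's dataset.
rng(1);
n = 300; p = 4;
X = 5 + randn(n, p);                       % predictors with non-zero mean
y = 50 + X * [1.5; -2; 0.5; 0] + randn(n, 1);

% Simple hold-out split: first 80% for training, the rest for testing
n_train = round(0.8 * n);
X_train = X(1:n_train, :);      y_train = y(1:n_train);
X_test  = X(n_train+1:end, :);  y_test  = y(n_train+1:end);

% Least squares without an intercept (as in the post)
b_no = X_train \ y_train;
mse_no_intercept = mean((y_test - X_test * b_no).^2);

% Least squares with an intercept: prepend a column of ones
b_int = [ones(n_train, 1), X_train] \ y_train;
y_pred = [ones(size(X_test, 1), 1), X_test] * b_int;
mse_intercept = mean((y_test - y_pred).^2);

fprintf('OLS without intercept: %.4f\n', mse_no_intercept);
fprintf('OLS with intercept:    %.4f\n', mse_intercept);
```

If the missing intercept is the cause, rerunning the post's least-squares step with the extra column of ones should bring its MSE much closer to the other three methods.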
  1 Comment
Kadir on 22 May 2024
Moved: the cyclist on 23 May 2024
First of all, thank you for the answer. I imported the data set again using the "readtable" command as you suggested, and it worked. I think there was a simple problem with how I originally imported it, but I could not work out what it was. Thanks again for your help. These are the results I get:
  • Least Squares MSE: 64.2650
  • Ridge Regression MSE (lambda=0.0001): 16.9332
  • LASSO Regression MSE (lambda=0.0001): 16.9452
  • Elastic Net Regression MSE (lambda=0.0001, alpha=0.50): 16.9532
  • Cross-validated MSE for Least Squares: 65.2419
  • Cross-validated MSE for Ridge Regression (best lambda=0.3594): 14.7272
  • Cross-validated MSE for LASSO Regression (best lambda=0.0011): 14.7272
  • Cross-validated MSE for Elastic Net Regression (best lambda=0.0003, alpha=0.50): 14.7271
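Since the import turned out to be the problem, a few quick checks right after loading can catch this kind of issue early. The sketch below is an illustration under stated assumptions: it writes a tiny made-up CSV file (the values and the three column headers are placeholders, not the real dataset), reads it back with readtable, and confirms that X and y have matching row counts before any partitioning.

```matlab
% Sketch only: a tiny placeholder file, not the real Life Expectancy data.
T = table([70; 71; 72], [100; 110; 120], [0.1; 0.2; 0.3], ...
    'VariableNames', {'Life expectancy', 'Adult Mortality', 'HIV/AIDS'});
f = [tempname, '.csv'];
writetable(T, f);

% Default import: headers are modified into valid MATLAB identifiers
% (readtable may issue the same warning as shown in the answer).
T1 = readtable(f);
disp(T1.Properties.VariableNames);

% Alternative: keep the original headers unchanged
T2 = readtable(f, 'VariableNamingRule', 'preserve');
disp(T2.Properties.VariableNames);

% Build y and X by column position, then sanity-check the shapes
y = T1{:, 1};
X = T1{:, 2:end};
fprintf('X is %d-by-%d, y has %d elements\n', size(X, 1), size(X, 2), numel(y));
assert(size(X, 1) == numel(y), 'X and y must have the same number of rows');

delete(f);   % clean up the temporary file
```

Printing T1.Properties.VariableNames also shows the exact spelling to use for dot indexing, such as LifeExpectancyData.HIV_AIDS in the answer's code.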

Release

R2024a
