Comparison of MSE values of OLS, Ridge, LASSO and Elasticnet regression
Hello everyone. I have MATLAB code with which I want to calculate and compare the MSE values of OLS, ridge, LASSO and elastic net regression.
When I run it, I get this error:
"Index in position 1 exceeds array bounds. Index must not exceed 1.
Error in untitled25 (line 4)
cv = cvpartition(size(X,1),'HoldOut',0.2);"
Is there something wrong with this code? Or do you have better code for this task?
y is the dependent variable and X holds the independent variables. I have also attached the dataset. Thanks for your answers.
clc
y=[LifeExpectancyData.LifeExpectancy];
X=[LifeExpectancyData.AdultMortality, LifeExpectancyData.infantDeaths, ...
   LifeExpectancyData.Alcohol, LifeExpectancyData.percentageExpenditure, ...
   LifeExpectancyData.HepatitisB, LifeExpectancyData.Measles, ...
   LifeExpectancyData.BMI, LifeExpectancyData.underfiveDeaths, ...
   LifeExpectancyData.Polio, LifeExpectancyData.TotalExpenditure, ...
   LifeExpectancyData.Diphtheria, LifeExpectancyData.HIVAIDS, ...
   LifeExpectancyData.GDP, LifeExpectancyData.Population, ...
   LifeExpectancyData.thinness119Years, LifeExpectancyData.thinness59Years, ...
   LifeExpectancyData.IncomeCompositionOfResources, LifeExpectancyData.Schooling];
% Split data into training and test sets
cv = cvpartition(size(X,1),'HoldOut',0.2);
X_train = X(training(cv),:);
y_train = y(training(cv));
X_test = X(test(cv),:);
y_test = y(test(cv));
% Define lambda values for cross-validation
lambda_values = logspace(-4, 4, 100);
% Least Squares Regression
b_ls = X_train \ y_train;
y_pred_ls = X_test * b_ls;
mse_ls = mean((y_test - y_pred_ls).^2);
fprintf('Least Squares MSE: %.4f\n', mse_ls);
% Ridge Regression
mse_ridge = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    b_ridge = ridge(y_train, X_train, lambda_values(i), 0);
    y_pred_ridge = X_test * b_ridge(2:end) + b_ridge(1);
    mse_ridge(i) = mean((y_test - y_pred_ridge).^2);
end
[~,idx_min_ridge] = min(mse_ridge);
fprintf('Ridge Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_ridge), mse_ridge(idx_min_ridge));
% LASSO Regression
mse_lasso = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    [b_lasso, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i));
    y_pred_lasso = X_test * b_lasso + FitInfo.Intercept;
    mse_lasso(i) = mean((y_test - y_pred_lasso).^2);
end
[~,idx_min_lasso] = min(mse_lasso);
fprintf('LASSO Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_lasso), mse_lasso(idx_min_lasso));
% Elastic Net Regression
mse_enet = zeros(length(lambda_values),1);
alpha = 0.5; % Elastic net mixing parameter
for i = 1:length(lambda_values)
    [b_enet, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i), 'Alpha', alpha);
    y_pred_enet = X_test * b_enet + FitInfo.Intercept;
    mse_enet(i) = mean((y_test - y_pred_enet).^2);
end
[~,idx_min_enet] = min(mse_enet);
fprintf('Elastic Net Regression MSE (lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(idx_min_enet), alpha, mse_enet(idx_min_enet));
% Cross-validation
k = 10; % Number of folds
cvp = cvpartition(size(X,1),'KFold',k);
% Preallocate MSE arrays for cross-validation
mse_cv_ls = zeros(k,1);
mse_cv_ridge = zeros(k,length(lambda_values));
mse_cv_lasso = zeros(k,length(lambda_values));
mse_cv_enet = zeros(k,length(lambda_values));
for i = 1:k
    X_train_cv = X(training(cvp, i), :);
    y_train_cv = y(training(cvp, i));
    X_test_cv = X(test(cvp, i), :);
    y_test_cv = y(test(cvp, i));
    % Least Squares Regression
    b_ls_cv = X_train_cv \ y_train_cv;
    y_pred_ls_cv = X_test_cv * b_ls_cv;
    mse_cv_ls(i) = mean((y_test_cv - y_pred_ls_cv).^2);
    % Ridge Regression
    for j = 1:length(lambda_values)
        b_ridge_cv = ridge(y_train_cv, X_train_cv, lambda_values(j), 0);
        y_pred_ridge_cv = X_test_cv * b_ridge_cv(2:end) + b_ridge_cv(1);
        mse_cv_ridge(i,j) = mean((y_test_cv - y_pred_ridge_cv).^2);
    end
    % LASSO Regression
    for j = 1:length(lambda_values)
        [b_lasso_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j));
        y_pred_lasso_cv = X_test_cv * b_lasso_cv + FitInfo_cv.Intercept;
        mse_cv_lasso(i,j) = mean((y_test_cv - y_pred_lasso_cv).^2);
    end
    % Elastic Net Regression
    for j = 1:length(lambda_values)
        [b_enet_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j), 'Alpha', alpha);
        y_pred_enet_cv = X_test_cv * b_enet_cv + FitInfo_cv.Intercept;
        mse_cv_enet(i,j) = mean((y_test_cv - y_pred_enet_cv).^2);
    end
end
% Calculate mean cross-validated MSE for each model
mse_cv_ls_mean = mean(mse_cv_ls);
mse_cv_ridge_mean = mean(mse_cv_ridge);
mse_cv_lasso_mean = mean(mse_cv_lasso);
mse_cv_enet_mean = mean(mse_cv_enet);
% Find best lambda for Ridge, LASSO, and Elastic Net
[~, best_lambda_ridge] = min(mse_cv_ridge_mean);
[~, best_lambda_lasso] = min(mse_cv_lasso_mean);
[~, best_lambda_enet] = min(mse_cv_enet_mean);
fprintf('Cross-validated MSE for Least Squares: %.4f\n', mse_cv_ls_mean);
fprintf('Cross-validated MSE for Ridge Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_ridge), mse_cv_ridge_mean(best_lambda_ridge));
fprintf('Cross-validated MSE for LASSO Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_lasso), mse_cv_lasso_mean(best_lambda_lasso));
fprintf('Cross-validated MSE for Elastic Net Regression (best lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(best_lambda_enet), alpha, mse_cv_enet_mean(best_lambda_enet));
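One detail in the code above may affect the comparison: `X_train \ y_train` fits least squares without a constant term, whereas `ridge(..., 0)` and `lasso` both return an intercept. A minimal sketch of the difference follows, using synthetic data rather than the life expectancy table; the variable names, the true coefficients and the simple 160/40 split are assumptions made for illustration only.

```matlab
% Synthetic data (assumed for illustration): true intercept is 5.
rng(0);                                   % make the example repeatable
n = 200;
X = randn(n, 3);
y = 5 + X*[1; -2; 0.5] + 0.1*randn(n, 1);

% Simple hold-out split: first 160 rows train, last 40 rows test.
is_train = (1:n)' <= 160;
X_train = X(is_train, :);   y_train = y(is_train);
X_test  = X(~is_train, :);  y_test  = y(~is_train);

% OLS without a constant term, as in the code above.
b_no = X_train \ y_train;
mse_no_intercept = mean((y_test - X_test*b_no).^2);

% OLS with a constant term: prepend a column of ones to both design matrices.
b_int = [ones(size(X_train,1),1), X_train] \ y_train;
y_pred = [ones(size(X_test,1),1), X_test] * b_int;
mse_with_intercept = mean((y_test - y_pred).^2);

fprintf('OLS MSE without intercept: %.4f\n', mse_no_intercept);
fprintf('OLS MSE with intercept:    %.4f\n', mse_with_intercept);
```

If the response has a nonzero mean, as life expectancy does, the version with the column of ones is probably the fairer baseline to compare against ridge, LASSO and elastic net.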
Answers (1)
the cyclist on 21 May 2024
Moved: the cyclist on 23 May 2024
Your code runs fine for me. You did not show how you imported that data into MATLAB. I used readtable, and had to edit some of your column names. But I do not get the error you got.
LifeExpectancyData = readtable("Life Expectancy Data.xlsx");
y=[LifeExpectancyData.LifeExpectancy];
X=[LifeExpectancyData.AdultMortality, LifeExpectancyData.infantDeaths, ...
   LifeExpectancyData.Alcohol, LifeExpectancyData.percentageExpenditure, ...
   LifeExpectancyData.HepatitisB, LifeExpectancyData.Measles, ...
   LifeExpectancyData.BMI, LifeExpectancyData.under_fiveDeaths, ...
   LifeExpectancyData.Polio, LifeExpectancyData.TotalExpenditure, ...
   LifeExpectancyData.Diphtheria, LifeExpectancyData.HIV_AIDS, ...
   LifeExpectancyData.GDP, LifeExpectancyData.Population, ...
   LifeExpectancyData.thinness1_19Years, LifeExpectancyData.thinness5_9Years, ...
   LifeExpectancyData.IncomeCompositionOfResources, LifeExpectancyData.Schooling];
% Split data into training and test sets
cv = cvpartition(size(X,1),'HoldOut',0.2);
X_train = X(training(cv),:);
y_train = y(training(cv));
X_test = X(test(cv),:);
y_test = y(test(cv));
% Define lambda values for cross-validation
lambda_values = logspace(-4, 4, 100);
% Least Squares Regression
b_ls = X_train \ y_train;
y_pred_ls = X_test * b_ls;
mse_ls = mean((y_test - y_pred_ls).^2);
fprintf('Least Squares MSE: %.4f\n', mse_ls);
% Ridge Regression
mse_ridge = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    b_ridge = ridge(y_train, X_train, lambda_values(i), 0);
    y_pred_ridge = X_test * b_ridge(2:end) + b_ridge(1);
    mse_ridge(i) = mean((y_test - y_pred_ridge).^2);
end
[~,idx_min_ridge] = min(mse_ridge);
fprintf('Ridge Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_ridge), mse_ridge(idx_min_ridge));
% LASSO Regression
mse_lasso = zeros(length(lambda_values),1);
for i = 1:length(lambda_values)
    [b_lasso, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i));
    y_pred_lasso = X_test * b_lasso + FitInfo.Intercept;
    mse_lasso(i) = mean((y_test - y_pred_lasso).^2);
end
[~,idx_min_lasso] = min(mse_lasso);
fprintf('LASSO Regression MSE (lambda=%.4f): %.4f\n', lambda_values(idx_min_lasso), mse_lasso(idx_min_lasso));
% Elastic Net Regression
mse_enet = zeros(length(lambda_values),1);
alpha = 0.5; % Elastic net mixing parameter
for i = 1:length(lambda_values)
    [b_enet, FitInfo] = lasso(X_train, y_train, 'Lambda', lambda_values(i), 'Alpha', alpha);
    y_pred_enet = X_test * b_enet + FitInfo.Intercept;
    mse_enet(i) = mean((y_test - y_pred_enet).^2);
end
[~,idx_min_enet] = min(mse_enet);
fprintf('Elastic Net Regression MSE (lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(idx_min_enet), alpha, mse_enet(idx_min_enet));
% Cross-validation
k = 10; % Number of folds
cvp = cvpartition(size(X,1),'KFold',k);
% Preallocate MSE arrays for cross-validation
mse_cv_ls = zeros(k,1);
mse_cv_ridge = zeros(k,length(lambda_values));
mse_cv_lasso = zeros(k,length(lambda_values));
mse_cv_enet = zeros(k,length(lambda_values));
for i = 1:k
    X_train_cv = X(training(cvp, i), :);
    y_train_cv = y(training(cvp, i));
    X_test_cv = X(test(cvp, i), :);
    y_test_cv = y(test(cvp, i));
    % Least Squares Regression
    b_ls_cv = X_train_cv \ y_train_cv;
    y_pred_ls_cv = X_test_cv * b_ls_cv;
    mse_cv_ls(i) = mean((y_test_cv - y_pred_ls_cv).^2);
    % Ridge Regression
    for j = 1:length(lambda_values)
        b_ridge_cv = ridge(y_train_cv, X_train_cv, lambda_values(j), 0);
        y_pred_ridge_cv = X_test_cv * b_ridge_cv(2:end) + b_ridge_cv(1);
        mse_cv_ridge(i,j) = mean((y_test_cv - y_pred_ridge_cv).^2);
    end
    % LASSO Regression
    for j = 1:length(lambda_values)
        [b_lasso_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j));
        y_pred_lasso_cv = X_test_cv * b_lasso_cv + FitInfo_cv.Intercept;
        mse_cv_lasso(i,j) = mean((y_test_cv - y_pred_lasso_cv).^2);
    end
    % Elastic Net Regression
    for j = 1:length(lambda_values)
        [b_enet_cv, FitInfo_cv] = lasso(X_train_cv, y_train_cv, 'Lambda', lambda_values(j), 'Alpha', alpha);
        y_pred_enet_cv = X_test_cv * b_enet_cv + FitInfo_cv.Intercept;
        mse_cv_enet(i,j) = mean((y_test_cv - y_pred_enet_cv).^2);
    end
end
% Calculate mean cross-validated MSE for each model
mse_cv_ls_mean = mean(mse_cv_ls);
mse_cv_ridge_mean = mean(mse_cv_ridge);
mse_cv_lasso_mean = mean(mse_cv_lasso);
mse_cv_enet_mean = mean(mse_cv_enet);
% Find best lambda for Ridge, LASSO, and Elastic Net
[~, best_lambda_ridge] = min(mse_cv_ridge_mean);
[~, best_lambda_lasso] = min(mse_cv_lasso_mean);
[~, best_lambda_enet] = min(mse_cv_enet_mean);
fprintf('Cross-validated MSE for Least Squares: %.4f\n', mse_cv_ls_mean);
fprintf('Cross-validated MSE for Ridge Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_ridge), mse_cv_ridge_mean(best_lambda_ridge));
fprintf('Cross-validated MSE for LASSO Regression (best lambda=%.4f): %.4f\n', lambda_values(best_lambda_lasso), mse_cv_lasso_mean(best_lambda_lasso));
fprintf('Cross-validated MSE for Elastic Net Regression (best lambda=%.4f, alpha=%.2f): %.4f\n', lambda_values(best_lambda_enet), alpha, mse_cv_enet_mean(best_lambda_enet));
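Since the error does not reproduce with a freshly imported table, one possibility worth ruling out (an assumption, not something confirmed in this thread) is a leftover workspace variable with the same name as a function used on the failing line. "Index in position 1 exceeds array bounds" is an indexing error, so MATLAB may be treating `size` or `cvpartition` as an array rather than calling the function. Note that `clc` only clears the Command Window; it does not remove variables. A small check along these lines could be run before the script; the list of names is chosen here for illustration.

```matlab
% Function names the script relies on near the failing line (illustrative list).
fn_names = {'size', 'cvpartition', 'training', 'test'};

for n = 1:numel(fn_names)
    % exist(name, 'var') returns 1 when a workspace variable has that name.
    if exist(fn_names{n}, 'var') == 1
        fprintf('A variable named "%s" shadows the function; clearing it.\n', fn_names{n});
        clear(fn_names{n});   % remove only the shadowing variable
    end
end

% Show what MATLAB will now resolve these names to.
which size
which cvpartition
```

If one of these names does turn out to be a variable, starting the script with `clear` (or `clearvars`) instead of only `clc` should prevent the problem from recurring.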