Understanding k-fold cross validation
rex Macey on 11 Apr 2014
Let's say we're doing a logistic regression with lasso regularization and 10-fold cross-validation, and we have 200 examples (training observations). I want to understand the steps because I'm fuzzy on them. I'm pretty sure I get the first part: the data are divided into 10 folds of 20 observations each; on each of the 10 iterations, one fold (20 observations) is held out as test data and the remaining 180 are used as training data, so each observation is used once as test data and 9 times as training data.

Let's assume the algorithm is going to try 60 lambdas (I don't really care about that detail, but that seems to be what some routines do). Are the following steps correct? Does the algorithm perform 60 regressions (one for each lambda) on each of the 10 folds, so that for each lambda we end up with 10 fitted regression equations?

THE QUESTION REALLY BUGGING ME IS: to calculate the coefficients for each lambda, does it just average the respective coefficients from the 10 regressions (60 sets of averages, one per lambda)? Then does it calculate a mean and standard deviation of the cost function across the folds? My understanding is that it finds the lambda with the lowest mean cost (error), and that there is a final step of finding the highest lambda whose mean error is within one standard deviation of the error at the minimum lambda.

My main concern is understanding how it comes up with the coefficients. I'm worried I'm missing something, because it doesn't make sense to me that one would use 3-fold CV: 3 is a small number of values to average, let alone to calculate a standard deviation from. I know this is a long description. If you've read this far, thanks, and I appreciate any reply.
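[Editor's note] The lambda-selection part of the procedure described above can be sketched in a few lines. This is a hedged illustration, not the asker's specific routine: the per-fold errors here are synthetic random numbers standing in for the real deviance/misclassification costs you would get by fitting a lasso-penalized logistic regression on each fold's 180 training points and scoring the held-out 20. Note that common implementations (e.g. R's cv.glmnet) use the standard *error* of the mean (std/sqrt(K)) for the "one-standard-error" rule, and obtain the final coefficients by refitting on the full data set at the chosen lambda rather than by averaging fold coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 disjoint test folds of 20 observations each, drawn from 200 observations.
# Each observation lands in exactly one test fold, hence in 9 training sets.
folds = rng.permutation(200).reshape(10, 20)

# Hypothetical CV error matrix: rows = 60 candidate lambdas, cols = 10 folds.
# In a real run, entry (i, j) would be the cost of the model fit with
# lambdas[i] on the training part of fold j, evaluated on fold j's 20
# held-out points. Here we simulate it to keep the sketch self-contained.
lambdas = np.logspace(-3, 0, 60)                  # ascending, 60 values (assumption)
cv_errors = rng.uniform(0.2, 0.4, size=(60, 10))

mean_err = cv_errors.mean(axis=1)                 # mean cost over the 10 folds
se_err = cv_errors.std(axis=1, ddof=1) / np.sqrt(10)  # standard error of that mean

i_min = int(np.argmin(mean_err))                  # lambda with lowest mean CV error
lambda_min = lambdas[i_min]

# One-standard-error rule: the largest lambda whose mean error is within
# one SE of the minimum (i.e. the most regularized "equally good" model).
threshold = mean_err[i_min] + se_err[i_min]
i_1se = int(np.max(np.where(mean_err <= threshold)[0]))
lambda_1se = lambdas[i_1se]
```

Since `lambdas` is sorted ascending and `lambda_min` itself satisfies the threshold, `lambda_1se >= lambda_min` always holds; with only 3 folds the SE estimate is indeed based on just 3 numbers, which is the asker's concern about its stability.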

Answers (0)
