How to cross-validate the data and use it for an ensemble?

I want to cross-validate my training data and use it to construct an ensemble.
My basic task is to construct an ensemble (using the fitensemble function) on the training data and then test that ensemble on the test data to evaluate its performance. That is why I have created separate files for the training data and the test data.
For now, consider only the training data.
I want to cross-validate the train data (10 folds) and then use it to construct an ensemble. My code is as follows:
data_set = load('iris_train_data.txt');
data = data_set(:,1:end-1); % features: every column except the last
y = data_set(:,end); % labels: last column
cvpart = cvpartition(y,'kfold',10) % stratified 10-fold partition of the labels
How do I use cvpart in the fitensemble call below? What should I do with it, and how do I access the folds to construct an ensemble?
ens_cv = fitensemble(data,y,'AdaBoostM2',50,'tree','type','classification')
I have tried the following:
ens_cv = fitensemble(data,y,'AdaBoostM2',50,'tree','type','classification','kfold',10)
but in this case I cannot use the ensemble on test data, because MATLAB R2011a does not allow that. I can only do
Loss = kfoldLoss(ens_cv)
What I actually want is to compute the loss on the test data (which I have in a separate text file), and I don't know how to pass that test data, since kfoldLoss (in MATLAB R2011a) does not accept it.
That is why I am using the cvpartition function, but I don't know how to use cvpart after partitioning to construct an ensemble and then compute the loss of that ensemble on the test data.
Please advise.
Thanks.
  1 Comment
Greg Heath on 14 Jul 2012
Terminology:
Replace: I want to cross validate the train data (10 folds) and then use it to construct an ensemble.
With: I want to partition (or divide) the train data (10 folds) and then use it to construct an ensemble.


Accepted Answer

Ilya on 13 Jun 2012
Computing loss on test data and computing loss by cross-validation are two separate tasks. To compute loss on test data, you need to train an ensemble using all the training data you have. To compute loss by, say, 10-fold cross-validation, you need to grow 10 ensembles, each on 9/10 of your training data, and then average the loss over the left-out 1/10 parts.
I am not sure why you expect both tasks to be handled by one object. Even if they were, it would not buy you anything in terms of CPU time or memory. Just do two separate things:
1) Grow an ensemble on all training data and use it to compute the test loss.
2) Cross-validate this ensemble using its crossval method, and use the kfoldLoss method of the partitioned ensemble (a new object) to compute the cross-validated loss.
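The two tasks above might look roughly like the sketch below. It is only an illustration under stated assumptions: the bundled fisheriris data stands in for the asker's text files, the 70/30 hold-out split is an arbitrary choice, and the variable names (Xtrain, ens, cvens and so on) are not from the thread. Passing a cvpartition object to crossval through the 'cvpartition' option is how the asker's cvpart could be reused; check that your release supports it.

```matlab
% Stand-in data (assumption): the bundled Fisher iris set replaces the
% asker's iris_train_data.txt and the separate test file.
load fisheriris                 % provides meas (150x4) and species (150x1 cell)
rng(1);                         % make the random splits reproducible

% Illustrative 70/30 hold-out split into "train" and "test" sets.
holdout = cvpartition(species,'holdout',0.3);
Xtrain = meas(training(holdout),:);
ytrain = species(training(holdout));
Xtest  = meas(test(holdout),:);
ytest  = species(test(holdout));

% Task 1: grow one ensemble on ALL training data, then score it on test data.
ens = fitensemble(Xtrain,ytrain,'AdaBoostM2',50,'Tree');
testLoss = loss(ens,Xtest,ytest);       % misclassification rate on test data

% Task 2: cross-validate the same ensemble setup on the training data only.
cvpart = cvpartition(ytrain,'kfold',10);
cvens  = crossval(ens,'cvpartition',cvpart);   % new partitioned-ensemble object
cvLoss = kfoldLoss(cvens);              % average loss over the 10 held-out folds

fprintf('Test loss: %.3f, 10-fold CV loss: %.3f\n',testLoss,cvLoss);
```

With the asker's own files, Xtrain and ytrain would instead come from the columns of the loaded training matrix, and Xtest and ytest from the test file in the same way. The two loss values answer different questions, so there is no need for them to come from the same object.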
