Problem running a cvpartition with a tall array

MATLAB's documentation indicates that the cvpartition function supports tall arrays, as long as it uses a stratified holdout partition. Therefore, this should work when "group" is an Mx1 double column vector pulled out of a datastore:
myPartition = cvpartition(group, 'Holdout',.25);
A = gather(test(myPartition));
It yields the proper logical vector if I load the "group" array into memory. But as a tall array, I instead get this error:
Error using internal.stats.bigdata.cvpartitionTallImpl (line 92)
P is too small to have a non-empty test set.
The gather operation is not the issue here; this is the first command applied to the tall array after it is created.
I think I've tracked down the cause of that error to this bit of code in the cvpartitionInMemoryImpl class:
if (isempty(cv.Group) && floor(cv.N * T) == 0) || ...
        (~isempty(cv.Group) && floor(length(cv.Group) * T) == 0)
    error(message('stats:cvpartition:PTooSmall'));
end
Here T is (or at least is supposed to be) the 0.25 probability value.
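As a sanity check on that condition, here is a minimal sketch of the arithmetic, assuming T really is 0.25. The variable n is a hypothetical stand-in for length(cv.Group); it is not taken from the toolbox code. With that value of T, the test can only fail when fewer than 4 observations are counted, so either T or the length seen by the check is probably not what I expect.

```matlab
% Sketch of the size check quoted above, evaluated for a few sample counts.
% T is the holdout fraction; n is a hypothetical stand-in for length(cv.Group).
T = 0.25;
n = [1 3 4 1000];

testCount = floor(n * T);      % observations that would land in the test set
tooSmall  = (testCount == 0);  % true where the PTooSmall error would be raised

% Show the counts side by side.
disp(table(n', testCount', tooSmall', ...
    'VariableNames', {'n', 'testCount', 'tooSmall'}));
```

If the true M is in the millions, a result of zero here would mean the check is not being fed the full length of the tall array, though I have not confirmed that from the source.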
Is there a way around this error? I'm working with some very, very large data files and would like to take advantage of the tall array functionality wherever possible.

Answers (0)
