I have been using the nchoosek function to find combinations of subset of elements, but for large numbers 40C20 its is very slow .
5 views (last 30 days)
Show older comments
v(1,:) = 1:5;
C = nchoosek(v,3);
C =
1 2 3
1 2 4
and so on.
I need to find for
v(1,:) = 1:40;
C = nchoosek(v,20);
0 Comments
Accepted Answer
Sindar
on 16 Jan 2020
Is it slow or frozen or error-ing?
There are 137846528820 (~1e11) possible combinations (40 choose 20), so you are asking for a 1e11 x 20 matrix. That's 20 TB of data. Even aside from calculations and fitting the answer anywhere, that's going to take forever. Are you sure this is what you need?
When I run these commands, Matlab returns an error:
"Requested 137846528820x20 (20540.7GB) array exceeds maximum array size preference."
6 Comments
Sindar
on 17 Jan 2020
Edited: Sindar
on 17 Jan 2020
In that case, you can probably find good-but-not-best combinations by random sampling with randperm:
% guess a high value of validity
goodness_thresh = 5;
% set up
num_checked = 0;
num_good_enough = 0;
best_goodness = 0;
% loop until you find 10 combinations that pass the validity threshold
% the optional second end condition breaks the loop if it runs too long
while (num_good_enough < 10) && (num_checked < 1e5)
% select a random combination (20 numbers from 1:40)
C = randperm(40,20);
% evaluate the goodness with your function
goodness = evaluate_validity(C);
% check validity against your threshold
if goodness > goodness_thresh
% record the combination
good_Cs(num_good_enough+1,:) = C;
num_good_enough = num_good_enough + 1;
% update the highest validity
best_goodness = max(best_goodness,goodness);
end
num_checked = num_checked + 1;
end
If you don't initially have a good idea of the validity values, run it for num_good_enough < 1000 and check best_goodness at the end, then set that as the goodness_thresh, and repeat until it seems like you're selecting good ones. You could even automate it:
goodness_thresh = 0;
for ind=1:10
% insert above code with (num_good_enough < 1000)
goodness_thresh = best_goodness;
end
Walter Roberson
on 17 Jan 2020
If you need to find the 10 best subsets of 20 out of 40 points, then you do not need to store all of the possibilities: you can process in chunks, generating a subset and processing it, remembering the best, then go on to the next chunk.
However you still need to iterate through all 1e11 possibilities. How many can you process per second? You can generate some at random and do some timing tests.
More Answers (1)
See Also
Categories
Find more on Loops and Conditional Statements in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!