I have been using the nchoosek function to find combinations of subset of elements, but for large numbers 40C20 its is very slow .

5 views (last 30 days)
v(1,:) = 1:5;
C = nchoosek(v,3);
C =
1 2 3
1 2 4
and so on.
I need to find for
v(1,:) = 1:40;
C = nchoosek(v,20);

Accepted Answer

Sindar
Sindar on 16 Jan 2020
Is it slow or frozen or error-ing?
There are 137846528820 (~1e11) possible combinations (40 choose 20), so you are asking for a 1e11 x 20 matrix. That's 20 TB of data. Even aside from calculations and fitting the answer anywhere, that's going to take forever. Are you sure this is what you need?
When I run these commands, Matlab returns an error:
"Requested 137846528820x20 (20540.7GB) array exceeds maximum array size preference."
  6 Comments
Sindar
Sindar on 17 Jan 2020
Edited: Sindar on 17 Jan 2020
In that case, you can probably find good-but-not-best combinations by random sampling with randperm:
% guess a high value of validity
goodness_thresh = 5;
% set up
num_checked = 0;
num_good_enough = 0;
best_goodness = 0;
% loop until you find 10 combinations that pass the validity threshold
% the optional second end condition breaks the loop if it runs too long
while (num_good_enough < 10) && (num_checked < 1e5)
% select a random combination (20 numbers from 1:40)
C = randperm(40,20);
% evaluate the goodness with your function
goodness = evaluate_validity(C);
% check validity against your threshold
if goodness > goodness_thresh
% record the combination
good_Cs(num_good_enough+1,:) = C;
num_good_enough = num_good_enough + 1;
% update the highest validity
best_goodness = max(best_goodness,goodness);
end
num_checked = num_checked + 1;
end
If you don't initially have a good idea of the validity values, run it for num_good_enough < 1000 and check best_goodness at the end, then set that as the goodness_thresh, and repeat until it seems like you're selecting good ones. You could even automate it:
goodness_thresh = 0;
for ind=1:10
% insert above code with (num_good_enough < 1000)
goodness_thresh = best_goodness;
end
Walter Roberson
Walter Roberson on 17 Jan 2020
If you need to find the 10 best subsets of 20 out of 40 points, then you do not need to store all of the possibilities: you can process in chunks, generating a subset and processing it, remembering the best, then go on to the next chunk.
However you still need to iterate through all 1e11 possibilities. How many can you process per second? You can generate some at random and do some timing tests.

Sign in to comment.

More Answers (1)

Maddie
Maddie on 17 Jan 2020
These are the functions that I used for calculating the combinations and fitness
function pbmval_sorted = pbmforchoosingk(X,M,klus,N,p)
[a,b] = size(M);
v(1,:) = 1:5;
C = nchoosek(v,3);
[r,c]=size(C);
C=C';
vC=C(:);
mat=reshape(vC,c,1,[]);
for i=1:r
pbmval(i)=evaluatepbm(X,klus,M(mat(:,:,i),:));
pbmval_in(i,:) = [pbmval(i) i];
end
pbmval_sorted = sortrows(pbmval_in,1,{'descend'});
//this function will call the fitness/validity function
function pbmval=evaluatepbm(X,K,cen)
n=size(X,1);
O1=ones(n,1);
dist=zeros(n,K);
% tic
for i=1:K
dist(:,i)=sum((bsxfun(@minus,X,O1*cen(i,:))).^2,2);
end
% toc
[~,label]=min(dist,[],2);
f0=zeros(n,K);
for i=1:K
f0(label==i,i)=1;
end
EK=sum(sum(sqrt(dist).*f0));
DK=max(pdist(cen));
E1=sum(sqrt(sum((bsxfun(@minus,X,O1*mean(X))).^2,2)));
pbmval=(1/K*(E1/EK)*DK)^2

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!