Vectorize and use parfor in a nested loop with five levels

Question

Fernando on 28 Jun 2013

2
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/80570-vectorize-and-use-parfor-in-a-nested-loop-with-five-levels

Hi all,

I'm working on an application that, for now, requires computing a nested loop with five levels. I'm trying to vectorize some parts and to use a parfor for one of the loops. However, I haven't been very successful. Any suggestions would be great. A simplified version of the code looks as follows,

clear all
clc
p1=(1:1:50)';
p2=p1;
np=max(size(p1));
[P1,P2]=meshgrid(p1,p2);
DP=unique(P1-P2);
K=max(size(DP));
ns=1000;
ne=10;
S1=randn(ns,ne);
S2=randn(ns,ne);
S3=randn(ns,ne);
S4=randn(ns,ne);
L1=rand(ns,ne);
L2=rand(ns,ne);
L3=rand(ns,ne);
L4=rand(ns,ne);
a=2;
BR=zeros(np,K,K,K,K);
P=BR;
for i=1:K,
    DP1=DP(i);
    C1=mean(a*S1*DP1>=0,2);
    e1=bsxfun(@times,L1,(1-2*C1));
    for j=1:K,
        DP2=DP(j);
        C2=mean(a*S2*DP2>=0,2);
        e2=bsxfun(@times,L2,(1-2*C2));
        for l=1:K,
            DP3=DP(l);
            C3=mean(a*S3*DP3>=0,2);
            e3=bsxfun(@times,L3,(1-2*C3));
            for m=1:K,
                DP4=DP(m);
                C4=mean(a*S4*DP4>=0,2);
                e4=bsxfun(@times,L4,(1-2*C4));
                  for pp=1:np,
                      pr=p2(pp);
                      dp1=p1-pr;
                      A=reshape(-bsxfun(@times,a.*C4,dp1'),ns,1,np);
                      N=bsxfun(@minus,A,e4);
                      f=exp(N);
                      PR=f./(1+f);
                      rev=zeros(ns,ne,np);
                      for zz=1:np,
                          rev(:,:,zz)=PR(:,:,zz)*p1(zz);
                      end
                      er=reshape(sum(mean(rev,2)),np,1);
                      [mpr,ind]=max(er);
                      BR(pp,m,l,j,i)=p1(ind);
                      P(pp,m,l,j,i)=mpr;
                  end
              end
          end
      end
  end

In this version the problems start when evaluating BR and P in the inner loop. Even though all pp, m, l, j and i are defined, it takes a lot of memory and time to evaluate BR(pp,m,l,j,i)=p1(ind), and I really don't know why. Then, after solving that, I need to improve on vectorizing what might be possible to vectorize, and choose a loop to use the parfor.

Thanks,

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Jan on 28 Jun 2013

1
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/80570-vectorize-and-use-parfor-in-a-nested-loop-with-five-levels#answer_90272

Edited: Jan on 30 Jun 2013

Open in MATLAB Online

Some small improvements:

% Original code:
rev=zeros(ns,ne,np);
for zz=1:np,
  rev(:,:,zz)=PR(:,:,zz)*p1(zz);
end
% Vectorized (is this faster?):
rev = bsxfun(@times, PR, reshape(p1, 1, 1, np));

Calculating sum(mean(rev,2)) contains 2 summations. Perhaps one sum is faster:

er = sum(reshape(rev, np, []), 2) / ne;

Avoir some multiplications by -1:

% Original:
dp1 = p1-pr;
A   = reshape(-bsxfun(@times,a.*C4,dp1'),ns,1,np);
% New
dp1 = pr - p1;
A   = reshape(bsxfun(@times, a.*C4, dp1'), ns, 1, np);

In these lines:

C1 = mean(a*S1*DP1>=0,2);

some simplifications are possible: a * S1 * DP1 >= 0 does not depend on the positivbe constant a. DP1 is a scalar and only its sign is interesting. So the meanS1 = mean(S1>=0, 2) can be calculated once and in case of a negative sign of DP1 1 - meanS1 can be calculated fast. DP1==0 must be caught also.

I'd start a parallelization after these mathematical simplifications have been performed. Avoiding calculations is simply more efficient than trying to do them faster.

[EDITED] Now I've found the time to run your code on my computer. Beside the fact that 4GB are not enough to run your code (how much RAM do you have?), a simple test revealed the bottleneck: Use the profiler to find out, which lines consumes the most time. You will see, that all my ideas are meaningless, because they improve parts of your code, which require only one percent of the total time. Only vectorizing the creation of rev matters significantly. But the rest is dominated by the very expensive calculation of:

f = exp(N);

So this is the point where improvements are required at first. N is created by BSXFUN and you can exploit, that exp(a - b) equals exp(a) / exp(b), and that exp(a * b) equals exp(a) ^ b. Then the innermost loop becomes:

   exp_e4_inv = exp(e4);
   exp_a_C4   = exp(a * C4);
   for pp = 1:np
      dp1 = p2(pp) - p1;
      Ax  = reshape(bsxfun(@power,exp_a_C4, dp1'), ns, 1, np);
      f   = bsxfun(@times, Ax, exp_e4_inv);
      PR  = f./(1+f);
      rev = bsxfun(@times, PR, reshape(p1, 1, 1, np));
      er  = reshape(sum(reshape(rev, [], 3)) / ne, np, 1);
      [mpr,ind]=max(er);  % No changes here...
      BR(pp,m,l,j,i)=p1(ind);
      P(pp,m,l,j,i)=mpr;
   end

This reduces the total number of elements the exponential function is applied to. For the much smaller problem defined by p1=(1:3)', I get 1.45 seconds runtime instead of 3.02 of the original version. The results differ by rounding errors inside the expected limits from the original. So some basic mathematical simplifications gained the double speed.

Well, double speed is not impressing by such a large problem. But I hesitate to start to parallelize code without considering this. Now this is a good point to come back to the original question: How do we PARFOR this?

5 Comments
Show 3 older commentsHide 3 older comments

Jan on 30 Jun 2013

See [EDITED]

Fernando on 1 Jul 2013

Thanks Jan for the edition. I had realized that the costly part was computing the exponential but I had no idea how to improve on that. Once again, thanks! And to answer your question about RAM, I'm running this on a computer cluster so as long as I don't need more than 50gb, it should be OK.

Sign in to comment.

Vectorize and use parfor in a nested loop with five levels

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

5 Comments
Show 3 older commentsHide 3 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

Vectorize and use parfor in a nested loop with five levels

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

5 Comments Show 3 older commentsHide 3 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

5 Comments
Show 3 older commentsHide 3 older comments