Why is vectorizing way slower than the for-next loop it replaced?


So I wrote this program that iterates over 7 variables, each loop nested inside the next, so my run times range from 5 to 45 minutes. Naturally, I want to speed this up, and I read a lot about using vectorized code instead of for-next loops. Therefore, I replaced the following code:

for i=1:4
  if z(i)<100
    ea(i)=0;
    td(i)=tdo*z(i)/100;
  elseif z(i)<200
    ea(i)=0;
    td(i)=tdo;
  else
    ea(i)=eds(i)*(z(i)-200)/100;
    td(i)=tdo;
  end
  psifirst(i) = -(eds(i) - ea(i)) / td(i);
  ed(i) = (eds(i) + ea(i)) / 2;
end

with:

ea(z<200 | z==200)=0;
ea(z>200)=eds(z>200).*(z(z>200)-200)/100;
td(z>100)=tdo;
td(z<100 | z==100)=tdo*z(z<100 | z==100)/100;
psifirst = -(eds - ea)./ td;
ed = (eds + ea) / 2;

This little piece of code is located in a function that is called from another function, and so on, through 7 nested loops. The results of the two versions are identical, but the vectorized version is much slower. By the time the whole program converges, this portion will have run 153 million times, so any improvement is a big improvement. My original version with for-next loops takes about 5 minutes; the vectorized version took over 45 minutes. What did I do wrong, or how can I make this faster than the for-next loop version?

Thank you so much!

Erin
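Each masked assignment only works if the same elements are selected on both sides of the `=`. Mixing `z<100` on one side with `z<100 | z==100` on the other differs only when some element of `z` is exactly 100, which ordinary test data can easily miss. A minimal check, assuming hypothetical sample values for `z`, `eds` and `tdo` (they are not from Erin's program), runs the vectorized form on inputs that sit exactly on both boundaries and compares it with values worked out by hand from the loop's branches.

```matlab
% Hypothetical sample inputs chosen to hit z = 100 and z = 200 exactly
z   = [50 100 200 250];
eds = [1 2 3 4];
tdo = 2;

% Preallocate; the posted snippet assumes ea and td already exist
ea = zeros(size(z));
td = zeros(size(z));

% Vectorized form with the same mask on both sides of every assignment
ea(z<200 | z==200) = 0;
ea(z>200) = eds(z>200).*(z(z>200)-200)/100;
td(z>100) = tdo;
td(z<100 | z==100) = tdo*z(z<100 | z==100)/100;

psifirst = -(eds - ea)./td;
ed = (eds + ea)/2;

% Worked by hand from the loop's branches for these inputs:
% ea = [0 0 0 2], td = [1 2 2 2],
% so psifirst = [-1 -1 -1.5 -1] and ed = [0.5 1 1.5 3]
```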

  1 Comment
dpb on 1 Oct 2018
For one thing, you're recomputing the logical tests multiple times. I count nine (9) tests overall, with as many as three (3) giving the same result.
Sometimes for loops are just as fast or faster; in the end, the compiled code has to come down to a looping construct for actual execution.
I would expect the optimizer to detect repeated tests within the same line, but I'm not sure it can or will do so across the full function.
iz=(z<=200);
ea(iz)=0;
ea(~iz)=eds(~iz).*(z(~iz)-200)/100;
iz=(z<=100);
td(~iz)=tdo;
td(iz)=tdo*z(iz)/100;
I don't know whether it will be appreciably faster or not...
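dpb's fragment can be completed into a runnable check that also computes `psifirst` and `ed`. This is a sketch with hypothetical sample inputs; each mask is built once and its complement is taken with `~`, so every comparison against `z` runs only once per call.

```matlab
% Hypothetical sample inputs; in the real program these come from the caller
z   = [20 150 300 180];
eds = [2 4 6 8];
tdo = 5;

ea = zeros(size(z));          % preallocate so neither array grows on assignment
td = zeros(size(z));

iz = (z <= 200);              % first mask, evaluated once
ea(iz)  = 0;
ea(~iz) = eds(~iz).*(z(~iz)-200)/100;

iz = (z <= 100);              % second mask, evaluated once and reused
td(~iz) = tdo;
td(iz)  = tdo*z(iz)/100;

psifirst = -(eds - ea)./td;
ed       = (eds + ea)/2;
```

For these inputs the loop's branches give `ea = [0 0 6 0]` and `td = [1 5 5 5]`, and the masked version reproduces them.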


Accepted Answer

Matt J on 1 Oct 2018
Edited: Matt J on 1 Oct 2018
The for-loop you've shown is only 4 iterations long (i=1:4). I suspect that's because you shortened the loop for testing purposes and forgot to revert it when comparing to the vectorized version. If I'm correct, you won't see an accurate performance comparison until you use the true loop length. If it really is a 4-step loop, then there isn't enough computation there to improve upon significantly. You need to look at the rest of the code for opportunities for speed-up.
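Matt's point, that a 4-element comparison says little about vectorization in general, can be checked by timing both versions at more than one vector length. A minimal sketch using `tic`/`toc`; the sizes, repetition counts and sample data are hypothetical, and the timings it prints depend on the release and machine. MATLAB's `timeit` gives steadier measurements, but `tic`/`toc` keeps the sketch short.

```matlab
% Hypothetical benchmark: loop vs. vectorized at a short and a long length
tdo   = 2;
sizes = [4 1e4];      % the real case (4) and a much longer vector
reps  = [2000 10];    % fewer repetitions for the long vector

for k = 1:numel(sizes)
  n   = sizes(k);
  z   = linspace(50, 300, n);   % spans all three branches, avoids z = 0
  eds = linspace(1, 2, n);

  tic;
  for r = 1:reps(k)
    ea = zeros(1, n);           % ea(i) stays 0 unless z(i) >= 200
    td = zeros(1, n);
    for i = 1:n
      if z(i) < 100
        td(i) = tdo*z(i)/100;
      elseif z(i) < 200
        td(i) = tdo;
      else
        ea(i) = eds(i)*(z(i)-200)/100;
        td(i) = tdo;
      end
    end
    psiLoop = -(eds - ea)./td;
  end
  tLoop = toc;

  tic;
  for r = 1:reps(k)
    hi = z > 200;               % masks built once per call
    lo = z <= 100;
    ea = zeros(1, n);
    ea(hi) = eds(hi).*(z(hi)-200)/100;
    td = tdo*ones(1, n);
    td(lo) = tdo*z(lo)/100;
    psiVec = -(eds - ea)./td;
  end
  tVec = toc;

  fprintf('n = %6d: loop %.4f s, vectorized %.4f s\n', n, tLoop, tVec);
end
```

At n = 4 the fixed cost of building masks and temporary arrays on every call tends to dominate, so the vectorized form usually only pays off at the longer length.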

More Answers (1)

Erin Pratt on 1 Oct 2018
dpb and Matt, thank you very much for your responses.
Matt, the loop really is only four iterations; i = 1:4 is correct. So for my case, do you believe the for-next loop is indeed the fastest way to go?
dpb, just for giggles I will try your suggestion, and see how it runs.
Thanks so much!
Erin
  2 Comments
dpb on 1 Oct 2018
Indeed; I had noted the short vector length, but like Matt, I presumed that was just a short demo, not the real case.
In that case the overhead of the logical addressing clearly outweighs the looping construct. I'm not terribly surprised it's no faster, but I'm curious where the overhead (OH) comes from that would make it as much slower as you've indicated.
What you might want to do is look at the higher level in conjunction with this routine and see if you can "inline" more code there. In tightly nested loops, the function-call overhead may also be a significant fraction of the run time, again because the loop is so small.
Another alternative might be to use nested functions rather than full-fledged functions, so that you keep the factorization but can still address the variables as local...
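Inlining, the first of dpb's suggestions, means writing the four-element computation directly inside the caller's loop instead of calling a separate function on every pass. The sketch below assumes a hypothetical outer loop that sweeps `tdo` over made-up values (`tdoValues`, `psiAll` and `edAll` are illustrative names); the real program's outer loops are not shown, so treat it only as an illustration of the pattern. It also keeps `ea` and `td` as scalars per element rather than building temporary arrays.

```matlab
% Hypothetical inputs; in the real program these come from the enclosing loops
z   = [20 150 300 180];
eds = [2 4 6 8];
tdoValues = [5 10];            % hypothetical values swept by an outer loop

nT = numel(tdoValues);
psiAll = zeros(nT, 4);         % preallocated results, one row per tdo value
edAll  = zeros(nT, 4);

for k = 1:nT                   % stands in for one level of the nested loops
  tdo = tdoValues(k);
  for i = 1:4                  % body of the original function, written inline
    if z(i) < 100
      ea = 0;
      td = tdo*z(i)/100;
    elseif z(i) < 200
      ea = 0;
      td = tdo;
    else
      ea = eds(i)*(z(i)-200)/100;
      td = tdo;
    end
    psiAll(k, i) = -(eds(i) - ea)/td;   % scalars only, no temporary arrays
    edAll(k, i)  = (eds(i) + ea)/2;
  end
end
```

In this particular sketch `ea` and `ed` do not depend on `tdo`, so if `tdo` were the quantity being swept they could be computed once before the outer loop.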



Release

R2016b
