
# Can't get a speedup!

Asked by john on 1 Dec 2012

Hi,

I've just learned about the MATLAB Parallel Computing Toolbox and have been studying it. To get started, and for motivation, I tried some simple codes to get a speedup, but all of my parallel results were worse than the serial ones. I know about the overhead of data communication between cores, so I wrote a code with minimal data communication, but as before the parallel execution time was longer than the serial one. I run the parallel code on two workers. My CPU is an Intel Core 2 Duo at 2.26 GHz. CPU usage is 100% while running the parallel code and 50% while running the serial code.

I also tried a code that I found on the net. The author claimed a speedup of 1.92 for the code using 2 workers, but I got 0.96!

I'm quite frustrated!

Walter Roberson on 1 Dec 2012

If you have hyperthreading enabled, turn it off.

john on 2 Dec 2012

I checked the BIOS and didn't find any CPU or hyperthreading settings. I think the CPU (Core 2 Duo) doesn't have the hyperthreading feature.


Answer by john on 2 Dec 2012

This is my serial code:

```
clear
A = zeros(2, 4000000);
tic
for j = 1:2
    for k = 1:4000000
        A(j,k) = sin(j + k);
    end
end
toc
```

And the parallel one:

```
clear
A = zeros(2, 4000000);
tic
parfor j = 1:2
    for k = 1:4000000
        A(j,k) = sin(j + k);
    end
end
toc
```

Very simple! The parfor has just two iterations, and I expect each iteration to be executed by one core, giving a speedup of about 2. But the run time of the first is 4 seconds and of the second is 14!

Walter Roberson on 2 Dec 2012

Try reversing the order of the subscripts, producing a 4 million by 2 output, so that there would not be any cache-line contention. Also, try vectorizing, e.g.,

```
A(j, :) = sin(j + (1:4000000));
```

with no "for k" loop.
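Putting both of these suggestions together, a hedged sketch of the transposed, vectorized parallel version (reusing john's variable names; timings will vary by machine) might look like:

```
% MATLAB stores arrays column-major, so with a 4000000-by-2 output each
% worker writes one full column, which is contiguous in memory.
clear
A = zeros(4000000, 2);
tic
parfor j = 1:2
    % Vectorized: no inner k-loop, one sliced assignment per iteration.
    A(:, j) = sin(j + (1:4000000)');
end
toc
```

Because `A(:, j)` is indexed by the loop variable in one subscript, parfor classifies `A` as a sliced output variable, so each worker only transfers its own column back to the client rather than the whole array.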

Jan Simon on 29 Dec 2012

Slightly faster: sin(j + 1:j + 4000000) (operator precedence makes this (j+1):(j+4000000), so the colon builds the shifted range directly instead of adding j to an already-built vector).

Answer by john on 2 Dec 2012
Edited by john on 2 Dec 2012

Thanks! I tried this code

```
A(j, :) = sin(j + (1:4000000));
```

and got a speedup of about 1! That wasn't as disappointing as the earlier samples. Then I tried giving each iteration more work:

```
A(j, :) = sin(j + (1:4000000)) .* sin(j - (1:4000000)) ...
    .* cos(j - (1:4000000)) .* cos(j + (1:4000000));
```

and the speedup was about 1.3!

Can you please explain the effect of the code you mentioned (A(j, :) = ...)? Why was my first code bad?

Regards

Brad Stiritz on 29 Dec 2012

OK, thanks Walter for the background info. I'm developing strictly in MATLAB though, so unfortunately I won't be able to check out those compilers. I'll do some further digging & maybe post a new Question specifically about this, or a Service Request. Will post back to this thread if I come up with anything.

Walter Roberson on 29 Dec 2012

Those machines haven't been sold for a number of years. And MATLAB has not been supported on them for a fair number of releases.

Brad Stiritz on 22 Jan 2013

Regarding Walter's 12/2/2012 comment, I requested a documentation reference from MathWorks support and received this reply:

------------------------------------------------

"While working with MATLAB in general and also with PCT, row-wise access in same column will be faster than column-wise access in same row. This is because, in MATLAB, matrix elements belonging to the same column are located in consecutive locations of memory, while elements belonging to the same row of the matrix are located in non-consecutive locations of memory.

The following link from the Mathworks website highlights the above mentioned fact and also provides additional information on "Speeding up MATLAB Applications"

http://www.mathworks.com/tagteam/70598_91991v00_MATLABApps_WhitePaper.pdf

Also, please see the following MATLAB Documentation link for additional details on profiling and improving parallel code:

http://www.mathworks.com/help/distcomp/profiling-and-improving-parallel-code.html

Unfortunately, we cannot recommend any books. However, it would be highly recommended to go through the webinars mentioned in the link below:

http://www.mathworks.com/products/parallel-computing/webinars.html#

------------------------------------------------

Hope this helps.
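The support reply's point about column-major storage can be illustrated with a quick timing sketch (my own example, not from the thread; exact timings depend on the machine and MATLAB version, but the column-wise loop should be noticeably faster on a matrix this size):

```
% MATLAB stores matrices column-major: A(1,1), A(2,1), A(3,1), ... are
% adjacent in memory, so walking down a column is cache-friendly.
clear
n = 5000;
A = rand(n, n);

% Column-wise traversal: inner loop varies the row index, so memory is
% read sequentially.
tic
s1 = 0;
for col = 1:n
    for row = 1:n
        s1 = s1 + A(row, col);
    end
end
tColWise = toc;

% Row-wise traversal: inner loop varies the column index, so consecutive
% accesses are n elements (one full column) apart in memory.
tic
s2 = 0;
for row = 1:n
    for col = 1:n
        s2 = s2 + A(row, col);
    end
end
tRowWise = toc;

fprintf('column-wise: %.3f s, row-wise: %.3f s\n', tColWise, tRowWise);
```

This is the same effect Walter's subscript-reversal suggestion exploits: making each parfor iteration write a contiguous column instead of a strided row.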