Sum of squares profiling on GPU

Dan Ryan on 5 Oct 2013
Commented: Dan Ryan on 7 Oct 2013
I was profiling some code that runs on my GPU and came across something puzzling that I haven't been able to sort out. It may have something to do with the way the profiler interacts with the GPU, so I also ran the same code on the CPU and got very different results. Here is the code:
clear all
g = gpuArray.rand(600, 600, 400, 'single');
for i = 1:100
    x = sum(g, 3)/400;
    gSq = g.^2;
    y = sum(gSq, 3)/400;
    g = g+.01;
end
This code is just an example that reproduces the problem, not the actual code I am running, so please don't worry about why anybody would do this.
On the GPU, the profiler shows essentially all of the time being spent on the line
y = sum(gSq, 3)/400;
On the CPU, the profiler shows most of the time being spent on
g = g+.01;
and the remainder of the time is evenly distributed among the other lines.
Why is summing gSq so expensive on the GPU relative to summing g? The two arrays are the same size. I don't think it is a memory issue, since my GPU has 4 GB of memory and almost 3 GB is still available with g, x, gSq and y in memory.
Any ideas?
  3 Comments
Dan Ryan on 6 Oct 2013
The code above is the entire script. However, the original source of the problem is in a function file.
If I change the order so that the gSq sum is computed before the g sum, the profiling results stay the same.
Dan Ryan on 7 Oct 2013
Upon further investigation, I conclude that the profiler does not attribute time to each line correctly when the code runs on the GPU. For instance, if I run
g=gpuArray.rand(600, 600, 400, 'single');
for i = 1:1000
    gSq = g.^2;
    g = g+.01;
end
the whole script finishes in about 16 seconds, and almost all of that time is assigned to the line gSq = g.^2;
However, after adding the line where the sum is computed:
g=gpuArray.rand(600, 600, 400, 'single');
for i = 1:1000
    gSq = g.^2;
    x = sum(gSq, 3);
    g = g+.01;
end
The script now takes about 40 seconds to run, and only about 0.5 seconds in total is assigned to the line gSq = g.^2. This indicates that the profiler is not attributing time to the correct lines.
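One way to check this without relying on the profiler is to time each statement by hand and force the GPU to finish before every clock reading. The sketch below is only an illustration: it assumes the Parallel Computing Toolbox and a supported GPU, the variable names tSq, tSum and tAdd and the reduced iteration count are my own choices, and the idea that GPU work is queued asynchronously (so the cost shows up on whichever line has to wait for it) is a likely explanation rather than something confirmed in this thread.

```matlab
% Sketch: time each statement separately, synchronizing with the GPU
% before each clock reading so queued work cannot leak into a later line.
dev = gpuDevice;                         % handle to the currently selected GPU
g = gpuArray.rand(600, 600, 400, 'single');
nIter = 20;                              % illustrative; fewer iterations than the original loop
tSq = 0; tSum = 0; tAdd = 0;             % accumulated seconds per statement

for i = 1:nIter
    wait(dev); t0 = tic;                 % make sure nothing is pending before starting
    gSq = g.^2;
    wait(dev); tSq = tSq + toc(t0);      % block until the squaring has really finished

    t0 = tic;
    x = sum(gSq, 3);
    wait(dev); tSum = tSum + toc(t0);

    t0 = tic;
    g = g + .01;
    wait(dev); tAdd = tAdd + toc(t0);
end

fprintf('g.^2: %.3f s, sum: %.3f s, g+.01: %.3f s\n', tSq, tSum, tAdd);
```

If the per-statement totals measured this way differ sharply from what the profiler reports, that would support the conclusion that the profiler is charging the time to the wrong lines.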
Secondly, the squaring operation, .^2, takes two to three times as long as explicitly multiplying the array by itself. Changing the line
gSq = g.^2;
to
gSq = g.*g;
results in a script that runs in about 5 seconds without the sum and about 20 seconds with the sum. That suggests about 10 seconds are saved in computing gSq and another 10 seconds are saved when computing sum(gSq, 3), which is very strange.
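The comparison between the two ways of squaring can also be isolated from the loop. This is a rough sketch under the same assumptions as the script (Parallel Computing Toolbox, a supported GPU); the smaller array size, the warm-up calls and the names tPow, tMul and maxDiff are illustrative choices, and a single timed call like this will be noisy.

```matlab
% Sketch: compare g.^2 against g.*g on the GPU with explicit synchronization.
dev = gpuDevice;                          % handle to the currently selected GPU
g = gpuArray.rand(600, 600, 100, 'single');   % smaller than the original to leave memory headroom

% Warm-up calls so one-time setup cost is not counted in the timings.
a = g.^2; b = g.*g; wait(dev);

t0 = tic;
a = g.^2;
wait(dev); tPow = toc(t0);                % seconds for the element-wise power

t0 = tic;
b = g.*g;
wait(dev); tMul = toc(t0);                % seconds for the explicit multiply

% Confirm the two forms agree (up to single-precision rounding).
maxDiff = gather(max(abs(a(:) - b(:))));

fprintf('g.^2: %.4f s, g.*g: %.4f s, max difference: %g\n', tPow, tMul, maxDiff);
```

Checking the maximum difference is a cheap way to confirm that swapping .^2 for .* does not change the results before relying on the faster form.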


Answers (1)

Sean de Wolski on 7 Oct 2013
You might be interested in gputimeit, new in R2013b.
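As a minimal sketch of how gputimeit might be applied to the question above (this code is not part of the original answer), the two sums can be timed directly. It assumes R2013b or later with the Parallel Computing Toolbox and a supported GPU; the names tSumG and tSumSq are illustrative.

```matlab
% Sketch: use gputimeit to time the two reductions from the question.
% gputimeit calls the function handle repeatedly and waits for the GPU
% to finish, so the result reflects the operation's actual run time.
g = gpuArray.rand(600, 600, 400, 'single');
gSq = g.*g;                               % squared array, same size as g

tSumG  = gputimeit(@() sum(g, 3));        % seconds to sum g along dimension 3
tSumSq = gputimeit(@() sum(gSq, 3));      % seconds to sum gSq along dimension 3

fprintf('sum(g,3): %.4f s, sum(gSq,3): %.4f s\n', tSumG, tSumSq);
```

Since g and gSq have the same size and type, similar timings here would suggest that the extra cost the profiler shows on the sum(gSq, 3) line belongs to other statements.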
