Cannot figure out why in this code, GPU imdilate takes significantly longer than its CPU counterpart.

Hi,
I am trying to run a binary image dilation on the GPU for a large image. I need to repeat this operation several thousand times and save each result. The GPU version takes significantly longer than the CPU version (about 10x), and I am not sure why. Below is the sample code I ran, followed by the timings I obtained. Please advise whether this is the expected behavior, and why. Also, if I only need to save the result immediately, do I need to "gather" it before saving it?
My machine has two CPUs with 10 cores each and 128 GB of RAM. My GPU is a Tesla K20c. I am using MATLAB R2013a.
Thanks.
% Sample code.
% ------------
useGPU = 1;
if useGPU
    gpu = gpuDevice(1);
    reset(gpu);
end
boston = geotiffread('boston.tif');
edgeImage = edge(rgb2gray(boston), 'canny');
% Creating a large image.
bwImage = repmat(edgeImage, 3, 2);
for i = 1 : 10
    % Create mockup binary data.
    mask = rand(501) > 0.99;
    if useGPU
        maskOnGPU = gpuArray(mask);
        bwImageOnGPU = gpuArray(bwImage);
        t1 = clock;
        bwResultOnGPU = imdilate(bwImageOnGPU, maskOnGPU, 'same');
        wait(gpu);
        t2 = clock;
        bwResult = gather(bwResultOnGPU);
        duration = sprintf('%2.2f', etime(t2, t1));
        msg = strcat(['GPU Iteration #', int2str(i),': ''imdilate'' took ', duration, ' seconds']);
    else
        t1 = clock;
        bwResult = imdilate(bwImage, mask, 'same');
        t2 = clock;
        duration = sprintf('%2.2f', etime(t2, t1));
        msg = strcat(['CPU Iteration #', int2str(i),': ''imdilate'' took ', duration, ' seconds']);
    end
    disp(msg);
    [~, fileName] = fileparts(tempname); % Mockup file name
    save(fileName, 'bwResult', '-v7.3');
end
% CPU Iteration #1: 'imdilate' took 24.79 seconds
% CPU Iteration #2: 'imdilate' took 27.06 seconds
% CPU Iteration #3: 'imdilate' took 31.47 seconds
% CPU Iteration #4: 'imdilate' took 29.31 seconds
% CPU Iteration #5: 'imdilate' took 32.48 seconds
% CPU Iteration #6: 'imdilate' took 32.05 seconds
% CPU Iteration #7: 'imdilate' took 32.79 seconds
% CPU Iteration #8: 'imdilate' took 31.57 seconds
% CPU Iteration #9: 'imdilate' took 32.72 seconds
% CPU Iteration #10: 'imdilate' took 27.76 seconds
%
% GPU Iteration #1: 'imdilate' took 253.57 seconds
% GPU Iteration #2: 'imdilate' took 256.04 seconds
% GPU Iteration #3: 'imdilate' took 258.49 seconds
% GPU Iteration #4: 'imdilate' took 255.58 seconds
% GPU Iteration #5: 'imdilate' took 257.73 seconds
% GPU Iteration #6: 'imdilate' took 252.24 seconds
% GPU Iteration #7: 'imdilate' took 260.87 seconds
% GPU Iteration #8: 'imdilate' took 254.85 seconds
% GPU Iteration #9: 'imdilate' took 254.65 seconds
% GPU Iteration #10: 'imdilate' took 251.99 seconds

Accepted Answer

Anand on 25 Nov 2013
Hello Alex,
The reason you are seeing a slowdown on the GPU relative to the CPU is that the CPU implementation is highly optimized for binary dilation with sparsely populated structuring elements (the mask in your code). The GPU implementation does not have this algorithmic optimization.
A sparse mask is one in which a large percentage of the elements are 0. The mask you set up has roughly 99% of its elements equal to zero, so it is highly sparse. I measured execution time with a non-sparse structuring element (for example, mask = rand(501)>.3) and found the CPU implementation to be much slower, while the GPU implementation stayed at about the same execution time.
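The sparsity effect described above can be checked with a small experiment along these lines. This is only a sketch under stated assumptions: the image is random mock data rather than the Boston image, and the image and mask sizes (2000 and 101) are reduced, hypothetical values chosen so the run finishes quickly. Absolute timings will differ from those in the question.

```matlab
% Sketch: CPU imdilate time for a sparse vs. a dense structuring element.
% Assumption: mock data and reduced sizes, not the original image or mask size.
rng(0);                                  % reproducible mock data
bwImage    = rand(2000) > 0.9;           % mock binary image
sparseMask = rand(101) > 0.99;           % about 1% nonzero, as in the question
denseMask  = rand(101) > 0.3;            % about 70% nonzero, as in the answer

% Fraction of nonzero elements in each mask.
sparseDensity = nnz(sparseMask) / numel(sparseMask);
denseDensity  = nnz(denseMask)  / numel(denseMask);
fprintf('Sparse mask density: %.3f\n', sparseDensity);
fprintf('Dense mask density:  %.3f\n', denseDensity);

% Time the CPU dilation for each mask.
tic; resultSparse = imdilate(bwImage, sparseMask, 'same'); tSparse = toc;
tic; resultDense  = imdilate(bwImage, denseMask,  'same'); tDense  = toc;
fprintf('CPU, sparse mask: %.2f s\n', tSparse);
fprintf('CPU, dense mask:  %.2f s\n', tDense);
```

If the explanation holds on your machine, the dense-mask run should take noticeably longer than the sparse-mask run.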
So what you are seeing is more a case of the CPU implementation being very fast (especially given the specifications of your machine) than of the GPU implementation being slow.
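To look at the GPU side of this comparison, a sketch like the following times the GPU dilation for a sparse and a dense mask. It assumes a supported GPU and the same gpuArray usage of imdilate shown in the question; the data is mock data with reduced, hypothetical sizes (2000 and 101), so the numbers are not comparable to the timings posted above.

```matlab
% Sketch: GPU imdilate time for a sparse vs. a dense structuring element.
% Assumption: mock data and reduced sizes, not the original image or mask size.
if gpuDeviceCount > 0
    gpu = gpuDevice;                             % currently selected GPU
    rng(0);                                      % reproducible mock data
    bwImageOnGPU = gpuArray(rand(2000) > 0.9);   % mock binary image on the GPU
    masks  = {rand(101) > 0.99, rand(101) > 0.3};
    labels = {'sparse', 'dense'};
    for k = 1:numel(masks)
        maskOnGPU = gpuArray(masks{k});
        wait(gpu);                               % finish transfers before timing
        tic;
        bwResultOnGPU = imdilate(bwImageOnGPU, maskOnGPU, 'same');
        wait(gpu);                               % block until the GPU work is done
        t = toc;
        fprintf('GPU, %s mask: %.2f s\n', labels{k}, t);
    end
else
    disp('No supported GPU found; skipping GPU timing.');
end
```

Calling wait(gpu) before toc matters because GPU operations can return before the work has finished, so the timer would otherwise stop too early.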
In either case, this is something we will look into and may consider addressing in a future release of the product. Thank you for reporting this to us.
Anand Raja, Image Processing Team
