Matrix multiplication bug on the GPU

Nikos Pitsianis on 1 Jul 2014
Closed: MATLAB Answer Bot on 20 Aug 2021
I am using MATLAB 8.2.0.701 (R2013b) on a host with 64 AMD cores and two K20c GPUs, with NVIDIA driver version 331.62, on Ubuntu 12.04.4 LTS.
$ uname -a
Linux leibniz3 3.5.0-44-generic #67~precise1-Ubuntu SMP Wed Nov 13 16:16:57 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Matrix multiplication on the GPU returns results that differ substantially from the CPU results for 2^13 x 2^13 matrices.
To reproduce, run:
clear
n = 2^13;
A = rand(n);
B = rand(n);
tic
C = A * B;
t = toc; fprintf('CPU time %f sec\n',t)
%% One GPU
gpuDevice(1); % reset device
tic;
Ag = gpuArray(A);
Bg = gpuArray(B);
C1 = gather(Ag * Bg);
t = toc; fprintf('1 GPU time %f sec\n',t)
%% Two GPUs: each parfor worker multiplies A by one half of the columns of B
gpuDevice(1); % reset device
gpuDevice(2); % reset device
tic
cc = cell(2,1);
parfor i = 1:2
dev = gpuDevice;
% fprintf('Iter %d Device %d\n',i,dev.Index);
Ag = gpuArray(A);
Bg = gpuArray(B(:,(i-1)*n/2+1:i*n/2)); % columns 1:n/2 (i = 1) or n/2+1:n (i = 2)
cc{i} = gather(Ag * Bg);
end
C2 = [cc{1} cc{2}];
t = toc; fprintf('2 GPU time %f sec\n',t)
% Maximum absolute difference of each GPU result from the CPU result
fprintf('n = %5d %f %f\n', n, ...
max(max(abs(C - C1))), max(max(abs(C - C2))))
The error is substantial. Is this known behavior?
The code works for smaller powers of two; 2^13 is the first one that makes the bug rear its ugly head. I have not checked other sizes, but I will be glad to.
With 1 GPU, the difference max(max(abs(C - C1))) is 0.999716.
With 2 GPUs, the difference max(max(abs(C - C2))) is 134.766785.
The difference is very large!
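For scale, a standard forward error bound for floating-point matrix multiplication gives a rough idea of how large a pure rounding difference could be here. This back-of-the-envelope check is added for context and assumes IEEE double precision with unit roundoff $u = 2^{-53}$; the bound holds entrywise for any ordering of the inner-product summation, so it applies to the CPU and the GPU result alike:

$$
|\widehat{C} - AB| \le \gamma_n\,|A|\,|B|, \qquad \gamma_n = \frac{n u}{1 - n u} \approx n u, \qquad u = 2^{-53} \approx 1.11 \times 10^{-16}
$$

With $n = 2^{13}$, $n u = 2^{-40} \approx 9.1 \times 10^{-13}$. Because rand produces entries in (0, 1), $|A|\,|B| = AB$, and each entry of $AB$ is a sum of $n$ products with mean 1/4, i.e. about $n/4 = 2^{11} = 2048$:

$$
|\widehat{C} - AB| \lesssim 2^{-40} \cdot 2^{11} = 2^{-29} \approx 1.9 \times 10^{-9}
$$

Two results that both satisfy this bound should therefore differ by at most a few times $10^{-9}$. The reported differences of 0.999716 and 134.766785 are roughly $10^{8}$ and $10^{10}$ times larger, which points to wrong results rather than rounding.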
Here are the plots of the difference. The second is a zoomed-in view: at full size the difference was invisible, apparently because it lies along a boundary.
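To check the impression that the difference is concentrated along a boundary, one could list the rows and columns that contain large differences. A minimal sketch, not from the original post: badRows and badCols are hypothetical helper names, and the threshold tol is an assumption (any value well above the rounding level, such as 1e-6, should do).

```matlab
% Hypothetical helpers (not from the original post): indices of the rows and
% columns of Ctest that contain at least one entry differing from Cref by
% more than tol.
badRows = @(Cref, Ctest, tol) find(any(abs(Cref - Ctest) > tol, 2));
badCols = @(Cref, Ctest, tol) find(any(abs(Cref - Ctest) > tol, 1));
```

With the script's variables still in the workspace, c = badCols(C, C2, 1e-6); [min(c) max(c)] would show whether the bad columns of the two-GPU result all fall in n/2+1:n (the second worker's half) or straddle the split at column n/2; badRows(C, C1, 1e-6) does the same for the rows of the one-GPU result.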
I will try your suggestions and follow up on this.
Edric Ellis on 2 Jul 2014
I can't reproduce the problem you're seeing in R2013b, but I have only a single K20c. Can you reproduce the problem using only a single GPU? Which OS are you using? Have you updated to the latest NVIDIA CUDA driver? Are you able to try R2014a (this includes a later version of the CUDA runtime libraries)?
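One way to follow up on the single-GPU question on a two-GPU host is to run the single-GPU product on each card in turn and compare it with the CPU. A sketch, not from the thread, assuming the Parallel Computing Toolbox functions gpuDeviceCount, gpuDevice, gpuArray and gather as already used in the script; maxAbsDiff is a hypothetical helper, and the try/catch only lets the snippet skip cleanly when the toolbox or a GPU is unavailable. Note that it regenerates A, B and C.

```matlab
% Per-device check (a sketch): multiply on each visible GPU separately and
% compare with the CPU product.
n = 2^13;
try
    nDev = gpuDeviceCount;   % requires Parallel Computing Toolbox
catch
    nDev = 0;                % toolbox not available: nothing to check
end
maxAbsDiff = @(X, Y) max(abs(X(:) - Y(:)));   % hypothetical helper
if nDev > 0
    A = rand(n);
    B = rand(n);
    C = A * B;                                 % CPU reference
    for k = 1:nDev
        gpuDevice(k);                          % select GPU k
        Ck = gather(gpuArray(A) * gpuArray(B));
        fprintf('GPU %d: max|C - Ck| = %g\n', k, maxAbsDiff(C, Ck));
    end
else
    fprintf('No GPU available; per-device check skipped\n');
end
```

If only one of the two cards shows a large difference, that would suggest a problem with that device rather than with the multiplication code.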
Jill Reese on 8 Jul 2014
I am also unable to reproduce this on a single K20c in R2013b. I'm running a 12-core Debian machine with GPU driver version 331.62. On my system I see reasonable agreement between the CPU and GPU results:
max(max(abs(C-C1))) = 10^(-11)
As Edric mentioned, are you able to try R2014a to see if the problem is still reproducible for you in that version?
