Using kernels in a for-loop: GPU computation time per iteration grows linearly with the iteration count

I have an algorithm in MATLAB that is based on a for-loop over time steps:
for cnt = 1:cnt_max
    % do the calculation based on the measurement data and the result of the previous time step
end
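A minimal sketch of that loop structure, keeping the previous result on the GPU and copying only the current step's measurement to the device each iteration. The column-wise layout of `data`, the name `runTimeSteps`, and the exponential-smoothing update with factor `alpha` are hypothetical placeholders for the real calculation:

```matlab
function state = runTimeSteps(data, alpha)
% runTimeSteps  Hypothetical time-stepping loop with the state kept on the GPU.
%   data  : CPU matrix, one column of measurements per time step (assumption)
%   alpha : smoothing factor of the placeholder update (assumption)
cntMax = size(data, 2);
state = gpuArray(data(:, 1));           % initial state from the first measurement
for cnt = 2:cntMax
    meas = gpuArray(data(:, cnt));      % copy only this step's measurement
    % Placeholder update: uses the new measurement and the previous state
    state = alpha .* meas + (1 - alpha) .* state;
end
state = gather(state);                  % bring the final result back to the CPU
end
```

For example, `runTimeSteps([1 2 3; 4 5 6], 0.5)` returns `[2.25; 5.25]`.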
If I use gpuArray and arrayfun, the computation time per iteration grows linearly with cnt. The same happens if I write the functions in CUDA, create kernels from them in MATLAB, and run the calculations with feval:
% Create ten different kernels with parallel.gpu.CUDAKernel
% Set their GridSize and ThreadBlockSize properties
% Initialize the result variables as gpuArrays
for cnt = 1:cnt_max
    tic;
    data_cnt = gpuArray(data_cnt);   % data is stored in a matrix on the CPU
    result1_cnt = feval(myKernel1, result_cnt1, input);
    % (...)
    result10_cnt = feval(myKernel10, result_cnt10, input);
    wait(gpuDevice); toc;
end
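To see how the per-iteration time develops, it can help to store each iteration's time in a vector instead of only printing it. A minimal sketch, where `timeIterations` and `stepFcn` are hypothetical names (`stepFcn` stands in for one time step, for example the ten `feval` calls), and `wait` on the device object makes sure queued GPU work has finished before the timer stops:

```matlab
function t = timeIterations(stepFcn, nIter)
% timeIterations  Time each call of stepFcn(cnt), synchronizing with the GPU.
%   stepFcn : hypothetical function handle performing one time step
%   nIter   : number of iterations to run
t = zeros(nIter, 1);        % preallocate: one entry per iteration
dev = gpuDevice;            % currently selected GPU
for cnt = 1:nIter
    tStart = tic;
    stepFcn(cnt);
    wait(dev);              % block until all queued GPU work is done
    t(cnt) = toc(tStart);   % elapsed time of this iteration in seconds
end
end
```

A straight-line fit, `p = polyfit((1:numel(t)).', t, 1);`, then gives the growth in seconds per iteration in `p(1)`; a value close to zero means the per-iteration time is roughly constant.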
I have no idea why the computation time keeps increasing. I neither create new variables inside the loop nor change their sizes. I am new to GPU computing and CUDA, so I don't know where to look. I am using MATLAB R2013b, the Parallel Computing Toolbox, and a Tesla K20c GPU.
Silvia on 20 Jan 2014
As the attached plot shows, with the previous version the computation time per iteration increased linearly. After changing x^2 to x.^2, I no longer had this problem.
Previous version:
for i = 1:N
    tic;
    % calculate A and B on the GPU
    res = sqrt( (A - datalist(i).x(1,1))^2 + (B - datalist(i).x(2,1))^2 );
    wait(gpuDevice);
    time_per_iteration(i) = toc;
end
Fixed version:
for i = 1:N
    tic;
    % calculate A and B on the GPU
    res = sqrt( (A - datalist(i).x(1,1)).^2 + (B - datalist(i).x(2,1)).^2 );
    time_per_iteration(i) = toc;
end
where A and B are scalar (1-by-1) gpuArrays and datalist is stored on the CPU.
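Because each iteration of this particular loop does not depend on the previous one, the same distances could also be computed in a single vectorized call, which replaces N small host-to-device transfers with one. This is a swapped-in technique, not the poster's method, and it does not apply to the time-stepping loop from the question if each step needs the previous result. A minimal sketch, assuming `datalist` is a struct array whose field `x` holds each point in `x(1,1)` and `x(2,1)`; the function name is hypothetical:

```matlab
function res = distancesVectorized(A, B, datalist)
% distancesVectorized  Distances from (A, B) to every point in datalist at once.
%   A, B     : scalar gpuArrays
%   datalist : CPU struct array; field x holds the point in x(1,1) and x(2,1)
% Collect the first column of every x into a 2-by-N matrix.
cols = arrayfun(@(d) d.x(1:2, 1), datalist, 'UniformOutput', false);
pts = gpuArray([cols{:}]);            % one host-to-device copy instead of N
% Element-wise operators only (.^), so no matrix power is involved.
res = sqrt((A - pts(1, :)).^2 + (B - pts(2, :)).^2);
end
```

The result is a 1-by-N gpuArray; `gather` copies it back to the CPU if needed.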
Silvia on 22 Jan 2014
I am not sure whether it is related, but I just discovered another strange behaviour of ^2 on a gpuArray.
A is a negative real scalar gpuArray (A < 0, imag(A) = 0):
B = A^2      -> imag(B) = 0.0000e+00
B = A.^2     -> imag(B) = 0
B = abs(A)^2 -> imag(B) = 0
B = A*A      -> imag(B) = 0
So if I use ^2 on a negative scalar gpuArray, the result becomes complex. Its imaginary part is in fact zero, but MATLAB no longer treats the result as a real number.
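A small script to check this behaviour on a given MATLAB release, assuming a GPU is available. It compares `A.^2`, `A.*A`, and `A^2` with `isreal` and, if a result has been flagged complex while its imaginary part is exactly zero, converts it back with `real`. The behaviour of `A^2` may differ between releases, so the script only reports what it finds:

```matlab
% Compare ways of squaring a negative scalar gpuArray and check realness.
A = gpuArray(-3);            % negative real scalar on the GPU
B_elem = A.^2;               % element-wise power
B_mul  = A.*A;               % plain multiplication
B_mpow = A^2;                % matrix power: reported above to return complex in R2013b
fprintf('isreal(A.^2) = %d\n', isreal(B_elem));
fprintf('isreal(A.*A) = %d\n', isreal(B_mul));
fprintf('isreal(A^2)  = %d\n', isreal(B_mpow));
% If the result is stored as complex but its imaginary part is exactly zero,
% real() returns a real array with the same values.
if ~isreal(B_mpow) && gather(all(imag(B_mpow(:)) == 0))
    B_mpow = real(B_mpow);
end
```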
