Iterative solver with gpuArray

Fabio Freschi on 16 Sep 2014
Answered: Joss Knight on 7 Sep 2015
Hi all,
Iterative solvers can also be useful with full matrices, which is my case. I would like to use an iterative solver such as GMRES on a full matrix where both the matrix and the right-hand side are gpuArrays, but this does not appear to be supported in MATLAB R2013a.
My data are
>> n = 1024;
>> Acpu = rand(n)+100*eye(n);
>> bcpu = rand(n,1);
>> Agpu = gpuArray(Acpu); bgpu = gpuArray(bcpu);
I tried both
>> x = gmres(Agpu,bgpu,[]);
Error using iterchk (line 39)
Argument must be a floating point matrix or a function handle.
Error in gmres (line 86)
[atype,afun,afcnstr] = iterchk(A);
and
>> x = gmres(@(x)(Agpu*x),bgpu,[]);
The following error occurred converting from gpuArray to double:
Conversion to double from gpuArray is not possible
Error in gmres (line 297)
U(:,1) = u;
The only way I found to make it work is
>> x = gmres(@(x)gather(Agpu*x),bcpu,[]);
gmres converged at iteration 7 to a solution with relative residual 2.4e-07.
That is terribly ugly, because the result of every matrix-vector product is copied from the GPU back to system memory. Is there any way to run GMRES entirely on the GPU using MATLAB built-in functions?
Thanks in advance, Fabio
Matt J on 16 Sep 2014
Are you saying you get no acceleration over CPU gmres? I wouldn't expect the data transfer of Agpu*x to be such a big penalty. It's not as if you're transferring all of Agpu, after all, only the n-element result vector.
I also vaguely wonder whether this would continue to be a problem on newer graphics cards and newer versions of CUDA. My understanding was that the newer CUDA versions could share memory with the CPU.
Fabio Freschi on 16 Sep 2014
That is not yet implemented in MATLAB R2013a. I run out of memory as soon as I exceed the GPU memory (12 GB in my case, with a Tesla K40).


Accepted Answer

Matt J on 16 Sep 2014
Edited: Matt J on 16 Sep 2014
Even for a much larger problem size (n = 10240) and a not-so-new graphics card (GTX 580), I see negligible time overhead from swapping between CPU and GPU:
n = 1024*10;
Acpu = rand(n)+100*eye(n);
bcpu = rand(n,1);
Agpu = gpuArray(Acpu);
bgpu= gpuArray(bcpu);
gputimeit(@() Agpu*bgpu) %all data on gpu
%0.0052sec
gputimeit(@() gather( Agpu*bcpu )) %requires data transfer
%0.0054sec
The speed-up in GMRES also seems pretty good (a factor of 4):
tic;
x = gmres(@(x) Acpu*x,bcpu,[]);
toc
%Elapsed time is 0.391786 seconds.
tic;
x = gmres(@(x)gather(Agpu*x),bcpu,[]);
toc
%Elapsed time is 0.097924 seconds.
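A single tic/toc run can be noisy, so the same comparison could also be timed with timeit and gputimeit, which repeat the call several times. The following is only a sketch under the same setup as above; the variable names fCpu, fGpu, tCpu and tGpu are introduced here for illustration, and requesting two outputs is a choice made here so that gmres does not print its convergence message on every call.

```matlab
% Sketch: time the full GMRES solve on CPU and on GPU (same data as above).
n = 1024*10;
Acpu = rand(n) + 100*eye(n);
bcpu = rand(n,1);
Agpu = gpuArray(Acpu);

% Function handles wrapping the two solves being compared.
fCpu = @() gmres(@(x) Acpu*x, bcpu, []);          % matrix-vector product on CPU
fGpu = @() gmres(@(x) gather(Agpu*x), bcpu, []);  % product on GPU, result gathered

% Second argument = number of outputs requested from each handle;
% with two outputs ([x,flag]) gmres stays silent.
tCpu = timeit(fCpu, 2);
tGpu = gputimeit(fGpu, 2);

fprintf('CPU: %.4f s, GPU: %.4f s, speed-up: %.1fx\n', tCpu, tGpu, tCpu/tGpu);
```

The measured ratio depends on the hardware, so no particular speed-up should be expected from this sketch.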
Matt J on 16 Sep 2014
Edited: Matt J on 16 Sep 2014
If you must use tic...toc, the following would be a better set of tests:
tic;
x = gather(Agpu*bcpu); x(:) = 1;
toc %requires data transfer
tic;
for ii = 1:10
    x = Agpu*bgpu;
end
x = gather(x);
x(:) = 1;
toc/10 %all data on gpu
tic; x = Acpu*bcpu; x(:) = 1; toc
Notice that the second test is the most realistic representation of what you would like to do, i.e., many iterations of GPU operations plus a final gather() operation at the end of the iterations.
Fabio Freschi on 16 Sep 2014
Edited: Fabio Freschi on 16 Sep 2014
Following a suggestion found on the MathWorks website:
>> gd = gpuDevice;
>> tic; for i = 1:100, x = Agpu*bgpu; end; wait(gd); toc
Elapsed time is 0.537721 seconds.
>> tic; for i = 1:100, x = gather(Agpu*bgpu); end; wait(gd); toc
Elapsed time is 0.547418 seconds.
These results are in accordance with your experiments.
EDIT: I see now that your comment is similar to this implementation.


More Answers (1)

Joss Knight on 7 Sep 2015
If you download the R2015b release of MATLAB (released on 3rd September) you will find that gmres is now supported for sparse gpuArrays, including support for a single sparse matrix preconditioner. See http://www.mathworks.com/help/distcomp/release-notes.html.
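As a rough illustration of what this looks like in practice, the sketch below calls gmres directly on a sparse gpuArray. It assumes R2015b or later with Parallel Computing Toolbox; the sparse test matrix, the tolerance and the iteration limit are made up for this example and are not taken from the thread. Note that this applies to sparse matrices, not to the full matrices in the original question.

```matlab
% Sketch: gmres on a sparse gpuArray (assumes R2015b or later).
n = 1024;
A = sprand(n, n, 0.01) + 100*speye(n);   % hypothetical sparse, well-conditioned test matrix
b = rand(n, 1);

Agpu = gpuArray(A);   % sparse gpuArray
bgpu = gpuArray(b);

% No restart ([]), relative tolerance 1e-6, at most 50 iterations.
[xgpu, flag, relres, iter] = gmres(Agpu, bgpu, [], 1e-6, 50);

% Single transfer back to host memory after the solve has finished.
x = gather(xgpu);
flag = gather(flag);   % gather is a no-op if flag is already on the host
fprintf('flag = %d, relative residual = %.2e\n', flag, norm(A*x - b)/norm(b));
```

Here the matrix-vector products stay on the GPU and only the final solution is gathered, which is what the original question was after.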
