Iterative solver with gpuArray
Hi all,
Iterative solvers can be useful with full (dense) matrices too, which is my case. I would like to use an iterative solver such as GMRES where both the matrix and the right-hand side are gpuArrays, but this does not appear to be supported in MATLAB R2013a.
My data are
>> n = 1024;
>> Acpu = rand(n)+100*eye(n);
>> bcpu = rand(n,1);
>> Agpu = gpuArray(Acpu); bgpu = gpuArray(bcpu);
I tried passing the gpuArray matrix directly
>> x = gmres(Agpu,bgpu,[]);
Error using iterchk (line 39)
Argument must be a floating point matrix or a function handle.
Error in gmres (line 86)
[atype,afun,afcnstr] = iterchk(A);
and wrapping the product in a function handle
>> x = gmres(@(x)(Agpu*x),bgpu,[]);
The following error occurred converting from gpuArray to double:
Conversion to double from gpuArray is not possible
Error in gmres (line 297)
U(:,1) = u;
The only way I found to make it work is
>> x = gmres(@(x)gather(Agpu*x),bcpu,[]);
gmres converged at iteration 7 to a solution with relative residual 2.4e-07.
That is terribly ugly because the result of every matrix-vector product is copied from the GPU back to system memory. Is there any way to run GMRES on the GPU using MATLAB built-in functions?
Thanks in advance,
Fabio
2 Comments
Matt J
on 16 Sep 2014
Are you saying you get no acceleration over CPU gmres? I wouldn't expect transferring the result of Agpu*x to be such a big penalty. It's not as if you're transferring all of Agpu, after all.
I also vaguely wonder whether this would continue to be a problem on newer graphics cards and newer versions of CUDA. My understanding was that the newer CUDA versions could share memory with the CPU.
Accepted Answer
Matt J
on 16 Sep 2014
Edited: Matt J
on 16 Sep 2014
Even for a much larger problem size (n=10240) and a not-so-new graphics card (GTX 580), I see negligible overhead from transferring data between CPU and GPU:
n = 1024*10;
Acpu = rand(n)+100*eye(n);
bcpu = rand(n,1);
Agpu = gpuArray(Acpu);
bgpu = gpuArray(bcpu);
gputimeit(@() Agpu*bgpu)            % all data on GPU
% 0.0052 sec
gputimeit(@() gather(Agpu*bcpu))    % requires data transfer
% 0.0054 sec
The speed-up in GMRES also seems pretty good (a factor of 4):
tic;
x = gmres(@(x) Acpu*x,bcpu,[]);
toc
%Elapsed time is 0.391786 seconds.
tic;
x = gmres(@(x)gather(Agpu*x),bcpu,[]);
toc
%Elapsed time is 0.097924 seconds.
5 Comments
Matt J
on 16 Sep 2014
Edited: Matt J
on 16 Sep 2014
If you must use tic...toc, the following would be a better set of tests:
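tic;
x = gather(Agpu*bcpu); x(:) = 1;
toc   % requires data transfer
tic;
for ii = 1:10
    x = Agpu*bgpu;   % repeated products stay on the GPU
end
x = gather(x);       % single transfer after the loop
x(:) = 1;
toc/10   % all data on GPU, average per iteration
tic; x = Acpu*bcpu; x(:) = 1; toc   % CPU only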
Notice that the second test is the most realistic representation of what you would like to do, i.e., many iterations of GPU operations plus a final gather() operation at the end of the iterations.
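To illustrate that pattern, here is a minimal sketch of an unrestarted GMRES loop that keeps the large arrays wherever the inputs live. This is not the built-in gmres: gmres_sketch is a hypothetical helper name, it has no restart or preconditioner, and it uses single-pass classical Gram-Schmidt, which is less robust than what a production solver would use. It assumes maxit >= 1 and that gather passes ordinary (non-GPU) arrays through unchanged, as it does in current releases. Only the small k-by-1 coefficient vectors are gathered in each iteration; the n-by-k Krylov basis and the solution are never copied inside the loop.

```matlab
function [x, relres, iter] = gmres_sketch(A, b, tol, maxit)
% GMRES_SKETCH  Unrestarted GMRES; A and b may be double or gpuArray.
% Hypothetical illustration only, not the built-in gmres.
n = numel(b);
beta = gather(norm(b));               % scalar norm, brought to the CPU
x = zeros(n, 1, 'like', b);           % same class/location as b
relres = 0;
iter = 0;
if beta == 0
    return;                           % b = 0, so x = 0 is the solution
end
V = zeros(n, maxit + 1, 'like', b);   % Krylov basis, stays with b
H = zeros(maxit + 1, maxit);          % small Hessenberg matrix on the CPU
V(:, 1) = b / beta;
for k = 1:maxit
    w = A * V(:, k);                  % the expensive matrix-vector product
    h = V(:, 1:k)' * w;               % projections onto the current basis
    w = w - V(:, 1:k) * h;            % classical Gram-Schmidt, one pass
    H(1:k, k) = gather(h);            % only k numbers are transferred
    H(k + 1, k) = gather(norm(w));
    e1 = zeros(k + 1, 1);
    e1(1) = beta;
    Hk = H(1:k + 1, 1:k);
    y = Hk \ e1;                      % small least-squares problem on the CPU
    relres = norm(e1 - Hk * y) / beta;
    iter = k;
    if relres <= tol || H(k + 1, k) == 0
        break;                        % converged or exact breakdown
    end
    V(:, k + 1) = w / H(k + 1, k);
end
x = V(:, 1:iter) * y;                 % assemble the solution where V lives
end
```

With the variables from the question, a call such as x = gather(gmres_sketch(Agpu, bgpu, 1e-6, 20)) would perform a single large transfer at the end. The same function also runs on ordinary CPU arrays, which makes it easy to check against Acpu\bcpu.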
More Answers (1)
Joss Knight
on 7 Sep 2015
If you download the R2015b release of MATLAB (released on 3rd September) you will find that gmres is now supported for sparse gpuArrays, including support for a single sparse matrix preconditioner. See http://www.mathworks.com/help/distcomp/release-notes.html.