Backslash with triangular matrices

3 views (last 30 days)
Herwig
Herwig on 13 Dec 2013
Commented: Matt J on 20 Dec 2013
I'm using a LU decomposition on a matrix A (square, sparse, complex, symmetric (not Hermitian)):
[M1,M2,P,Q,R] = lu(A);
I then apply the backslash operator to these resulting matrices to build a Schur complement:
T = Q*(M2\(M1\(P*(R\B))));
If B were a vector, this second line executes sufficiently fast. But when B is a square matrix, which it is in my case, this operation is very slow since Matlab internally loops through the columns of B (which is my conclusion after having done some tic/tocs). The second line of code runs on a single thread using Matlab 2013b on Win7 64bit which makes this even slower.
What is a good way of speeding this up? A parfor loop working on the columns of B didn't work for me in Matlab 2013b due to reasons discussed in http://www.mathworks.com.au/matlabcentral/answers/59250-error-with-parfor-and-big-matrices (despite Edric Ellis saying it had been fixed in 2013a). Anything else I can try?
Thanks!
  2 Comments
Edric Ellis
Edric Ellis on 13 Dec 2013
Can you post reproduction steps that make this fail using PARFOR?
Herwig
Herwig on 17 Dec 2013
Hi Edric,
below is the code for which PARFOR fails. A is a large finite element matrix, which is not smaller than 5MB, hence I couldn't attach it. Csf is the right hand side for which size(Csf,2) > 2000.
[M1,M2,P,Q,R] = lu(A);
temp = P*(R\Csf);
temp2 = zeros(size(temp));
matlabpool 4
parfor iii=1:size(temp,2)
temp2(:,iii) = M2\(M1\temp(:,iii));
end
matlabpool close
temp = Q1*temp2;
I may be able to provide a dropbox link to the matrices A and Csf if that is helpful for you.
Herwig

Sign in to comment.

Accepted Answer

Matt J
Matt J on 17 Dec 2013
Edited: Matt J on 17 Dec 2013
Your operation looks equivalent to
T=A\B
I would expect that to be the fastest. Note that backslash already does its own lu decomposition internally, but with optimized multi-threading. You seem to be just repeating the decomposition work. Moreover, you've turned one backslash operation into three, possibly triggering three times as many internal lu decompositions.
  8 Comments
Herwig
Herwig on 20 Dec 2013
Hi Matt,
I'm testing my code on a local machine with 16GB RAM, but can move to our cluster with 120GB RAM per node. So, RAM will be available. I just want to make sure that I can fit as many degrees of freedom into my model as is possible. That's why I'm testing a small problem on my local machine and try to optimize this first.
I tried your suggestion. Solving the problem directly via Gaussian elimination A\B yields much worse performance in terms of memory and CPU time on my system. I think you need to give the explicit LU decomposition some more credit here ;) Gaussian elimination is not memory efficient but the crout type LU decomposition is! Also, remember that I want to solve the system more than once using different right hand sides B_1, B_2, ...
Anyway, your suggestion to use SPMD and distributed() has already helped me a lot. Maybe this is all Matlab can do at this point. Although, I'm still curious to learn whether memmapfile and SPMD can be made working together.
Matt J
Matt J on 20 Dec 2013
Although, I'm still curious to learn whether memmapfile and SPMD can be made working together.
Not sure why they wouldn't, but you might try MATFILE instead of MEMMAPFILE.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!