About parallel computation and inter-process communication

Hello all!
I have a piece of code that finds patterns in sequences of strings of varying length. Nothing overly complex, except that the main code includes three loops. Anyway, the basic premise is as follows:
  1. Load the entire data set (essentially as a cell array) consisting of rows of these sequences.
  2. Run the main code
  3. Write the output to a file.
Run sequentially, without any parallel directives, this process takes "x" seconds.
Now: if I change this to:
  1. Load the entire data set
  2. Start matlabpool
  3. invoke spmd(n)
  4. Run the main code.
  5. Write the output to file.
The run time is approximately "10x"!
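For reference, a minimal skeleton of the parallel variant described above, using the R2014-era syntax; the function `process_sequences` and the variable names are placeholders, not my actual code:

```matlab
% Sketch of steps 1-5 above (placeholder names).
data = load('sequences.mat');       % 1. load the entire data set on the client
matlabpool open 6                   % 2. start a pool of 6 workers
spmd(6)                             % 3-4. run the main code on every worker
    % NB: 'data' is a plain client variable, so each worker
    % receives its own complete copy of the whole data set.
    localResult = process_sequences(data.seqs);
end
matlabpool close                    % 5. 'localResult' is a Composite;
                                    %    localResult{k} is worker k's output
```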
The machine on which this is being run has 12 GB RAM and a 6-core i7.
From my understanding, upon invoking spmd (since I am just interested in letting different workers perform the same job on different subsets of the data), the total data set is automatically divided. Logically, then, the run time should decrease.
However, while trying to figure this out, I also divided the data set into worker-specific files that are loaded based on the respective "labindex". That did not provide any relief, nor any answers.
I have some background with MPI and F90, so I am assuming that the significantly increased run time with more than one worker is probably due to inter-process communication. If so, is there any way to prevent it?
The problem I am trying to solve is a disjoint one: one set of data has no bearing on another, so there is no real need for one worker to talk to another.
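For what it's worth, the labindex-based variant I tried looks roughly like this (file names and the processing function are placeholders):

```matlab
% Each worker loads only its own pre-split file: data_1.mat ... data_6.mat
matlabpool open 6
spmd(6)
    chunk = load(sprintf('data_%d.mat', labindex));  % per-worker file
    localResult = process_sequences(chunk.seqs);     % placeholder function
end
matlabpool close
```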
Any insight would be greatly appreciated. This really has me intrigued.
Cheers!

Answers (1)

Edric Ellis
Edric Ellis on 14 Jul 2014
What sort of data are you passing into SPMD? Inside SPMD, only distributed arrays are automatically operated on in parallel. For example:
x = rand(5000);
xd = distributed.rand(5000);
spmd
    x = x * x;    % every worker operates on its own full copy of 'x'
    xd = xd * xd; % each worker holds a slice of 'xd', and they collaborate
end
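To bring a distributed result back to the client afterwards, you can use gather, e.g. continuing the distributed case above:

```matlab
xd = distributed.rand(5000);   % slices live on the workers
spmd
    xd = xd * xd;              % workers cooperate on the multiply
end
x = gather(xd);                % collect the full 5000x5000 result on the client
```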
  3 Comments
Edric Ellis
Edric Ellis on 15 Jul 2014
Edited: Edric Ellis on 15 Jul 2014
Unless you need the (MPI-style) communication available within SPMD, you might be better off using PARFOR which can automatically divide up your problem. For example:
% build 'c', a 50x1 cell array where each cell is 100x100
c = mat2cell(rand(5000, 100), 100 * ones(50, 1), 100);
% preallocate the output, then operate on 'c' in parallel
out = cell(numel(c), 1);
parfor idx = 1:numel(c)
    out{idx} = max(abs(eig(c{idx})));
end
The key to getting PARFOR working in this case is indexing into your cell array ("c" in the example above) with the loop variable - this ensures the data is 'sliced', and can therefore be operated on efficiently in parallel.
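A minimal illustration of the distinction (assuming a pool is already open): `c` and `out` are sliced because they are indexed by the loop variable, while `scale` is broadcast once to every worker.

```matlab
c = num2cell(rand(1, 8));   % sliced input: iteration idx reads only c{idx}
scale = 2;                  % broadcast variable: copied once to each worker
out = cell(1, 8);           % sliced output: iteration idx writes only out{idx}
parfor idx = 1:numel(c)
    out{idx} = scale * c{idx};
end
```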
Ash
Ash on 15 Jul 2014
I had looked at parfor earlier. However, let me make some changes to the code and get back with my findings. I really appreciate your input. Thanks...

