SLURM Distributed Computing error

Sanjay on 7 Dec 2013
Answered: Raymond Norris on 13 Dec 2013
Hi,
I am using a MATLAB script to distribute tasks through SLURM. However, I keep getting the following error in a script (SubmitQueuedJobs.m) that I originally wrote for SGE and am now adapting for SLURM. I don't understand what is happening. Could you please help me?
"Error in SubmitQueuedJobs (line 76) job{j} = createJob(j); % createSgeJob(0);"
Here's the code:
function JobOut = SubmitQueuedJobs(nJobs,JobFcnHandle,nJobArgOut,JobOptions,...
ListPathDependencies,nMaxWorkers,ShowOutput,LogFileName,OutputPreamble)
% SubmitQueuedJobs
%
% Submit parallel jobs that might exceed the maximum number of workers
% available.
%
%
%
% Mandatory Inputs:
%
% nJobs
% Number of Jobs to run
%
% JobFcnHandle
% Handle to the function called by each worker.
%
% nJobArgOut
% Number of output arguments in each job.
%
% JobOptions
% Cell array with options required by the function called by the worker.
% It needs to contain one element per job, unless no inputs are
% necessary, in which case, {} needs to be used.
%
% ListPathDependencies
% Cell array with the list of path dependencies needed in function called
% by workers.
%
% nMaxWorkers
% [Optional] Maximum number of simultaneous workers. Default: 4.
%
% ShowOutput
% If set to 1 all output of each worker is shown. If LogFileName is not
% specified, then it is shown in the caller command history. If
% LogFileName is specified then it is stored in log.
% Default: 1
%
% LogFileName
% If specified, then all output is saved in an ASCII file.
%
% OutputPreamble
% If specified this should be a string to introduce in a fprintf command
% before showing the output of each worker. It must receive as input the
% worker number.
%
if ~exist('nMaxWorkers','var'), nMaxWorkers = 4; end
if ~exist('ShowOutput','var'), ShowOutput = 1; end
if ~exist('LogFileName','var') || isempty(LogFileName)
    Save2Log = 0;
else
    Save2Log = 1;
end
if isempty(JobOptions)
    for j = 1:nJobs
        JobOptions{j} = {};
    end
end

nCompleted = 0;
nSubmitted = 0;
j = 0;
jobsRunning = false(nJobs,1);
JobOut = cell(nJobs,1);

while nCompleted < nJobs
    while (nSubmitted < nMaxWorkers) && (j < nJobs)
        % submit new jobs
        j = j+1;
        nSubmitted = nSubmitted+1;
        job{j} = createJob(0); %createSgeJob(0)
        set(job{j},'PathDependencies',ListPathDependencies)
        TaskID{j} = createTask(job{j},JobFcnHandle,nJobArgOut,JobOptions{j});
        if ShowOutput
            set(TaskID{j}, 'CaptureCommandWindowOutput', true);
        end
        submit(job{j})
        jobsRunning(j) = true;
    end
    pause(0.01)
    listJobsRunning = find(jobsRunning);
    for jj = 1:length(listJobsRunning)
        if strcmp(get(job{listJobsRunning(jj)},'State'),'finished')
            jobsRunning(listJobsRunning(jj)) = false;
            nSubmitted = nSubmitted-1;
            nCompleted = nCompleted+1;
        end
    end
end

for j = 1:nJobs
    if ShowOutput
        if Save2Log
            jobLogName = sprintf('%s%.0f.log',LogFileName,j);
            fid = fopen(jobLogName,'wt');
            if exist('OutputPreamble','var')
                fprintf(fid,OutputPreamble,j);
            end
            fprintf(fid,strrep(get(TaskID{j},'CommandWindowOutput'),'%','%%'));
            fclose(fid);
        else
            if exist('OutputPreamble','var')
                fprintf(OutputPreamble,j);
            end
            fprintf(strrep(get(TaskID{j},'CommandWindowOutput'),'%','%%'));
        end
    end
    if ~isempty(get(TaskID{j},'errormessage'))
        fprintf(strrep(get(TaskID{j},'erroridentifier'),'%','%%')); fprintf('\n');
        fprintf(strrep(get(TaskID{j},'errormessage'),'%','%%')); fprintf('\n');
    end
    JobOut{j} = getAllOutputArguments(job{j});
    destroy(job{j})
end
JobOut = cat(1,JobOut{:});
end

Answers (1)

Raymond Norris on 13 Dec 2013
If it helps, there are SLURM integration scripts on File Exchange:
http://www.mathworks.com/matlabcentral/fileexchange/29910-slurm-integration-scripts
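
For reference, here is a minimal sketch of how an independent job is usually created with Parallel Computing Toolbox in R2012a or later. The failing line passes a number to createJob, whereas the toolbox function expects a cluster (scheduler) object as its first argument, so that call is a likely, though unconfirmed, source of the error. The sketch assumes the default cluster profile already points at the SLURM cluster (for example, one set up with integration scripts like those linked above). The task function @rand and its argument are placeholders, not part of the original script.

```matlab
% Assumption: the default cluster profile is configured for the SLURM cluster.
c = parcluster();

% createJob takes the cluster object as its first argument, not a numeric ID.
job = createJob(c);

% One task with one output: a 3-by-3 random matrix (placeholder work).
task = createTask(job, @rand, 1, {3});

submit(job);
wait(job);                  % block until the job has finished

out = fetchOutputs(job);    % cell array with one row per task
disp(out{1});

delete(job);                % remove the job from the cluster's job storage
```

In releases before R2012a, the corresponding calls are findResource, getAllOutputArguments, and destroy, which is the style SubmitQueuedJobs uses.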
