Help with parcluster, createJob, and createTask

I've recently changed our code to use parcluster, createJob, and createTask instead of matlabpool. Things work well in MATLAB on our node with 16 cores. However, the code crashes on our newer nodes with 32 cores: it runs for 30 minutes and crashes when it's time to distribute the work across 32 cores. To verify the core count, I get the following from our cluster:
MATLAB detected: 32 physical cores.
MATLAB detected: 32 logical cores.
MATLAB was assigned: 32 logical cores by the OS.
MATLAB is using: 32 logical cores.
The error occurs in the parallel job submit call. I get the following message in output_8_2_14.txt:
{Error using parallel.Job/submit (line 304)
Java exception occurred:
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
    at com.mathworks.toolbox.distcomp.local.LocalScheduler.submitInOrder(LocalScheduler.java:143)
    at com.mathworks.toolbox.distcomp.local.LocalScheduler.submit(LocalScheduler.java:138)
    at com.mathworks.toolbox.distcomp.local.AbstractLocalCommand.submit(AbstractLocalCommand.java:172)
Error in optimizeCall_v208 (line 192)
submit(j);
Error in optimization_BF_v3 (line 414)
[x1, fval, h, fid]=optimizeCall_v208(bField_ROI,bField_NEG,h,fid); %#ok<NOPRT>
}
>> MATLAB: runtime/shutdown.cpp:168: bool mnShutdownMatlabInternal(bool, bool, const boost::optional<int>&, int*, bool, bool): Assertion `Unexpected exception during MATLAB shutdown: boost::thread_resource_error' failed.
------------------------------------------------------------------------
Assertion detected at Sat Aug 2 09:55:00 2014
------------------------------------------------------------------------
Configuration:
  Crash Decoding : Disabled
  Current Visual : None
  Default Encoding : UTF-8
  GNU C Library : 2.14 stable
  MATLAB Architecture: glnxa64
  MATLAB Root : /usr/local/MATLAB/R2014a
  MATLAB Version : 8.3.0.532 (R2014a)
  Operating System : Linux 2.6.40.3-0.fc15.x86_64 #1 SMP Tue Aug 16 04:10:59 UTC 2011 x86_64
  Processor ID : x86 Family 31 Model 9 Stepping 1, AuthenticAMD
  Virtual Machine : Java 1.7.0_11-b21 with Oracle Corporation Java HotSpot™ 64-Bit Server VM mixed mode
  Window System : No active display
Fault Count: 1
Assertion in bool mnShutdownMatlabInternal(bool, bool, const boost::optional<int>&, int*, bool, bool) at runtime/shutdown.cpp line 168:
Unexpected exception during MATLAB shutdown: boost::thread_resource_error
Register State (captured):
  RAX = 00007f3a3f87d900 RBX = 00007f3a3f87df00
  RCX = 0000000000000012 RDX = 00007f3a57586df8
  RSP = 00007f3a3f87d710 RBP = 00007f3a3f87dad0
  RSI = 0000000000000001 RDI = 00007f3a3f87d720
  R8 = 0000000000000000 R9 = 000000000000a0c4
  R10 = 0000014200000001 R11 = 00007f39d700f360
  R12 = 00007f3a575a7d20 R13 = 00007f3a563bc24e
  R14 = 00007f3a563bc320 R15 = 00007f3a3f87e740
  RIP = 00007f3a5729a4ee EFL = 0000000000000003
matlab_crash_dump.41110-1.txt repeats the same assertion, configuration, fault count, and register state, and additionally contains:
CS = 0003 FS = 0000 GS = 0000
Stack Trace (captured):
[ 0] 0x00007f3a5729a4ee /usr/local/MATLAB/R2014a/bin/glnxa64/libmwfl.so+00972014 _ZN2fl4diag5linux12context_base12capture_dataEv+00000030
If this problem is reproducible, please submit a Service Request via: http://www.mathworks.com/support/contact_us/
A technical support engineer might contact you with further information.
Thank you for your help.

Accepted Answer

Thomas Ibbotson on 4 Aug 2014
It looks like the number of processes you can run in your user account is limited. You can find out what it is set to by running the following command in a terminal:
ulimit -a
which will give you output like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256447
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 128000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 500000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
You may need to edit /etc/security/limits.conf to increase 'max user processes' if it is too low.
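To see how close a session is to that limit, something like the following may help. It is only a sketch and assumes a Linux system with /proc mounted and the procps version of `ps`; the variable names are made up for illustration. On Linux each thread counts against 'max user processes', not only each process, which would fit the "unable to create new native thread" message in the question.

```shell
# Read the soft and hard "max user processes" limits for this session.
# /proc/self/limits is Linux-specific; in bash, `ulimit -Su` and
# `ulimit -Hu` report the same two values.
nproc_soft=$(awk '/^Max processes/ {print $3}' /proc/self/limits)
nproc_hard=$(awk '/^Max processes/ {print $4}' /proc/self/limits)
echo "max user processes: soft=$nproc_soft hard=$nproc_hard"

# Count the threads currently owned by this user (one output line per
# thread). Assumes procps `ps`; prints 0 if `ps` is missing or does not
# support these options.
threads_in_use=$(ps -L -u "$(id -un)" -o lwp= 2>/dev/null | wc -l)
echo "threads in use: $threads_in_use"
```

Running this on the 32-core node shortly before the job is submitted, and comparing the thread count with the soft limit, should show whether the limit is what is being hit.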
  3 Comments
Sossena Wood on 4 Aug 2014
I'm not experienced with this...what would I adjust in this file:
# /etc/security/limits.conf
#
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
#
#Each line describes a limit for a user in the form:
#
#<domain> type item value
#
#Where:
#<domain> can be:
# - an user name
# - a group name, with @group syntax
# - the wildcard *, for default entry
# - the wildcard %, can be also used with %group syntax,
#   for maxlogin limit
#
#<type> can have the two values:
# - "soft" for enforcing the soft limits
# - "hard" for enforcing hard limits
#
#<item> can be one of the following:
# - core - limits the core file size (KB)
# - data - max data size (KB)
# - fsize - maximum filesize (KB)
# - memlock - max locked-in-memory address space (KB)
# - nofile - max number of open files
# - rss - max resident set size (KB)
# - stack - max stack size (KB)
# - cpu - max CPU time (MIN)
# - nproc - max number of processes
# - as - address space limit (KB)
# - maxlogins - max number of logins for this user
# - maxsyslogins - max number of logins on the system
# - priority - the priority to run user process with
# - locks - max number of file locks the user can hold
# - sigpending - max number of pending signals
# - msgqueue - max memory used by POSIX message queues (bytes)
# - nice - max nice priority allowed to raise to values: [-20, 19]
# - rtprio - max realtime priority
#
#<domain> type item value
#
#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0
#@student - maxlogins 4
# End of file
Thomas Ibbotson on 7 Aug 2014
Add a line like this:
username hard nproc 100000
but replace 'username' with your username. To set it for a group, use '@groupname'; to set it for all users, use '*'.
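A 'hard' entry only raises the ceiling, so a lower soft limit may stay in force. It can therefore be worth checking what is already configured and adding a 'soft' line as well. The sketch below is not a definitive procedure: `target_user`, `new_limit`, and `limits_lines` are hypothetical names, and 100000 is just the value from the line above. It only prints the lines; adding them to /etc/security/limits.conf still has to be done as root.

```shell
# Hypothetical values for illustration: adjust the user and the limit.
target_user=$(id -un)
new_limit=100000

# List active (uncommented) nproc entries. Some distributions also ship
# drop-in files under /etc/security/limits.d/ that may set a lower soft limit.
grep -hs '^[^#].*nproc' /etc/security/limits.conf /etc/security/limits.d/*.conf \
  || echo "(no active nproc entries found)"

# Build the two lines to add to /etc/security/limits.conf as root.
# Setting both soft and hard avoids a low soft limit staying in force.
limits_lines=$(printf '%s soft nproc %s\n%s hard nproc %s' \
  "$target_user" "$new_limit" "$target_user" "$new_limit")
printf '%s\n' "$limits_lines"
```

Limits set through this file are applied by PAM at login, so the change should only show up in a new login session. If the hard limit is already high enough, the soft limit can also be raised for the current bash session with `ulimit -u` before starting MATLAB.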
