Why do I get an error initializing MPI when validating my cluster?

When I try to validate my MDCS cluster on Linux, the Parallel Pool validation stage fails with the following error:
 
Stage: Parallel pool test (parpool)
Status: Failed
Description:The validation stage encountered a MATLAB exception.
Command Line Output:(none)
Error Report:
Failed to initialize the interactive session.
 
Caused by:
    Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 759)
    The interactive communicating job errored with the following message: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.).
    Original cancel message:
    The task was cancelled by user "matlab" on machine "c086.cm.cluster" with message: "MPI initialisation failed:
    Not enough worker Java memory to allocate data.  Refer to the troubleshooting section of the documentation for information on how to increase the size of the local Java memory and the memory on the head node and the workers.".
Debug Log:(none)

Accepted Answer

MathWorks Support Team on 16 Jul 2014
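This issue is likely caused by a limit on the number of processes available to your user account, even though the message refers to Java memory. On Linux every thread counts toward the per-user process limit, so a low limit can stop the Java virtual machine from creating new threads, and the JVM reports that failure as an out-of-memory error. You should: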
1) Make sure the MDCE service, the MATLAB job scheduler (MJS), and the MDCS workers are launched with "ulimit -u unlimited" set.
2) Also make sure MATLAB itself is launched with "ulimit -u unlimited" set.
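Before changing any startup scripts, it can help to see which limit is actually in force. The Python sketch below reads the same per-user process limit that `ulimit -u` reports (`RLIMIT_NPROC`, via the standard `resource` module) and shows one way to start a program with its soft limit raised. It assumes a Linux host with Python 3.9 or later; the helper names are illustrative and not part of any MathWorks tool. An unprivileged user can only raise the soft limit up to the hard limit, so this reproduces "ulimit -u unlimited" only when the hard limit is already unlimited.

```python
import resource
import subprocess
import sys


def describe(limit: int) -> str:
    """Format an rlimit value the way `ulimit -u` prints it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def nproc_limits() -> tuple[int, int]:
    """Return the (soft, hard) per-user process limit (`ulimit -Su`, `ulimit -Hu`)."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def raise_nproc_soft_limit() -> None:
    """Raise the soft process limit to the hard limit.

    Unprivileged users may do this; it equals `ulimit -u unlimited`
    only when the hard limit is already unlimited.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))


def launch_with_raised_limit(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd in a child process whose soft process limit is raised before exec."""
    # preexec_fn runs in the forked child, so the parent's limits are untouched.
    return subprocess.run(cmd, preexec_fn=raise_nproc_soft_limit, check=False)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={describe(soft)}, hard={describe(hard)}")
    # Optional: pass a launch command as arguments, e.g. `matlab`.
    if len(sys.argv) > 1:
        sys.exit(launch_with_raised_limit(sys.argv[1:]).returncode)
```

For example, `launch_with_raised_limit(["matlab"])` (assuming `matlab` is on the `PATH`) starts MATLAB with the highest process limit the account is allowed. If the printed hard limit is a finite number, only an administrator can raise it further.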
 
If this works, ask your system administrator to raise your user process limit permanently in /etc/security/limits.conf and/or in the system's default login shell configuration.
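For reference, the pam_limits module reads lines of the form `<domain> <type> <item> <value>` from /etc/security/limits.conf. The `nproc` item is the limit that `ulimit -u` reports, and a type of `-` sets the soft and hard limits together. The Python sketch below lists the `nproc` entries that name a given user directly or through the `*` wildcard, which can help when checking what is currently configured. It is a simplification: it ignores `@group` and other domain forms and does not decide which entry takes precedence. The sample file contents, including the values and the `@hpc` group, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NprocEntry:
    domain: str  # user name or "*" (group domains are not resolved here)
    kind: str    # "soft", "hard", or "-" (both)
    value: str   # a number or "unlimited"


def nproc_entries(text: str, user: str) -> list[NprocEntry]:
    """Return nproc lines from limits.conf-style text that name `user` or '*'."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        fields = line.split()
        if len(fields) != 4:  # skip blank or malformed lines
            continue
        domain, kind, item, value = fields
        if item == "nproc" and domain in (user, "*"):
            entries.append(NprocEntry(domain, kind, value))
    return entries


# Hypothetical limits.conf contents, for illustration only.
SAMPLE = """
# <domain> <type> <item> <value>
*        soft   nproc   1024
matlab   -      nproc   unlimited
@hpc     hard   nproc   4096
"""

if __name__ == "__main__":
    for entry in nproc_entries(SAMPLE, "matlab"):
        print(entry)
```

With the sample text, the user `matlab` matches both the wildcard `soft` line and its own `-` line, while the `@hpc` line is skipped.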
