[mpich-discuss] MPICH 3.2 failing in MPI_Init

Kenneth Raffenetti raffenet at mcs.anl.gov
Thu Apr 7 09:55:52 CDT 2016


Just to be sure, can you use the mpiexec that is built/installed with 
MPICH 3.2? You mention mpiexec version 0.84 below, so that's the first 
thing I would try.
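
For example, something along these lines (the install prefix below is just a
placeholder for wherever MPICH 3.2 was installed) should show which launcher
the job is actually picking up and exercise the hydra mpiexec that ships with
MPICH 3.2:

  # which mpiexec is first on PATH, and what does it report?
  which mpiexec
  mpiexec --version

  # launch with the mpiexec installed alongside MPICH 3.2, using the
  # node file that Torque provides
  /path/to/mpich-3.2/bin/mpiexec -f $PBS_NODEFILE -n 8 ./a.out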

Ken

On 04/07/2016 05:49 AM, Andrew Wood wrote:
> Hi,
>
> I'm trying to get MPICH 3.2 working on our cluster, but jobs are failing in
> MPI_Init with the following output if they are run on two or more nodes (4
> processes per node):
>
>
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(490).................:
> MPID_Init(201)........................: channel initialization failed
> MPIDI_CH3_Init(93)....................:
> MPID_nem_init(285)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(324)........:
> MPIU_SHMW_Seg_open(867)...............:
> MPIU_SHMW_Seg_create_attach_templ(638): open failed - No such file or directory
> mpiexec: Error: handle_pmi: unknown cmd abort.
>
>
>
> The full output above appears only intermittently. Sometimes just the last line
> is printed (perhaps the job is aborted before stderr is flushed?).
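>
> A minimal test along these lines (nothing beyond MPI_Init and a print,
> compiled with the MPICH 3.2 mpicc) should be enough to trigger it when run
> across two nodes with 4 processes per node:
>
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>       int rank, size, len;
>       char name[MPI_MAX_PROCESSOR_NAME];
>
>       MPI_Init(&argc, &argv);                /* the error stack above points here */
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>       MPI_Get_processor_name(name, &len);
>       printf("rank %d of %d on %s\n", rank, size, name);
>       MPI_Finalize();
>       return 0;
>   }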
>
>
>
> Our cluster uses Torque 2.5.13 and Maui 3.3.1, and the jobs are launched with
> mpiexec 0.84.
>
>
>
> I've configured MPICH as follows.
>
> ./configure --enable-error-checking=all --enable-error-messages=all \
>     --enable-g=all --disable-fast --enable-check-compiler-flags \
>     --enable-fortran=all --enable-cxx --enable-romio --enable-debuginfo \
>     --enable-versioning --enable-strict
>
>
>
> I've found the problem goes away if I include the option
> '--enable-nemesis-dbg-nolocal', but since that forces nemesis to treat every
> process as if it were on another node (so no shared-memory communication
> within a node), presumably it comes at a cost in performance.
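>
> In case it's useful, the working build is just the same configure line with
> that single option added, i.e. roughly:
>
>   ./configure --enable-error-checking=all --enable-error-messages=all \
>       --enable-g=all --disable-fast --enable-check-compiler-flags \
>       --enable-fortran=all --enable-cxx --enable-romio --enable-debuginfo \
>       --enable-versioning --enable-strict --enable-nemesis-dbg-nolocal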
>
>
> The problem doesn't occur with MPICH 3.1.4, configured with the same options.
>
>
> I've found this message in the mailing list archives reporting the same problem:
> http://lists.mpich.org/pipermail/discuss/2015-December/004352.html
> However, that was on a system using SLURM, and the replies suggest the problem
> lay with SLURM rather than MPICH; we're not using SLURM on our system.
>
> Can anyone help?
>
>
> Regards,
> Andy.
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

