[mpich-discuss] MPICH v3.2 and SLURM
Balaji, Pavan
balaji at anl.gov
Tue Dec 1 10:41:50 CST 2015
It looks like SLURM is telling MPICH that two processes are on the same node even though they are on different nodes, so one of them tries to open a shared-memory segment that exists only on the other node. That points to a bug in the SLURM PMI implementation. Did you try simply using mpiexec instead? For that, you'll need to remove the --with-pmi and --with-pm configure options and the LDFLAGS/LIBS settings, then rebuild MPICH. Note that mpiexec will internally use srun in SLURM environments.
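
As a rough sketch of the rebuild suggested above, the steps might look like the following. The source directory name, the application name (my_app), and the process count passed to mpiexec are assumptions for illustration, not details confirmed in this thread; <some/prefix> is the same placeholder used in the original configure line. A default configure builds MPICH with its own Hydra process manager, which provides mpiexec.

```shell
# Rebuild MPICH with its default process manager and PMI.
# No --with-pmi, --with-pm, LDFLAGS, or LIBS settings are passed.
cd mpich-3.2                          # assumed source directory name
./configure --prefix=<some/prefix>    # placeholder path, as in the original report
make
make install

# Recompile the application with the freshly installed compiler wrapper,
# this time without "-L<path_to_slurm_lib> -lpmi".
<some/prefix>/bin/mpicc -o my_app my_app.c    # my_app is a hypothetical name

# Inside the sbatch allocation (10 nodes, one task per node in the report),
# launch with mpiexec instead of calling srun directly.
<some/prefix>/bin/mpiexec -n 10 ./my_app
```

If MPI_Init succeeds with this build, that would be consistent with the problem lying in the SLURM PMI path rather than in MPICH itself.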
-- Pavan
> On Dec 1, 2015, at 5:56 AM, Andreas Gocht <andreas.gocht at tu-dresden.de> wrote:
>
> Hey
>
> I tried to build and use MPICH with SLURM, sbatch and srun. Unfortunately, it looks like MPI_Init doesn't work with srun.
>
> I got the following error:
>
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(474).................:
> MPID_Init(190)........................: channel initialization failed
> MPIDI_CH3_Init(89)....................:
> MPID_nem_init(272)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(324)........:
> MPIU_SHMW_Seg_open(865)...............:
> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory
> In: PMI_Abort(4239887, Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(474).................:
> MPID_Init(190)........................: channel initialization failed
> MPIDI_CH3_Init(89)....................:
> MPID_nem_init(272)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(324)........:
> MPIU_SHMW_Seg_open(865)...............:
> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory)
>
> I configured MPICH with "./configure --prefix=<some/prefix> --with-pmi=slurm --with-pm=none --with-slurm=<path/to/slurm>" and linked my application with the "-L<path_to_slurm_lib> -lpmi" flags
>
> (as described in
> https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed.
> and
> https://computing.llnl.gov/linux/slurm/mpi_guide.html#mpich2).
>
> I am running with 10 nodes and one task per node. Is there something I am missing during the configuration of MPICH?
>
> Best,
>
> Andreas
>
> --
> M.Sc. Andreas Gocht
>
> Technische Universität Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
>
> Contact:
> Willersbau, Room A 104
> Phone: (+49) 351 463-36415
> Fax: (+49) 351 463-3773
> e-mail: andreas.gocht at tu-dresden.de
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss