[mpich-discuss] mpich3 3.0.4 and binding on four socket nodes

Kenneth Raffenetti raffenet at mcs.anl.gov
Wed May 7 14:31:41 CDT 2014


The process binding features were significantly improved in the
MPICH 3.1 release. Can you build that version and see whether it fixes the issue?
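
Once a newer build is in place, one way to confirm that the binding took effect is to have each launched process report the CPUs it is allowed to run on. The following is a minimal sketch, not part of MPICH. It assumes Linux (os.sched_getaffinity is Linux-only) and assumes the Hydra launcher exports PMI_RANK to each process. The script name affinity.py is hypothetical.

```python
import os
import socket


def format_cpu_list(cpus):
    """Collapse CPU ids into compact ranges, e.g. {0, 1, 2, 5} -> '0-2,5'."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def describe_binding():
    """Return one line describing the CPUs this process may run on."""
    # Assumption: the launcher sets PMI_RANK; fall back to "?" if it does not.
    rank = os.environ.get("PMI_RANK", "?")
    cpus = os.sched_getaffinity(0)  # Linux-only; 0 means the calling process
    return f"rank {rank} on {socket.gethostname()}: cpus {format_cpu_list(cpus)}"


if __name__ == "__main__":
    print(describe_binding())
```

It could be launched with the same options as the failing job, for example "mpiexec -bind-to core -map-by core:2 -n 4 python affinity.py". The ids printed are the operating system's logical CPU numbers, which do not necessarily match hwloc's core numbering.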

Ken

On 05/07/2014 02:25 PM, Susan A. Schwarz wrote:
> I am using MPICH 3.0.4 built with the Intel 13.0 compilers on CentOS 6.5. I
> am trying to use the '-bind-to core' and '-map-by core:2' options to
> mpiexec. This runs successfully on the nodes in my cluster that have
> 2 sockets. However, when I try to run it on a 4-socket node
> with 12 cores per socket, I get the following errors:
>
> [proxy:0:0@f02] get_nbobjs_by_type (./tools/topo/hwloc/topo_hwloc.c:189): assert (nb % x == 0) failed
> [proxy:0:0@f02] handle_bitmap_binding (./tools/topo/hwloc/topo_hwloc.c:450): unable to get number of objects
> [proxy:0:0@f02] HYDT_topo_hwloc_init (./tools/topo/hwloc/topo_hwloc.c:527): error binding with bind "core" and map "core:2"
> [proxy:0:0@f02] HYDT_topo_init (./tools/topo/topo.c:60): unable to initialize hwloc
> [proxy:0:0@f02] launch_procs (./pm/pmiserv/pmip_cb.c:520): unable to initialize process topology
> [proxy:0:0@f02] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
> [proxy:0:0@f02] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0@f02] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec@f02] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
> [mpiexec@f02] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@f02] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec@f02] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
> Any idea what could be causing this problem?
>
> Susan Schwarz
> Research Computing
> Dartmouth