[mpich-discuss] mpich3 3.0.4 and binding on four socket nodes

Susan A. Schwarz Susan.A.Schwarz at dartmouth.edu
Wed May 7 14:25:24 CDT 2014


I am using MPICH 3.0.4 built with the Intel 13.0 compilers on CentOS 6.5.
I am trying to use the '-bind-to core' and '-map-by core:2' options to
mpiexec. This works on the nodes in my cluster that have 2 sockets.
However, when I run it on a 4-socket node with 12 cores per socket, I
get the following errors:

[proxy:0:0@f02] get_nbobjs_by_type (./tools/topo/hwloc/topo_hwloc.c:189): assert (nb % x == 0) failed
[proxy:0:0@f02] handle_bitmap_binding (./tools/topo/hwloc/topo_hwloc.c:450): unable to get number of objects
[proxy:0:0@f02] HYDT_topo_hwloc_init (./tools/topo/hwloc/topo_hwloc.c:527): error binding with bind "core" and map "core:2"
[proxy:0:0@f02] HYDT_topo_init (./tools/topo/topo.c:60): unable to initialize hwloc
[proxy:0:0@f02] launch_procs (./pm/pmiserv/pmip_cb.c:520): unable to initialize process topology
[proxy:0:0@f02] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0@f02] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@f02] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@f02] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@f02] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@f02] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@f02] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

Any idea what could be causing this problem?
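
For reference, here is a sketch of the kind of invocation described
above. The process count and executable name are placeholders, not the
values from the actual run. Only the two binding options come from the
post. The script prints the command rather than launching it.

```shell
# Placeholder values, NOT taken from the original run.
NPROCS=24            # hypothetical process count
APP=./my_mpi_app     # hypothetical MPI executable

# The binding and mapping options named in the post.
CMD="mpiexec -n $NPROCS -bind-to core -map-by core:2 $APP"

# Print the command so it can be inspected before being run.
echo "$CMD"
```

To reproduce, replace the placeholders and run the printed line on the
four-socket node.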

Susan Schwarz
Research Computing
Dartmouth





