[mpich-discuss] mpich3 3.0.4 and binding on four socket nodes
Susan A. Schwarz
Susan.A.Schwarz at dartmouth.edu
Wed May 7 14:25:24 CDT 2014
I am using MPICH 3.0.4 built with the Intel 13.0 compilers on CentOS 6.5. I
am trying to use the '-bind-to core' and '-map-by core:2' options to
mpiexec. This works on the nodes in my cluster that have 2 sockets.
However, when I run it on a 4-socket node with 12 cores per socket, I get
the following errors:
[proxy:0:0@f02] get_nbobjs_by_type (./tools/topo/hwloc/topo_hwloc.c:189): assert (nb % x == 0) failed
[proxy:0:0@f02] handle_bitmap_binding (./tools/topo/hwloc/topo_hwloc.c:450): unable to get number of objects
[proxy:0:0@f02] HYDT_topo_hwloc_init (./tools/topo/hwloc/topo_hwloc.c:527): error binding with bind "core" and map "core:2"
[proxy:0:0@f02] HYDT_topo_init (./tools/topo/topo.c:60): unable to initialize hwloc
[proxy:0:0@f02] launch_procs (./pm/pmiserv/pmip_cb.c:520): unable to initialize process topology
[proxy:0:0@f02] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0@f02] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@f02] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@f02] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@f02] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@f02] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@f02] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
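The first failure in the trace is the assertion (nb % x == 0) in get_nbobjs_by_type. I do not know what nb and x actually hold in the MPICH source, so the following is only a rough sketch of that kind of divisibility check. It assumes nb is the total core count of the node and x is the group size from 'core:2'. The function name divides_evenly is hypothetical and is not part of MPICH or hwloc.

```python
def divides_evenly(nb: int, x: int) -> bool:
    """Return True when nb objects split into groups of x with none left over.

    Assumption: this mirrors only the *form* of the failing assertion
    (nb % x == 0); the real meaning of nb and x in topo_hwloc.c is unknown.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return nb % x == 0


# Node shape from this report: 4 sockets with 12 cores each.
sockets = 4
cores_per_socket = 12
total_cores = sockets * cores_per_socket  # 48

# Group size taken from the 'core:2' mapping argument (assumed meaning of x).
group_size = 2

print(total_cores, divides_evenly(total_cores, group_size))
```

Under this naive reading the check passes for the 4-socket node, since 48 is a multiple of 2. So the values the assertion sees on that node are presumably different from the ones assumed here.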
Any idea what could be causing this problem?
Susan Schwarz
Research Computing
Dartmouth