[mpich-discuss] mpiexec error

Raffenetti, Ken raffenet at anl.gov
Fri May 6 12:13:00 CDT 2022


Hi Kurt,

Before running mpiexec, can you print out the hostfile to confirm the contents? Something like this:

  cat "$PBS_NODEFILE"
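
If it helps, the check can be scripted. The following is only a sketch, assuming the allocation is exposed through PBS_NODEFILE (the variable Torque sets inside a job). check_host is a hypothetical helper, not part of MPICH or Torque. It reports whether a name appears verbatim in the node file, which makes a short-name versus fully qualified name mismatch easy to spot. The error quoted below names host "n022" while the log prefixes show "n022.cluster.com", so that is one possible cause, though not a confirmed one.

```shell
#!/bin/sh
# Sketch: check whether a host name appears verbatim in a PBS node file.
# check_host is a hypothetical helper, not part of MPICH or Torque.
check_host() {
    nodefile=$1
    host=$2
    # -x matches whole lines only; -F treats the name literally (dots are not regex)
    if grep -qxF "$host" "$nodefile"; then
        echo "listed: $host"
    else
        echo "NOT listed: $host"
        return 1
    fi
}

# Inside a job: show the allocation, then test both spellings of this node's name.
# Outside a job (PBS_NODEFILE unset) this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    sort "$PBS_NODEFILE" | uniq -c              # each host with its slot count
    check_host "$PBS_NODEFILE" "$(hostname)" || true
    check_host "$PBS_NODEFILE" "$(hostname -s)" || true
fi
```

Running this at the top of the job script, before mpiexec, shows which spelling of the node name (if either) is present in the allocation.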

Ken

On 5/6/22, 11:55 AM, "Mccall, Kurt E. (MSFC-EV41) via discuss" <discuss at mpich.org> wrote:

    Running MPICH 4.0.1 under Torque 5.1, I’m getting the mpiexec error “user specified host not in the PBS allocated list”.   My qsub command is:

    qsub -V -j oe -e stdio -o stdio -f -X -l nodes=21:ppn=20  <bash_script>


    My mpiexec command is:

    mpiexec -print-all-exitcodes -enable-x -np 21  -wdir ${work_dir} -env DISPLAY localhost:10.0 --ppn 1  <more args> …


    Here is the full error message.   Thanks for any help.

    [mpiexec at n022.cluster.com] find_pbs_node_id (../../../../mpich-4.0.1/src/pm/hydra/tools/bootstrap/external/pbs_launch.c:27): user specified host not in the PBS allocated list
    [mpiexec at n022.cluster.com] HYDT_bscd_pbs_launch_procs (../../../../mpich-4.0.1/src/pm/hydra/tools/bootstrap/external/pbs_launch.c:74): error finding PBS node ID for host n022
    [mpiexec at n022.cluster.com] HYDT_bsci_launch_procs (../../../../mpich-4.0.1/src/pm/hydra/tools/bootstrap/src/bsci_launch.c:17): launcher returned error while launching processes
    [mpiexec at n022.cluster.com] fn_spawn (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v1.c:580): launcher cannot launch processes
    [mpiexec at n022.cluster.com] handle_pmi_cmd (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:48): PMI handler returned error
    [mpiexec at n022.cluster.com] control_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:284): unable to process PMI command
    [mpiexec at n022.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
    [mpiexec at n022.cluster.com] HYD_pmci_wait_for_completion (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:160): error waiting for event
    [mpiexec at n022.cluster.com] main (../../../../mpich-4.0.1/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion
    [proxy:0:0 at n022.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:0 at n022.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demux/[proxy:0:2 at n020.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:2 at n020.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demux/[proxy:0:5 at n016.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:5 at n016.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demux/[proxy:0:15 at n006.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:15 at n006.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demu[proxy:0:16 at n005.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:16 at n005.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demu[proxy:0:19 at n002.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:19 at n002.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demu[proxy:0:20 at n001.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed
    [proxy:0:20 at n001.cluster.com] HYDT_dmxu_poll_wait_for_event (../../../../mpich-4.0.1/src/pm/hydra/tools/demudemux_poll.c:76): callback returned error status
    [proxy:0:0 at n022.cluster.com] main (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip.c:169): demux engine error waiting for event
    demux_poll.c:76): callback returned error status
    [proxy:0:2 at n020.cluster.com] main (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip.c:169): demux engine error waiting for event
    [proxy:0:1 at n021.cluster.com] HYD_pmcd_pmip_control_cmd_cb (../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:899): assert (!closed) failed



