[mpich-discuss] mpiexec fails to launch any processes

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Tue Jun 14 02:21:53 CDT 2022


Hui,

Slurm doesn’t seem to be killing the job, since it still shows up when I run squeue. A gdb stack trace shows where mpiexec is stuck. Does this tell you anything?

#0  0x00007f9c895ddaa8 in poll () from /lib64/libc.so.6
#1  0x000000000045352c in HYDT_dmxu_poll_wait_for_event (wtime=-1)
    at ../../../../mpich-4.0.1/src/pm/hydra/tools/demux/demux_poll.c:39
#2  0x0000000000452e9a in HYDT_dmx_wait_for_event (wtime=-1)
    at ../../../../mpich-4.0.1/src/pm/hydra/tools/demux/demux.c:168
#3  0x000000000040cda4 in HYD_pmci_wait_for_completion (timeout=-1)
    at ../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:157
#4  0x0000000000404177 in main (argc=33, argv=0x7fff054e6888)
    at ../../../../mpich-4.0.1/src/pm/hydra/ui/mpich/mpiexec.c:324
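
Frame #3 shows HYD_pmci_wait_for_completion called with timeout=-1, and frame #1 hands wtime=-1 down to poll(). A poll() with a negative timeout blocks until one of its descriptors becomes ready, so mpiexec appears to be waiting indefinitely for its proxies to report back. While debugging, one way to keep such a hang from holding the allocation is to bound the launch with a wall-clock limit. The sketch below assumes GNU coreutils `timeout` (which exits with status 124 when the limit expires); `run_with_limit` is a hypothetical helper name. Hydra also appears to read an MPIEXEC_TIMEOUT environment variable (in seconds), which would likely be where timeout=-1 comes from when it is unset; that is worth confirming against the 4.0.1 documentation.

```sh
#!/bin/sh
# run_with_limit is a hypothetical helper, not part of MPICH or Slurm.
# It runs a command under GNU coreutils `timeout`, so a launcher that is
# blocked forever (e.g. mpiexec waiting in poll() with no timeout) is sent
# SIGTERM after LIMIT seconds instead of holding the allocation.
run_with_limit() {
    limit=$1   # wall-clock limit in seconds
    shift      # remaining arguments are the command to run
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when the limit expires.
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example (not run here): bound the hostname test to 10 minutes.
# run_with_limit 600 mpiexec -launcher ssh -verbose -print-all-exitcodes \
#     -wdir <directory> -np 20 -ppn 1 hostname
```

If the command finishes on its own, its exit status is passed through unchanged; only an expired limit yields 124.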

Thanks,
Kurt

From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Monday, June 13, 2022 4:16 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: Re: [mpich-discuss] [EXTERNAL] Re: mpiexec fails to launch any processes

Hui,

That worked too. I guess I’ll have to find a way to pass a “verbose” argument to sbatch to see why Slurm is killing my application.

Thanks,
Kurt

From: Zhou, Hui <zhouh at anl.gov>
Sent: Monday, June 13, 2022 4:11 PM
To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
Subject: Re: [EXTERNAL] Re: mpiexec fails to launch any processes

Kurt,

Could you try launching hostname with the same command?

    mpiexec -launcher ssh -verbose -print-all-exitcodes -wdir  <directory> -np 20 -ppn 1 hostname

If that works, the problem seems to lie in your application: something in your code is making Slurm kill the job.

--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Sent: Monday, June 13, 2022 4:02 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org
Subject: RE: [EXTERNAL] Re: mpiexec fails to launch any processes


Hui,

$ mpiexec -N 10 -hostfile MySlurmNodeFile2 hostname

works properly, reporting from each of 10 nodes.
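
How MySlurmNodeFile2 was produced is not shown in the thread. A minimal sketch of building an equivalent hostfile from the current allocation follows, assuming it runs inside an sbatch or salloc allocation where Slurm sets SLURM_JOB_NODELIST, and using `scontrol show hostnames` to expand the compressed node list into one hostname per line; `make_hostfile` is a hypothetical helper name.

```sh
#!/bin/sh
# make_hostfile is a hypothetical helper: it writes one hostname per line
# for the current Slurm allocation, for use with `mpiexec -hostfile`.
# Assumes SLURM_JOB_NODELIST is set (i.e. we are inside an allocation).
make_hostfile() {
    out=$1
    if [ -z "${SLURM_JOB_NODELIST:-}" ]; then
        echo "SLURM_JOB_NODELIST is not set; run inside an allocation" >&2
        return 1
    fi
    # scontrol expands a compressed list such as "node[01-10]"
    # into one hostname per line.
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "$out" || return 1
    echo "wrote $(wc -l < "$out") host(s) to $out" >&2
}
```

For example, `make_hostfile MySlurmNodeFile2 && mpiexec -hostfile MySlurmNodeFile2 -np 10 -ppn 1 hostname` should print one line per node when the allocation spans 10 nodes.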

Kurt

From: Zhou, Hui <zhouh at anl.gov>
Sent: Monday, June 13, 2022 2:44 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: mpiexec fails to launch any processes



Hi Kurt,

I don't have much of a clue. Are you able to launch a trivial application, for example, "hostname"?

--
Hui

________________________________

From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Monday, June 13, 2022 12:29 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: Re: [mpich-discuss] mpiexec fails to launch any processes



Outlook blocked the output file slurm.out that I had attached. Trying to send it again as slurm.txt.

Kurt

Hi,

My mpiexec command fails to launch any processes. I ran it with the -verbose option but didn’t see any obvious errors in the output (attached).

The command is:

mpiexec -launcher ssh -verbose -print-all-exitcodes -wdir  <directory> -np 20 -ppn 1  <more args…>

I am running MPICH 4.0.1 under Slurm 20.11.8. Thanks for any help.

Kurt