[mpich-discuss] mpiexec fails to launch any processes
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Tue Jun 14 02:21:53 CDT 2022
Hui,
Slurm doesn’t seem to be killing the job, as it still shows up when I run squeue. A gdb stack trace shows where mpiexec is stuck – does this tell you anything?
#0 0x00007f9c895ddaa8 in poll () from /lib64/libc.so.6
#1 0x000000000045352c in HYDT_dmxu_poll_wait_for_event (wtime=-1)
    at ../../../../mpich-4.0.1/src/pm/hydra/tools/demux/demux_poll.c:39
#2 0x0000000000452e9a in HYDT_dmx_wait_for_event (wtime=-1)
    at ../../../../mpich-4.0.1/src/pm/hydra/tools/demux/demux.c:168
#3 0x000000000040cda4 in HYD_pmci_wait_for_completion (timeout=-1)
    at ../../../../mpich-4.0.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:157
#4 0x0000000000404177 in main (argc=33, argv=0x7fff054e6888)
    at ../../../../mpich-4.0.1/src/pm/hydra/ui/mpich/mpiexec.c:324
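A note on reading and reproducing a trace like this one. The innermost frame is poll(), reached through wait calls with wtime=-1 and timeout=-1. A negative timeout passed to poll(2) means "block until an event arrives", so if that value is passed through unchanged, the frames suggest mpiexec is idle in its event loop, waiting with no time limit, rather than crashed. The sketch below shows one way to capture such a trace non-interactively. It assumes gdb is installed on the node running mpiexec and that you are allowed to attach to your own processes; the helper name dump_stack is hypothetical.

```shell
# Print the stack of a running process without an interactive gdb session.
# Usage: dump_stack <pid>
dump_stack() {
    pid="$1"
    # Reject empty or non-numeric input before invoking gdb.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_stack <pid>" >&2
            return 2
            ;;
    esac
    # -p attaches to the process, -batch exits once the commands finish,
    # -ex "bt" runs the backtrace command.
    gdb -p "$pid" -batch -ex "bt"
}
```

The process ID could be found with something like pgrep -u "$USER" -x mpiexec and then passed to dump_stack.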
Thanks,
Kurt
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Monday, June 13, 2022 4:16 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: Re: [mpich-discuss] [EXTERNAL] Re: mpiexec fails to launch any processes
Hui,
That worked too. I guess I’ll have to find a way to pass a “verbose” argument to sbatch and see why Slurm is killing my application.
Thanks,
Kurt
From: Zhou, Hui <zhouh at anl.gov>
Sent: Monday, June 13, 2022 4:11 PM
To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>; discuss at mpich.org
Subject: Re: [EXTERNAL] Re: mpiexec fails to launch any processes
Kurt,
Could you try launching hostname with the same command?
mpiexec -launcher ssh -verbose -print-all-exitcodes -wdir <directory> -np 20 -ppn 1 hostname
If that goes okay, it then seems to point to your application. Something in your code is making Slurm kill the job.
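When comparing the trivial launch against the real application, it can help to run each under a time limit so that a hang shows up as a distinct exit status instead of blocking the batch script. A minimal sketch, assuming GNU coreutils timeout is available (it exits with status 124 when the limit expires); the helper name try_launch is hypothetical.

```shell
# Run a command under a time limit and report if it was cut off.
# Usage: try_launch <seconds> <command> [args...]
try_launch() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    rc=$?
    # GNU timeout uses 124 to signal that the limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

For example, prefixing the hostname command above with try_launch 60, and then doing the same for the real application, separates "fails quickly" from "never returns".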
--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Sent: Monday, June 13, 2022 4:02 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Subject: RE: [EXTERNAL] Re: mpiexec fails to launch any processes
Hui,
$ mpiexec -N 10 -hostfile MySlurmNodeFile2 hostname
works properly, reporting from each of 10 nodes.
Kurt
From: Zhou, Hui <zhouh at anl.gov>
Sent: Monday, June 13, 2022 2:44 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: mpiexec fails to launch any processes
Hi Kurt,
I don't have much of a clue. Are you able to launch some trivial applications, for example, "hostname"?
--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Monday, June 13, 2022 12:29 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: Re: [mpich-discuss] mpiexec fails to launch any processes
Outlook blocked the output file slurm.out that I had attached. Trying to send it again as slurm.txt.
Kurt
Hi,
My mpiexec command fails to launch any processes. I ran it with the -verbose option but didn’t see any obvious errors in the output (attached).
The command is:
mpiexec -launcher ssh -verbose -print-all-exitcodes -wdir <directory> -np 20 -ppn 1 <more args…>
I am running MPICH 4.0.1 under Slurm 20.11.8. Thanks for any help.
Kurt