[mpich-discuss] Error Running MPICH for Photochemical Modeling
Abhishek Bhat
abhat at trinityconsultants.com
Wed Sep 17 15:29:20 CDT 2014
Sangmin,
Fatal error in MPI_Recv: A process has failed, error stack:
MPI_Recv(187).............: MPI_Recv(buf=0x7fff21bc04b0, count=644490, MPI_REAL, src=1, tag=14131, MPI_COMM_WORLD, status=0x7fff227c47a0) failed
dequeue_and_set_error(865): Communication error with rank 1
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 4183 RUNNING AT dfw-camx
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:4@dfw-camx-n4] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:4@dfw-camx-n4] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:4@dfw-camx-n4] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2@dfw-camx-n2] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:2@dfw-camx-n2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2@dfw-camx-n2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:3@dfw-camx-n3] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:3@dfw-camx-n3] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:3@dfw-camx-n3] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:6@dfw-camx-n6] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:6@dfw-camx-n6] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:6@dfw-camx-n6] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:5@dfw-camx-n5] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:5@dfw-camx-n5] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:5@dfw-camx-n5] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:7@dfw-camx-n7] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:7@dfw-camx-n7] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:7@dfw-camx-n7] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@dfw-camx] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@dfw-camx] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@dfw-camx] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@dfw-camx] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
This is what I got with -print-all-exitcodes.
Is anything in it helpful?
Thank you very much for all your help.
Abhishek
................................................................................................................
Abhishek Bhat, PhD, EPI,
Senior Consultant
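As a triage aid for output like the error log in this message, here is a minimal Python sketch that pulls out the rank named in the "Communication error with rank N" line, the reported exit code, and the hosts whose Hydra proxies logged errors. The function name summarize_mpich_failure and the regular expressions are assumptions based only on the lines quoted here; this is not an MPICH-provided tool. The proxy messages are the ones the banner says can be ignored, so the failed rank is likely the more useful lead.

```python
import re

# Patterns are guesses based on the log quoted in this thread.
# The list archive rewrites "@" as " at ", so both forms are accepted.
RANK_RE = re.compile(r"Communication error with rank (\d+)")
PROXY_RE = re.compile(r"\[proxy:\d+:\d+(?:@| at )([^\]\s]+)\]")
EXIT_RE = re.compile(r"EXIT CODE:\s*(-?\d+)")


def summarize_mpich_failure(log_text):
    """Return failed ranks, proxy hosts that logged errors, and exit codes."""
    ranks = sorted({int(m) for m in RANK_RE.findall(log_text)})
    hosts = sorted(set(PROXY_RE.findall(log_text)))
    exit_codes = [int(m) for m in EXIT_RE.findall(log_text)]
    return {"failed_ranks": ranks, "proxy_hosts": hosts, "exit_codes": exit_codes}


if __name__ == "__main__":
    # A few lines copied from the log above, used as sample input.
    sample = """\
dequeue_and_set_error(865): Communication error with rank 1
= EXIT CODE: 1
[proxy:0:4@dfw-camx-n4] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2@dfw-camx-n2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
"""
    print(summarize_mpich_failure(sample))
```

Mapping rank 1 back to a host depends on the ordering in the machinefile, which is not shown in this thread.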
From: Seo, Sangmin [mailto:sseo at anl.gov]
Sent: Wednesday, September 17, 2014 1:17 PM
To: Abhishek Bhat
Subject: Re: [mpich-discuss] Error Running MPICH for Photochemical Modeling
On Sep 17, 2014, at 1:08 PM, Abhishek Bhat <abhat at trinityconsultants.com> wrote:
> Sangmin,
> What is the correct syntax for printing all exit codes?
> If I use
> if( ! { mpiexec -machinefile nodes -np $NUMPROCS -print-all-exitcodes $EXEC -mpich-dbg=file -mpich-dbg-class=all -mpich-dbg-level=verbose } )
This is correct. The output is shown on your terminal, not written to a file, for example:
[mpiexec@host] Exit codes: [host] 0,0
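If the exit codes need to be checked from a script, the "Exit codes:" line can be parsed. The following Python sketch assumes the line has the shape shown in the example above, and additionally assumes (this is not confirmed in this thread) that multi-host runs repeat the "[host] code,code,..." group once per host. parse_exit_codes is a hypothetical helper name.

```python
import re

# Matches one "[host] c1,c2,..." group; the multi-group form is an assumption.
GROUP_RE = re.compile(r"\[([^\]]+)\]\s*(-?\d+(?:,-?\d+)*)")


def parse_exit_codes(line):
    """Map each host to its list of per-process exit codes."""
    marker = "Exit codes:"
    if marker not in line:
        return {}
    # Only look at the text after the marker, so the "[mpiexec@host]"
    # prefix is not mistaken for a host group.
    tail = line.split(marker, 1)[1]
    result = {}
    for host, codes in GROUP_RE.findall(tail):
        result.setdefault(host, []).extend(int(c) for c in codes.split(","))
    return result


if __name__ == "__main__":
    codes = parse_exit_codes("[mpiexec@host] Exit codes: [host] 0,0")
    print(codes)
    # Keep only hosts that reported at least one nonzero exit code.
    nonzero = {h: [c for c in cs if c != 0] for h, cs in codes.items()}
    print({h: cs for h, cs in nonzero.items() if cs})
```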
> I am getting an error saying "-print-all-exitcodes" is not a valid local parameter.
Which version of MPICH are you using? Can you let me know the result of the following?
$ mpiexec -info
- Sangmin
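To capture the version from a script rather than by hand, something like the following Python sketch could be used. It runs the mpiexec -info command requested above and looks for a line beginning with "Version:". That field name is an assumption about the Hydra output format and should be checked against the real output; extract_version and mpiexec_version are hypothetical helper names.

```python
import re
import subprocess


def extract_version(info_text):
    """Return the value of the first 'Version:' line, or None if absent."""
    match = re.search(r"^\s*Version:\s*(\S+)", info_text, re.MULTILINE)
    return match.group(1) if match else None


def mpiexec_version(mpiexec="mpiexec"):
    """Run `mpiexec -info` and extract the version; None if it cannot run."""
    try:
        proc = subprocess.run([mpiexec, "-info"], capture_output=True,
                              text=True, check=False)
    except OSError:
        # The launcher is missing or not executable on this machine.
        return None
    return extract_version(proc.stdout)


if __name__ == "__main__":
    print(mpiexec_version())
```

Passing the full path of the launcher as the mpiexec argument helps when several MPI installations are on the same machine, since the first mpiexec on PATH may not be the one used for the model run.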