[mpich-discuss] Error Running MPICH for Photochemical Modeling

Abhishek Bhat abhat at trinityconsultants.com
Wed Sep 17 15:29:20 CDT 2014


Sangmin,

Fatal error in MPI_Recv: A process has failed, error stack:
MPI_Recv(187).............: MPI_Recv(buf=0x7fff21bc04b0, count=644490, MPI_REAL, src=1, tag=14131, MPI_COMM_WORLD, status=0x7fff227c47a0) failed
dequeue_and_set_error(865): Communication error with rank 1

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 4183 RUNNING AT dfw-camx
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:4 at dfw-camx-n4] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:4 at dfw-camx-n4] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:4 at dfw-camx-n4] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2 at dfw-camx-n2] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:2 at dfw-camx-n2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2 at dfw-camx-n2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:3 at dfw-camx-n3] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:3 at dfw-camx-n3] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:3 at dfw-camx-n3] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:6 at dfw-camx-n6] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:6 at dfw-camx-n6] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:6 at dfw-camx-n6] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:5 at dfw-camx-n5] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:5 at dfw-camx-n5] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:5 at dfw-camx-n5] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:7 at dfw-camx-n7] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:7 at dfw-camx-n7] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:7 at dfw-camx-n7] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at dfw-camx] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec at dfw-camx] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at dfw-camx] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec at dfw-camx] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

This is what I got when I ran with -print-all-exitcodes.
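
For reference, the error stack above names the peer that stopped responding ("Communication error with rank 1"). A minimal sketch for pulling that rank number out of a saved log is shown below. It assumes the log has been captured to a file or pipe; the function name failed_ranks is hypothetical and not part of MPICH.

```shell
# Print the distinct rank numbers named in "Communication error with rank N"
# lines. Reads the captured mpiexec output on stdin.
# NOTE: failed_ranks is an illustrative helper, not an MPICH tool.
failed_ranks() {
    sed -n 's/.*Communication error with rank \([0-9][0-9]*\).*/\1/p' | sort -un
}

# Example (run.log is a hypothetical file holding the output above):
#   failed_ranks < run.log
```

This only reports which rank the surviving process lost contact with. It does not say why that rank died, so the output of the failed rank itself still needs to be checked.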


Is there anything helpful in this?

Thank you very much for all your help.
Abhishek

................................................................................................................
Abhishek Bhat, PhD, EPI,
Senior Consultant


From: Seo, Sangmin [mailto:sseo at anl.gov]
Sent: Wednesday, September 17, 2014 1:17 PM
To: Abhishek Bhat
Subject: Re: [mpich-discuss] Error Running MPICH for Photochemical Modeling


On Sep 17, 2014, at 1:08 PM, Abhishek Bhat <abhat at trinityconsultants.com> wrote:


Sangmin,

What is the correct syntax for -print-all-exitcodes?
If I use

if( ! { mpiexec -machinefile nodes -np $NUMPROCS -print-all-exitcodes $EXEC -mpich-dbg=file -mpich-dbg-class=all -mpich-dbg-level=verbose } )

This is correct. The output will be shown on your terminal, not in a file, like this:
[mpiexec at host] Exit codes: [host] 0,0
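
If the terminal output is captured, the exit-codes line can be checked mechanically. The sketch below counts the non-zero codes on any line containing "Exit codes:". It assumes the line has the shape shown above, with bracketed host names followed by comma-separated codes. The multi-host layout and the function name nonzero_exit_codes are assumptions, not documented MPICH behavior.

```shell
# Count non-zero exit codes reported by `mpiexec -print-all-exitcodes`.
# Reads captured mpiexec output on stdin and prints a single number.
# NOTE: nonzero_exit_codes is an illustrative helper, not an MPICH tool.
nonzero_exit_codes() {
    sed -n 's/.*Exit codes://p' |   # keep only the text after the marker
    sed 's/\[[^]]*]//g' |           # drop bracketed host names
    tr ', ' '\n\n' |                # put each code on its own line
    awk '/^[0-9]+$/ && $0 != 0 { n++ } END { print n + 0 }'
}

# Example (run.log is a hypothetical capture of the terminal output):
#   nonzero_exit_codes < run.log
```

A result of 0 means every listed process exited cleanly. Anything larger means at least one process returned an error code and its own output is worth reading first.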



I am getting an error saying that "-print-all-exitcodes" is not a valid local parameter.

Which version of MPICH are you using? Can you let me know the result of the following?
$ mpiexec -info
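
To report just the version number, the output of that command can be filtered. This is a rough sketch that assumes the Hydra build details include a line beginning with "Version:"; the function name hydra_version is hypothetical.

```shell
# Extract the version field from `mpiexec -info` output read on stdin.
# Assumption: the output contains a line of the form "Version:   <number>".
# NOTE: hydra_version is an illustrative helper, not an MPICH tool.
hydra_version() {
    awk '/^[[:space:]]*Version:/ { print $2; exit }'
}

# Example:
#   mpiexec -info | hydra_version
```

If nothing is printed, the mpiexec on the PATH may not be the Hydra launcher that ships with MPICH, which would also explain an unrecognized option.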

- Sangmin

