[mpich-discuss] Error Running MPICH for Photochemical Modeling
Lu, Huiwei
huiweilu at mcs.anl.gov
Fri Sep 19 17:30:19 CDT 2014
Hi, Abhishek,
As mentioned in the previous email, it looks like the problem lies in the application. Could you provide us with a minimal example that fails, so that we can look at the code and reproduce the problem on our machines?
Thanks,
—
Huiwei
On Sep 17, 2014, at 3:29 PM, Abhishek Bhat <abhat at trinityconsultants.com> wrote:
> Sangmin,
>
> Fatal error in MPI_Recv: A process has failed, error stack:
> MPI_Recv(187).............: MPI_Recv(buf=0x7fff21bc04b0, count=644490, MPI_REAL, src=1, tag=14131, MPI_COMM_WORLD, status=0x7fff227c47a0) failed
> dequeue_and_set_error(865): Communication error with rank 1
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 4183 RUNNING AT dfw-camx
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:4 at dfw-camx-n4] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:4 at dfw-camx-n4] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:4 at dfw-camx-n4] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:2 at dfw-camx-n2] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:2 at dfw-camx-n2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:2 at dfw-camx-n2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:3 at dfw-camx-n3] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:3 at dfw-camx-n3] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:3 at dfw-camx-n3] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:6 at dfw-camx-n6] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:6 at dfw-camx-n6] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:6 at dfw-camx-n6] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:5 at dfw-camx-n5] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:5 at dfw-camx-n5] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:5 at dfw-camx-n5] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:7 at dfw-camx-n7] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:7 at dfw-camx-n7] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:7 at dfw-camx-n7] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at dfw-camx] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec at dfw-camx] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec at dfw-camx] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
> [mpiexec at dfw-camx] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
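
The first lines of the error stack above carry the useful details: the failing call (MPI_Recv), the element count, the datatype, and the peer rank (src=1) whose failure broke the receive. As a rough aid for reading such logs, here is a minimal Python sketch that pulls those fields out of the MPI_Recv line. The function name summarize_recv_failure and the size table ASSUMED_SIZES are hypothetical, not part of MPICH, and the byte estimate assumes a default 4-byte MPI_REAL, which may not hold if the application was built with promoted reals.

```python
import re

# Matches the argument dump of a failed MPI_Recv in an MPICH error stack, e.g.
# "MPI_Recv(buf=0x..., count=644490, MPI_REAL, src=1, tag=14131, ...) failed"
RECV_RE = re.compile(
    r"MPI_Recv\(buf=\S+, count=(?P<count>\d+), (?P<dtype>MPI_\w+), "
    r"src=(?P<src>-?\d+), tag=(?P<tag>-?\d+)"
)

# Assumption: typical default element sizes in bytes; adjust for your build.
ASSUMED_SIZES = {"MPI_REAL": 4, "MPI_INTEGER": 4, "MPI_DOUBLE_PRECISION": 8}


def summarize_recv_failure(log_text):
    """Return the fields of the first failed MPI_Recv found, or None."""
    match = RECV_RE.search(log_text)
    if match is None:
        return None
    count = int(match.group("count"))
    dtype = match.group("dtype")
    size = ASSUMED_SIZES.get(dtype)
    return {
        "count": count,
        "datatype": dtype,
        "source_rank": int(match.group("src")),
        "tag": int(match.group("tag")),
        # Estimated message size; None when the datatype size is unknown.
        "approx_bytes": count * size if size is not None else None,
    }
```

For the stack above this reports source rank 1 and, under the 4-byte assumption, a message of roughly 2.5 MB. That points at rank 1 as the process to inspect first. The proxy and mpiexec lines that follow are only cleanup noise, as the banner itself says.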
>
> This is what I got when running with -print-all-exitcodes.
>
>
> Anything helpful??
>
> Thank you very much for all your help.
> Abhishek
>
> ………………………………………………………………………………………………….
> Abhishek Bhat, PhD, EPI,
> Senior Consultant
>
>
> From: Seo, Sangmin [mailto:sseo at anl.gov]
> Sent: Wednesday, September 17, 2014 1:17 PM
> To: Abhishek Bhat
> Subject: Re: [mpich-discuss] Error Running MPICH for Photochemical Modeling
>
>
> On Sep 17, 2014, at 1:08 PM, Abhishek Bhat <abhat at trinityconsultants.com> wrote:
>
>
> Sangmin,
>
> What is the correct syntax for printing all exit codes?
> If I use
>
> if( ! { mpiexec -machinefile nodes -np $NUMPROCS -print-all-exitcodes $EXEC -mpich-dbg=file -mpich-dbg-class=all -mpich-dbg-level=verbose } )
>
> This is correct. And, the output will be shown on your terminal, not in file, like:
> [mpiexec at host] Exit codes: [host] 0,0
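
To spot which host reported a failing process, the "Exit codes:" line can be split per host. The Python sketch below assumes the line looks like the sample quoted above (a bracketed host name followed by comma-separated integers, possibly repeated for several hosts). The helper names parse_exit_codes and hosts_with_failures are hypothetical, and the real layout may differ between MPICH versions.

```python
import re


def parse_exit_codes(line):
    """Map each host to its list of exit codes from an 'Exit codes:' line."""
    _, found, rest = line.partition("Exit codes:")
    if not found:
        return {}
    codes_by_host = {}
    # Each group is "[host] c1,c2,..." (format assumed from the sample line).
    for host, codes in re.findall(r"\[([^\]]+)\]\s*(-?\d+(?:,-?\d+)*)", rest):
        codes_by_host[host] = [int(code) for code in codes.split(",")]
    return codes_by_host


def hosts_with_failures(line):
    """Return the sorted hosts that reported at least one nonzero code."""
    return sorted(
        host for host, codes in parse_exit_codes(line).items() if any(codes)
    )
```

Any nonzero value is treated as a failure here; the sketch does not try to interpret what a particular value means.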
>
>
>
> I am getting an error saying “-print-all-exitcodes” is not a valid local parameter.
>
> Which version of MPICH are you using? Can you let me know the result of the following?
> $ mpiexec -info
>
> — Sangmin
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss