[mpich-discuss] error with MPI_Reduce running cpi
Zhou, Hui
zhouh at anl.gov
Mon Jun 17 12:59:04 CDT 2019
The error message says process 0 tried to connect to process 1 and the connection was refused. A common cause is firewall rules. The MPI processes listen on random ports (for example 23456), so even when ssh works (port 22), those other ports may still be blocked.
—
Hui Zhou
On Jun 17, 2019, at 11:28 AM, Jinang_Shah <sjinang at iitk.ac.in> wrote:
1) $ ./configure --prefix=/users/misc/sjinang/mpich-install
2) No, that fails too. Running
$ mpiexec -n 2 -f hostfile ./mpi/mpich-3.3.1/examples/send_recv
gives this error:
Fatal error in PMPI_Send: Unknown error class, error stack:
PMPI_Send(159).............: MPI_Send(buf=0x7ffe5d59e2a4, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1845): Communication error with rank 1: Connection refused
3) I have tried ssh in both directions and it works fine. So can you now see what the problem is?
Thanks.
On 17-06-2019 21:28, Zhou, Hui wrote:
Hi Jinang_Shah,
Could you list your configure line (try `head config.log`)?
If you try a simple example where process 0 sends a short message to process 1, do you get a similar error?
Can csews1 and csews2 connect to each other freely, i.e., is there a firewall, router, etc. between these two hosts?
—
Hui Zhou
On Jun 17, 2019, at 10:11 AM, Jinang_Shah via discuss <discuss at mpich.org> wrote:
$ mpiexec -n 2 -f hostfile ./mpi/mpich-3.3.1/examples/cpi
Process 1 of 2 is on csews2
Process 0 of 2 is on csews1
Fatal error in PMPI_Reduce: Unknown error class, error stack:
PMPI_Reduce(523)................: MPI_Reduce(sbuf=0x7ffeab580c10, rbuf=0x7ffeab580c18, count=1, datatype=MPI_DOUBLE, op=MPI_SUM, root=0, comm=MPI_COMM_WORLD) failed
PMPI_Reduce(509)................:
MPIR_Reduce_impl(316)...........:
MPIR_Reduce_intra_auto(231).....:
MPIR_Reduce_intra_binomial(125).:
MPIDI_CH3U_Recvq_FDU_or_AEP(629): Communication error with rank 1
Can anyone explain this error and how to overcome it? I have installed just MPICH and not the Hydra package.
By the way, the same command works fine with the hello program, the one where each node reports its identity.
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss