I disabled the whole firewall on those machines, but I still get the same problem: connection refused.
I ran the program on a completely different set of machines that we have, but the problem is still the same.
Any other thoughts on where the problem could be?

Thanks.

Amin Hassani,
CIS department at UAB,
Birmingham, AL, USA.

On Wed, Nov 26, 2014 at 9:25 AM, Kenneth Raffenetti <raffenet@mcs.anl.gov> wrote:

The "connection refused" makes me think a firewall is getting in the way. Is TCP communication limited to specific ports on the cluster? If so, you can use this envvar to enforce a range of ports in MPICH.

MPIR_CVAR_CH3_PORT_RANGE
Description: The MPIR_CVAR_CH3_PORT_RANGE environment variable allows you to specify the range of TCP ports to be used by the process manager and the MPICH library. The format of this variable is <low>:<high>. To specify any available port, use 0:0.
Default: {0,0}
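
For example, if ports 50000-50100 were open between the nodes (a hypothetical range; use whatever your cluster policy actually allows), you could pass the variable through hydra with -genv:

$ mpirun -genv MPIR_CVAR_CH3_PORT_RANGE 50000:50100 -hostfile hosts-hydra -np 2 test_dup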

On 11/25/2014 11:50 PM, Amin Hassani wrote:

Tried with the new configure too. Same problem :(

$ mpirun -hostfile hosts-hydra -np 2 test_dup
Fatal error in MPI_Send: Unknown error class, error stack:
MPI_Send(174)..............: MPI_Send(buf=0x7fffd90c76c8, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1832): Communication error with rank 1: Connection refused

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 5459 RUNNING AT oakmnt-0-a
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@oakmnt-0-b] HYD_pmcd_pmip_control_cmd_cb (../../../../src/pm/hydra/pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:1@oakmnt-0-b] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@oakmnt-0-b] main (../../../../src/pm/hydra/pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@oakmnt-0-a] HYDT_bscu_wait_for_completion (../../../../src/pm/hydra/tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@oakmnt-0-a] HYDT_bsci_wait_for_completion (../../../../src/pm/hydra/tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@oakmnt-0-a] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@oakmnt-0-a] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:344): process manager error waiting for completion
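
(For context, the failing call in the trace matches a minimal send/recv exchange like the sketch below. This is a hypothetical reconstruction, since the actual test_dup source is not shown in the thread; it only mirrors the parameters visible in the error message.)

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch of the exchange the trace shows failing:
 * MPI_Send(count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD).
 * Hypothetical; not the real test_dup source. */
int main(int argc, char **argv)
{
    int rank, val = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* rank 0 sends to rank 1 */
    else if (rank == 1) {
        MPI_Recv(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", val);
    }
    MPI_Finalize();
    return 0;
}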

Amin Hassani,
CIS department at UAB,
Birmingham, AL, USA.

On Tue, Nov 25, 2014 at 11:44 PM, Lu, Huiwei <huiweilu@mcs.anl.gov> wrote:

So the error only happens when there is communication.

It may be caused by IB, as you guessed before. Could you try to reconfigure MPICH using "./configure --with-device=ch3:nemesis:tcp" and try again?
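
(A typical rebuild sequence for that, assuming the MPICH source tree and the /nethome/students/ahassani/usr/mpi install prefix mentioned later in the thread; adjust paths to your setup:)

$ ./configure --with-device=ch3:nemesis:tcp --prefix=/nethome/students/ahassani/usr/mpi
$ make && make install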

—
Huiwei

> On Nov 25, 2014, at 11:23 PM, Amin Hassani <ahassani@cis.uab.edu> wrote:
>
> Yes it works.
> output:
>
> $ mpirun -hostfile hosts-hydra -np 2 test
> rank 1
> rank 0
>
> Amin Hassani,
> CIS department at UAB,
> Birmingham, AL, USA.
>
> On Tue, Nov 25, 2014 at 11:20 PM, Lu, Huiwei <huiweilu@mcs.anl.gov> wrote:
> Could you try to run the following simple code to see if it works?
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char** argv)
> {
>     int rank;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     printf("rank %d\n", rank);
>     MPI_Finalize();
>     return 0;
> }
>
> —
> Huiwei
>
> > On Nov 25, 2014, at 11:11 PM, Amin Hassani <ahassani@cis.uab.edu> wrote:
> >
> > No, I checked. Also, I always install my MPIs in /nethome/students/ahassani/usr/mpi; I never install them in /nethome/students/ahassani/usr, so MPI files will never get there. Even if I put the /usr/mpi/bin in front of /usr/bin, it won't affect anything. There has never been any MPI installed in /usr/bin.
> >
> > Thank you.

_______________________________________________
discuss mailing list     discuss@mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss