<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">Is the failure specific to MPI_Allreduce? Did other tests (like simple send/recv) work?</div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div>
<br><div class="gmail_quote">On Tue, Nov 25, 2014 at 9:41 PM, Amin Hassani <span dir="ltr"><<a href="mailto:ahassani@cis.uab.edu" target="_blank">ahassani@cis.uab.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">Is there any debugging flag that I can turn on to figure out problems? </div><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">Thanks.</div></div><div class="gmail_extra"><span class=""><br clear="all"><div><div><div dir="ltr">Amin Hassani,<br>CIS department at UAB,<br>
Birmingham, AL, USA.</div></div></div>
<br></span><div><div class="h5"><div class="gmail_quote">On Tue, Nov 25, 2014 at 9:31 PM, Amin Hassani <span dir="ltr"><<a href="mailto:ahassani@cis.uab.edu" target="_blank">ahassani@cis.uab.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">Now I'm getting this error with MPICH-3.2a2</div><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">Any thought?</div><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small"><br></div><div class="gmail_extra"><span><div class="gmail_default"><span style="font-family:tahoma,sans-serif;font-size:small"></span><font face="tahoma, sans-serif">$ mpirun -hostfile hosts-hydra -np 2 test_dup</font></div></span><div class="gmail_default"><font face="tahoma, sans-serif">Fatal error in MPI_Allreduce: Unknown error class, error stack:</font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPI_Allreduce(912)....................: MPI_Allreduce(sbuf=0x7fffa5240e60, rbuf=0x7fffa5240e68, count=1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD) failed</font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPIR_Allreduce_impl(769)..............: </font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPIR_Allreduce_intra(419).............: </font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPIDU_Complete_posted_with_error(1192): Process failed</font></div><div class="gmail_default"><font face="tahoma, sans-serif">Fatal error in MPI_Allreduce: Unknown error class, error stack:</font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPI_Allreduce(912)....................: MPI_Allreduce(sbuf=0x7fffaf6ef070, rbuf=0x7fffaf6ef078, count=1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD) failed</font></div><div class="gmail_default"><font face="tahoma, 
sans-serif">MPIR_Allreduce_impl(769)..............: </font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPIR_Allreduce_intra(419).............: </font></div><div class="gmail_default"><font face="tahoma, sans-serif">MPIDU_Complete_posted_with_error(1192): Process failed</font></div><span><div class="gmail_default"><font face="tahoma, sans-serif"><br></font></div><div class="gmail_default"><font face="tahoma, sans-serif">===================================================================================</font></div><div class="gmail_default"><font face="tahoma, sans-serif">= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES</font></div></span><div class="gmail_default"><font face="tahoma, sans-serif">= PID 451 RUNNING AT oakmnt-0-a</font></div><span><div class="gmail_default"><font face="tahoma, sans-serif">= EXIT CODE: 1</font></div><div class="gmail_default"><font face="tahoma, sans-serif">= CLEANING UP REMAINING PROCESSES</font></div><div class="gmail_default"><font face="tahoma, sans-serif">= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES</font></div><div class="gmail_default"><font face="tahoma, sans-serif">===================================================================================</font><span style="font-family:tahoma,sans-serif;font-size:small"></span></div><div class="gmail_default"><span style="font-family:tahoma,sans-serif;font-size:small"><br></span></div></span><span><div class="gmail_default">Thanks.</div><br clear="all"><div><div><div dir="ltr">Amin Hassani,<br>CIS department at UAB,<br>
Birmingham, AL, USA.</div></div></div>
<br></span><div><div><div class="gmail_quote">On Tue, Nov 25, 2014 at 9:25 PM, Amin Hassani <span dir="ltr"><<a href="mailto:ahassani@cis.uab.edu" target="_blank">ahassani@cis.uab.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">Ok, I'll try to test the alpha version. I'll let you know the results.</div><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">Thank you.</div></div><div class="gmail_extra"><span><br clear="all"><div><div><div dir="ltr">Amin Hassani,<br>CIS department at UAB,<br>
Birmingham, AL, USA.</div></div></div>
<br></span><div><div><div class="gmail_quote">On Tue, Nov 25, 2014 at 9:21 PM, Bland, Wesley B. <span dir="ltr"><<a href="mailto:wbland@anl.gov" target="_blank">wbland@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">
It’s hard to tell, then. Other than some problems compiling (not declaring all of your variables), everything seems OK. Can you try running with the most recent alpha? I have no idea what bug we could have fixed here to make things work, but it’d be good to
eliminate the possibility.
<div><br>
</div>
<div>Thanks,</div>
<div>Wesley<br>
<div><br>
<div>
<blockquote type="cite"><div><div>
<div>On Nov 25, 2014, at 10:11 PM, Amin Hassani <<a href="mailto:ahassani@cis.uab.edu" target="_blank">ahassani@cis.uab.edu</a>> wrote:</div>
<br>
</div></div><div><div><div>
<div dir="ltr">
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
I have attached the config.log from the root folder where MPICH was compiled. I'm not too familiar with MPICH; there are other config.log files in other directories as well, but I'm not sure whether you need them too. </div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
I don't set any environment variables related to MPICH. I also tried with</div>
<div class="gmail_default"><font face="tahoma, sans-serif">export HYDRA_HOST_FILE=&lt;address to host file&gt;,</font><br>
</div>
<div class="gmail_default"><font face="tahoma, sans-serif">but I get the same problem.</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">I don't do anything FT-related in MPICH, and I don't think this version of MPICH has anything related to FT in it.</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">Thanks.</font></div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div>
<div dir="ltr">Amin Hassani,<br>
CIS department at UAB,<br>
Birmingham, AL, USA.</div>
</div>
</div>
<br>
<div class="gmail_quote">On Tue, Nov 25, 2014 at 9:02 PM, Bland, Wesley B. <span dir="ltr">
<<a href="mailto:wbland@anl.gov" target="_blank">wbland@anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">Can you also provide your config.log and any CVARs or other relevant environment variables that you might be setting (for instance, in relation to fault tolerance)?
<div><br>
</div>
<div>Thanks,</div>
<div>Wesley
<div>
<div><br>
<div><br>
<div>
<blockquote type="cite">
<div>On Nov 25, 2014, at 3:58 PM, Amin Hassani <<a href="mailto:ahassani@cis.uab.edu" target="_blank">ahassani@cis.uab.edu</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
This is the simplest code I have that doesn't run.</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
<br>
</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
<br>
</div>
<div class="gmail_default">
<div class="gmail_default"><font face="tahoma, sans-serif">#include &lt;mpi.h&gt;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">#include &lt;stdio.h&gt;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">#include &lt;malloc.h&gt;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">#include &lt;unistd.h&gt;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">#include &lt;stdlib.h&gt;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><span style="font-family:tahoma,sans-serif">int main(int argc, char** argv)</span><br>
</div>
<div class="gmail_default"><font face="tahoma, sans-serif">{</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> int rank, size;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> int i, j, k;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> double t1, t2;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> int rc;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> MPI_Init(&argc, &argv);</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> MPI_Comm world = MPI_COMM_WORLD, newworld, newworld2;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> MPI_Comm_rank(world, &rank);</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> MPI_Comm_size(world, &size);</font></div>
<div class="gmail_default"><span style="font-family:tahoma,sans-serif"><br>
</span></div>
<div class="gmail_default"><span style="font-family:tahoma,sans-serif"> t2 = 1;</span></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> MPI_Allreduce(&t2, &t_avg, 1, MPI_DOUBLE, MPI_SUM, world);</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> t_avg = t_avg / size;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><span style="font-family:tahoma,sans-serif"> MPI_Finalize();</span><br>
</div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"> return 0;</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">}</font></div>
</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div>
<div dir="ltr">Amin Hassani,<br>
CIS department at UAB,<br>
Birmingham, AL, USA.</div>
</div>
</div>
<br>
<div class="gmail_quote">On Tue, Nov 25, 2014 at 2:46 PM, "Antonio J. Peña" <span dir="ltr">
<<a href="mailto:apenya@mcs.anl.gov" target="_blank">apenya@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
Hi Amin,<br>
<br>
Can you share with us a minimal piece of code with which you can reproduce this issue?<br>
<br>
Thanks,<br>
Antonio
<div>
<div><br>
<br>
<br>
On 11/25/2014 12:52 PM, Amin Hassani wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div>
<div dir="ltr">
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
Hi,</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
<br>
</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
I am having a problem running MPICH on multiple nodes. When I run multiple MPI processes on one node, it works fine, but when I try to run on multiple nodes, it fails with the error below.</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
My machines run Debian and have both InfiniBand and TCP interconnects. I'm guessing it has something to do with the TCP network, but I can run Open MPI on these machines with no problem, whereas for some reason I cannot run MPICH on multiple nodes. Please let me know
if more info is needed from my side. I'm guessing there is some configuration that I am missing. I used MPICH 3.1.3 for this test. I googled this problem but couldn't find any solution.</div>
<div><br>
</div>
<div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
In my MPI program, I am doing a simple allreduce over MPI_COMM_WORLD.</div>
<br>
</div>
<div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
My host file (hosts-hydra) looks something like this:</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif">oakmnt-0-a:1</div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
oakmnt-0-b:1 </div>
</div>
<div><br>
</div>
<div>
<div class="gmail_default" style="font-family:tahoma,sans-serif;font-size:small">
I get this error:</div>
<br>
</div>
<div>
<div class="gmail_default">
<div class="gmail_default"><font face="tahoma, sans-serif">$ mpirun -hostfile hosts-hydra -np 2 test_dup</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490: status->MPI_TAG == recvtag</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490: status->MPI_TAG == recvtag</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">internal ABORT - process 1</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">internal ABORT - process 0</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">===================================================================================</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">= PID 30744 RUNNING AT oakmnt-0-b</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">= EXIT CODE: 1</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">= CLEANING UP REMAINING PROCESSES</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">===================================================================================</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">[mpiexec@vulcan13] HYDU_sock_read (../../../../src/pm/hydra/utils/sock/sock.c:239): read error (Bad file descriptor)</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">[mpiexec@vulcan13] control_cb (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:199): unable to read command from proxy</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">[mpiexec@vulcan13] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">[mpiexec@vulcan13] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:198): error waiting for event</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">[mpiexec@vulcan13] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:344): process manager error waiting for completion</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif"><br>
</font></div>
<div class="gmail_default"><font face="tahoma, sans-serif">Thanks.</font></div>
</div>
</div>
<div>
<div>
<div dir="ltr">Amin Hassani,<br>
CIS department at UAB,<br>
Birmingham, AL, USA.</div>
</div>
</div>
</div>
<br>
</div>
</div>
<pre>_______________________________________________
discuss mailing list <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>
To manage subscription options or unsubscribe:
<a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a></pre>
<span><font color="#888888"></font></span></blockquote>
<span><font color="#888888"><br>
<br>
<pre cols="72">--
Antonio J. Peña
Postdoctoral Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240, Of. 3148
Argonne, IL 60439-4847
<a href="mailto:apenya@mcs.anl.gov" target="_blank">apenya@mcs.anl.gov</a>
<a href="http://www.mcs.anl.gov/~apenya" target="_blank">www.mcs.anl.gov/~apenya</a></pre>
</font></span></div>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
</div>
</div></div><span><config.log></span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div></div></div></div>
</blockquote></div><br></div></div></div>
</blockquote></div><br></div>