[mpich-discuss] having problem running MPICH on multiple nodes
Amin Hassani
ahassani at cis.uab.edu
Tue Nov 25 14:58:02 CST 2014
This is the simplest code I have that doesn't run.
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    int rank, size;
    double t2, t_avg;   /* t_avg holds the allreduce result */

    MPI_Init(&argc, &argv);

    MPI_Comm world = MPI_COMM_WORLD;
    MPI_Comm_rank(world, &rank);
    MPI_Comm_size(world, &size);

    /* the simple allreduce over MPI_COMM_WORLD mentioned below */
    t2 = 1;
    MPI_Allreduce(&t2, &t_avg, 1, MPI_DOUBLE, MPI_SUM, world);
    t_avg = t_avg / size;

    MPI_Finalize();
    return 0;
}
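For reference, I build it with mpicc and launch it with the same mpirun line quoted below (the source file name here is just what I happen to call it):

    mpicc -o test_dup test_dup.c
    mpirun -hostfile hosts-hydra -np 2 test_dup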
Amin Hassani,
CIS department at UAB,
Birmingham, AL, USA.
On Tue, Nov 25, 2014 at 2:46 PM, "Antonio J. Peña" <apenya at mcs.anl.gov>
wrote:
>
> Hi Amin,
>
> Can you share with us a minimal piece of code with which you can reproduce
> this issue?
>
> Thanks,
> Antonio
>
>
>
> On 11/25/2014 12:52 PM, Amin Hassani wrote:
>
> Hi,
>
> I am having a problem running MPICH on multiple nodes. When I run
> multiple MPI processes on one node, it works fine, but when I try to run
> across multiple nodes, it fails with the error below.
> My machines run Debian and have both InfiniBand and TCP interconnects. I'm
> guessing it has something to do with the TCP network, but I can run Open MPI
> on these machines with no problem, while for some reason I cannot run MPICH
> across multiple nodes. Please let me know if more info is needed from my side.
> I'm guessing there is some configuration that I am missing. I used MPICH
> 3.1.3 for this test. I googled this problem but couldn't find any solution.
>
> In my MPI program, I am doing a simple allreduce over MPI_COMM_WORLD.
>
> my host file (hosts-hydra) is something like this:
> oakmnt-0-a:1
> oakmnt-0-b:1
>
> I get this error:
>
> $ mpirun -hostfile hosts-hydra -np 2 test_dup
> Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490:
> status->MPI_TAG == recvtag
> Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490:
> status->MPI_TAG == recvtag
> internal ABORT - process 1
> internal ABORT - process 0
>
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 30744 RUNNING AT oakmnt-0-b
> = EXIT CODE: 1
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> [mpiexec at vulcan13] HYDU_sock_read
> (../../../../src/pm/hydra/utils/sock/sock.c:239): read error (Bad file
> descriptor)
> [mpiexec at vulcan13] control_cb
> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:199): unable to read
> command from proxy
> [mpiexec at vulcan13] HYDT_dmxu_poll_wait_for_event
> (../../../../src/pm/hydra/tools/demux/demux_poll.c:76): callback returned
> error status
> [mpiexec at vulcan13] HYD_pmci_wait_for_completion
> (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:198): error waiting for
> event
> [mpiexec at vulcan13] main
> (../../../../src/pm/hydra/ui/mpich/mpiexec.c:344): process manager error
> waiting for completion
>
> Thanks.
> Amin Hassani,
> CIS department at UAB,
> Birmingham, AL, USA.
>
>
>
>
> --
> Antonio J. Peña
> Postdoctoral Appointee
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 South Cass Avenue, Bldg. 240, Of. 3148
> Argonne, IL 60439-4847
> apenya at mcs.anl.gov
> www.mcs.anl.gov/~apenya