[mpich-discuss] having problem running MPICH on multiple nodes
Lu, Huiwei
huiweilu at mcs.anl.gov
Tue Nov 25 22:31:47 CST 2014
I can run your simplest code on my machine without a problem, so I guess there is some problem with the cluster connection. Could you give me the output of the following?
$ mpirun -hostfile hosts-hydra -np 2 hostname
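If both nodes are reachable and the launcher works, the expected output would simply be the two host names from your hostfile, something like (possibly in a different order):

  oakmnt-0-a
  oakmnt-0-b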
—
Huiwei
> On Nov 25, 2014, at 10:24 PM, Amin Hassani <ahassani at cis.uab.edu> wrote:
>
> Hi,
>
> The code that I gave you had more stuff in it that I didn't want to distract you with. Here is the simpler send/recv test that I just ran, and it failed.
>
> which mpirun: the specific directory where I install my MPIs
> /nethome/students/ahassani/usr/mpi/bin/mpirun
>
> mpirun with no arguments:
> $ mpirun
> [mpiexec at oakmnt-0-a] set_default_values (../../../../src/pm/hydra/ui/mpich/utils.c:1528): no executable provided
> [mpiexec at oakmnt-0-a] HYD_uii_mpx_get_parameters (../../../../src/pm/hydra/ui/mpich/utils.c:1739): setting default values failed
> [mpiexec at oakmnt-0-a] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:153): error parsing parameters
>
>
>
> #include <mpi.h>
> #include <stdio.h>
> #include <malloc.h>
> #include <unistd.h>
> #include <stdlib.h>
>
> int skip = 10;
> int iter = 30;
>
> int main(int argc, char** argv)
> {
>     int rank, size;
>     int i, j, k;
>     double t1, t2;
>     int rc;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm world = MPI_COMM_WORLD, newworld, newworld2;
>     MPI_Comm_rank(world, &rank);
>     MPI_Comm_size(world, &size);
>     int a = 0, b = 1;
>     if (rank == 0) {
>         MPI_Send(&a, 1, MPI_INT, 1, 0, world);
>     } else {
>         MPI_Recv(&b, 1, MPI_INT, 0, 0, world, MPI_STATUS_IGNORE);
>     }
>
>     printf("b is %d\n", b);
>     MPI_Finalize();
>
>     return 0;
> }
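>
> For reference, the test can be built and launched roughly like this (the source file name here is just a placeholder):
>
> $ mpicc test_sendrecv.c -o test_sendrecv
> $ mpirun -hostfile hosts-hydra -np 2 ./test_sendrecv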
>
> Thank you.
>
>
> Amin Hassani,
> CIS department at UAB,
> Birmingham, AL, USA.
>
> On Tue, Nov 25, 2014 at 10:20 PM, Lu, Huiwei <huiweilu at mcs.anl.gov> wrote:
> Hi, Amin,
>
> Could you quickly give us the output of the following command: "which mpirun"
>
> Also, your simplest code couldn't compile: "error: ‘t_avg’ undeclared (first use in this function)". Can you fix it?
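>
> For example, declaring the missing variable before the MPI_Allreduce call should presumably be enough:
>
>     double t_avg = 0.0;  /* will receive the global sum from MPI_Allreduce */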
>
> —
> Huiwei
>
> > On Nov 25, 2014, at 2:58 PM, Amin Hassani <ahassani at cis.uab.edu> wrote:
> >
> > This is the simplest code I have that doesn't run.
> >
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <malloc.h>
> > #include <unistd.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char** argv)
> > {
> >     int rank, size;
> >     int i, j, k;
> >     double t1, t2;
> >     int rc;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm world = MPI_COMM_WORLD, newworld, newworld2;
> >     MPI_Comm_rank(world, &rank);
> >     MPI_Comm_size(world, &size);
> >
> >     t2 = 1;
> >     MPI_Allreduce(&t2, &t_avg, 1, MPI_DOUBLE, MPI_SUM, world);
> >     t_avg = t_avg / size;
> >
> >     MPI_Finalize();
> >
> >     return 0;
> > }
> >
> > Amin Hassani,
> > CIS department at UAB,
> > Birmingham, AL, USA.
> >
> > On Tue, Nov 25, 2014 at 2:46 PM, "Antonio J. Peña" <apenya at mcs.anl.gov> wrote:
> >
> > Hi Amin,
> >
> > Can you share with us a minimal piece of code with which you can reproduce this issue?
> >
> > Thanks,
> > Antonio
> >
> >
> >
> > On 11/25/2014 12:52 PM, Amin Hassani wrote:
> >> Hi,
> >>
> >> I am having a problem running MPICH on multiple nodes. When I run multiple MPI processes on one node, it works fine, but when I try to run on multiple nodes, it fails with the error below.
> >> My machines run Debian and have both InfiniBand and TCP interconnects. I'm guessing it has something to do with the TCP network, but I can run Open MPI on these machines with no problem; for some reason I just cannot run MPICH on multiple nodes. Please let me know if more info is needed from my side. I'm guessing there is some configuration that I am missing. I used MPICH 3.1.3 for this test. I googled this problem but couldn't find any solution.
> >>
> >> In my MPI program, I am doing a simple allreduce over MPI_COMM_WORLD.
> >>
> >> My host file (hosts-hydra) looks like this:
> >> oakmnt-0-a:1
> >> oakmnt-0-b:1
> >>
> >> I get this error:
> >>
> >> $ mpirun -hostfile hosts-hydra -np 2 test_dup
> >> Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490: status->MPI_TAG == recvtag
> >> Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490: status->MPI_TAG == recvtag
> >> internal ABORT - process 1
> >> internal ABORT - process 0
> >>
> >> ===================================================================================
> >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> = PID 30744 RUNNING AT oakmnt-0-b
> >> = EXIT CODE: 1
> >> = CLEANING UP REMAINING PROCESSES
> >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> ===================================================================================
> >> [mpiexec at vulcan13] HYDU_sock_read (../../../../src/pm/hydra/utils/sock/sock.c:239): read error (Bad file descriptor)
> >> [mpiexec at vulcan13] control_cb (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:199): unable to read command from proxy
> >> [mpiexec at vulcan13] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
> >> [mpiexec at vulcan13] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
> >> [mpiexec at vulcan13] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:344): process manager error waiting for completion
> >>
> >> Thanks.
> >> Amin Hassani,
> >> CIS department at UAB,
> >> Birmingham, AL, USA.
> >>
> >>
> >
> >
> > --
> > Antonio J. Peña
> > Postdoctoral Appointee
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> > 9700 South Cass Avenue, Bldg. 240, Of. 3148
> > Argonne, IL 60439-4847
> >
> > apenya at mcs.anl.gov
> > www.mcs.anl.gov/~apenya
> >
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss