[mpich-discuss] having problem running MPICH on multiple nodes

Lu, Huiwei huiweilu at mcs.anl.gov
Tue Nov 25 22:20:32 CST 2014


Hi, Amin,

Could you quickly send us the output of the following command: "which mpirun"?
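
Alongside "which mpirun", it may also help to check which MPI library the binary is actually linked against on each node. The following is only a minimal sketch, assuming an MPI-3 capable library (MPI_Get_library_version was added in MPI-3.0); the output format is an illustration, not something taken from this thread.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    char host[MPI_MAX_PROCESSOR_NAME];
    int version_len, host_len, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ask the library this binary was linked against to identify itself. */
    MPI_Get_library_version(version, &version_len);
    /* Report the node this process is running on. */
    MPI_Get_processor_name(host, &host_len);

    printf("rank %d on %s: %s\n", rank, host, version);

    MPI_Finalize();
    return 0;
}
```

If the ranks report different libraries or versions, or every process reports rank 0, that would suggest the launcher and the library do not come from the same installation.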

Also, the simplest code you sent does not compile: "error: ‘t_avg’ undeclared (first use in this function)". Can you fix it?
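
For reference, here is a minimal sketch of how the quoted program might look once it compiles. It assumes t_avg was meant to be a double holding the average of t2 across ranks; the unused variables and headers are dropped, and the printf is added only for illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    /* Assumption: t_avg is a double; it was undeclared in the posted code. */
    double t2, t_avg;

    MPI_Init(&argc, &argv);
    MPI_Comm world = MPI_COMM_WORLD;
    MPI_Comm_rank(world, &rank);
    MPI_Comm_size(world, &size);

    t2 = 1;
    /* Sum t2 over all ranks, then divide by the number of ranks. */
    MPI_Allreduce(&t2, &t_avg, 1, MPI_DOUBLE, MPI_SUM, world);
    t_avg = t_avg / size;

    /* Every rank contributes 1, so a working run should report 1.0. */
    printf("rank %d of %d: t_avg = %f\n", rank, size, t_avg);

    MPI_Finalize();
    return 0;
}
```

Assuming the file is saved as test_dup.c (a hypothetical name chosen to match the binary in the original report), it can be built with "mpicc test_dup.c -o test_dup" and launched with the same mpirun command as before.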

--
Huiwei

> On Nov 25, 2014, at 2:58 PM, Amin Hassani <ahassani at cis.uab.edu> wrote:
> 
> This is the simplest code I have that doesn't run.
> 
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <malloc.h>
> #include <unistd.h>
> #include <stdlib.h>
> 
> int main(int argc, char** argv)
> {
>     int rank, size;
>     int i, j, k;
>     double t1, t2;
>     int rc;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm world = MPI_COMM_WORLD, newworld, newworld2;
>     MPI_Comm_rank(world, &rank);
>     MPI_Comm_size(world, &size);
> 
>     t2 = 1;
>     MPI_Allreduce(&t2, &t_avg, 1, MPI_DOUBLE, MPI_SUM, world);
>     t_avg = t_avg / size;
> 
>     MPI_Finalize();
> 
>     return 0;
> }
> 
> Amin Hassani,
> CIS department at UAB,
> Birmingham, AL, USA.
> 
> On Tue, Nov 25, 2014 at 2:46 PM, "Antonio J. Peña" <apenya at mcs.anl.gov> wrote:
> 
> Hi Amin,
> 
> Can you share with us a minimal piece of code with which you can reproduce this issue?
> 
> Thanks,
>   Antonio
> 
> 
> 
> On 11/25/2014 12:52 PM, Amin Hassani wrote:
>> Hi,
>> 
>> I am having a problem running MPICH on multiple nodes. When I run multiple MPI processes on one node, it works fine, but when I try to run on multiple nodes, it fails with the error below.
>> My machines run Debian and have both InfiniBand and TCP interconnects. I'm guessing it has something to do with the TCP network, but I can run Open MPI on these machines with no problem, whereas for some reason I cannot run MPICH on multiple nodes. Please let me know if more info is needed from my side. I'm guessing there is some configuration that I am missing. I used MPICH 3.1.3 for this test. I googled this problem but couldn't find any solution.
>> 
>> In my MPI program, I am doing a simple allreduce over MPI_COMM_WORLD.
>> 
>> My host file (hosts-hydra) is something like this:
>> oakmnt-0-a:1
>> oakmnt-0-b:1
>> 
>> I get this error:
>> 
>> $ mpirun -hostfile hosts-hydra -np 2  test_dup
>> Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490: status->MPI_TAG == recvtag
>> Assertion failed in file ../src/mpi/coll/helper_fns.c at line 490: status->MPI_TAG == recvtag
>> internal ABORT - process 1
>> internal ABORT - process 0
>> 
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 30744 RUNNING AT oakmnt-0-b
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> [mpiexec at vulcan13] HYDU_sock_read (../../../../src/pm/hydra/utils/sock/sock.c:239): read error (Bad file descriptor)
>> [mpiexec at vulcan13] control_cb (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:199): unable to read command from proxy
>> [mpiexec at vulcan13] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
>> [mpiexec at vulcan13] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
>> [mpiexec at vulcan13] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:344): process manager error waiting for completion
>> 
>> Thanks.
>> Amin Hassani,
>> CIS department at UAB,
>> Birmingham, AL, USA.
>> 
>> 
>> _______________________________________________
>> discuss mailing list     
>> discuss at mpich.org
>> 
>> To manage subscription options or unsubscribe:
>> 
>> https://lists.mpich.org/mailman/listinfo/discuss
> 
> 
> -- 
> Antonio J. Peña
> Postdoctoral Appointee
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 South Cass Avenue, Bldg. 240, Of. 3148
> Argonne, IL 60439-4847
> 
> apenya at mcs.anl.gov
> www.mcs.anl.gov/~apenya
> 


