[mpich-discuss] MPI_init very slow with more than 3 nodes

Bixente Bodo Gómez bixente.bodo at ehu.es
Fri Nov 29 08:42:46 CST 2013


I had already tried that before writing to the list.  ssh between the nodes
responded quickly and did not ask for authentication.
Next week I will repeat the test with a switch isolated from the rest of the
network, and I will send the results.
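
For anyone repeating the ssh check, a minimal sketch of timing a trivial
remote command on every node could look like the following.  It assumes the
hostfile (fila5 in this thread) has one "host" or "host:slots" entry per
line, and that passwordless ssh is already configured.  The function names
and the REMOTE_SHELL variable are made up for this example; they are not
part of MPICH.

```shell
#!/bin/sh
# Sketch: time "ssh host /bin/true" for each host in a hostfile.
# hosts_from_file, time_hosts and REMOTE_SHELL are hypothetical names.

# Print the host names from a hostfile, dropping comments, blank lines
# and any ":slots" suffix on the first field.
hosts_from_file() {
    awk '{ sub(/#.*/, "") } NF { split($1, a, ":"); print a[1] }' "$1"
}

# Run a trivial command on each host and print the elapsed wall-clock
# seconds.  BatchMode makes ssh fail instead of waiting at a password
# prompt, so an authentication problem shows up as FAILED.
time_hosts() {
    for host in $(hosts_from_file "$1"); do
        start=$(date +%s)
        if ${REMOTE_SHELL:-ssh -o BatchMode=yes} "$host" /bin/true </dev/null; then
            status=ok
        else
            status=FAILED
        fi
        end=$(date +%s)
        echo "$host $status $((end - start))s"
    done
}

# Example: time_hosts fila5
```

A host that takes several seconds here, or that reports FAILED, would be
the first place to look.  If every host answers quickly, as in my test, the
delay is more likely somewhere else (for example NFS or the switch).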

Thank you very much.

On Fri, 29-11-2013 at 08:13 -0600, Pavan Balaji wrote:
> Try to ssh between the nodes and see how long it takes.  It might give some hint on what’s going on.
> 
>   — Pavan
> 
> On Nov 29, 2013, at 5:21 AM, Bixente BODO GOMEZ <bixente.bodo at ehu.es> wrote:
> 
> > Hi.
> > 
> > I have attached the tests I ran. Yesterday I had to wait 20 minutes; now only 4 or 6 minutes.
> > I will ask about the network.
> > 
> > 
> > Pavan Balaji <balaji at mcs.anl.gov> wrote:
> > 
> >> It is possible there’s something really slow on your network.  Just to eliminate MPI_INIT as a possible cause, can you try a non-MPI program:  maybe /bin/true or /bin/hostname?
> >> 
> >> % mpiexec -f fila5 -np 8 /bin/true
> >> 
> >>  — Pavan
> >> 
> >> On Nov 28, 2013, at 8:10 AM, Bixente BODO GOMEZ <bixente.bodo at ehu.es> wrote:
> >> 
> >>> Hello.
> >>> 
> >>> I'm testing an MPICH (3.0.4) cluster with 7 quad-core nodes running Ubuntu 12.04.  The master holds the home directory
> >>> and the nodes mount it over NFS.  I have changed RPCNFSDCOUNT from 8 to 64.
> >>> 
> >>> The programs run fine with the master and ONE of the other nodes, but when I start them on more nodes,
> >>> MPI_Init (I think) takes a long time (~20 minutes).  During that time there is a lot of network traffic
> >>> (read and write) on all nodes and a lot of disk activity on the master.  For example:
> >>> 
> >>> mpiu at u105251:~$ date; mpirun -f fila5 -np 8 test/hello; date
> >>> mié nov 27 15:17:29 CET 2013
> >>> Hola desde el procesador u105251. 0 de 8
> >>> Hola desde el procesador u105251. 1 de 8
> >>> Hola desde el procesador u105251. 2 de 8
> >>> Hola desde el procesador u105251. 3 de 8
> >>> Hola desde el procesador u103972. 4 de 8
> >>> Hola desde el procesador u103972. 5 de 8
> >>> Hola desde el procesador u103972. 6 de 8
> >>> Hola desde el procesador u103972. 7 de 8
> >>> mié nov 27 15:17:30 CET 2013
> >>> mpiu at u105251:~$ date; mpirun -f fila5 -np 16 test/hello; date
> >>> mié nov 27 15:17:39 CET 2013
> >>> Hola desde el procesador u105251. 2 de 16
> >>> Hola desde el procesador u105251. 0 de 16
> >>> Hola desde el procesador u105251. 1 de 16
> >>> Hola desde el procesador u105251. 3 de 16
> >>> Hola desde el procesador u103972. 4 de 16
> >>> Hola desde el procesador u103950. 8 de 16
> >>> Hola desde el procesador u103976.12 de 16
> >>> Hola desde el procesador u103972. 5 de 16
> >>> Hola desde el procesador u103950. 9 de 16
> >>> Hola desde el procesador u103976.13 de 16
> >>> Hola desde el procesador u103972. 7 de 16
> >>> Hola desde el procesador u103950.10 de 16
> >>> Hola desde el procesador u103976.14 de 16
> >>> Hola desde el procesador u103972. 6 de 16
> >>> Hola desde el procesador u103950.11 de 16
> >>> Hola desde el procesador u103976.15 de 16
> >>> mié nov 27 15:36:18 CET 2013
> >>> mpiu at u105251:~$
> >>> 
> >>> Once the programs start, i.e. from the first C instruction, they run fine.  That is why I think the problem
> >>> is MPI_Init.
> >>> 
> >>> Does anybody know why?
> >>> Thanks.
> >>> 
> >>> _______________________________________________
> >>> discuss mailing list     discuss at mpich.org
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mpich.org/mailman/listinfo/discuss
> >> 
> >> --
> >> Pavan Balaji
> >> http://www.mcs.anl.gov/~balaji
> > 
> > 
> > <test.txt>
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji




