[mpich-discuss] MPI_init very slow with more than 3 nodes
Pavan Balaji
balaji at mcs.anl.gov
Fri Nov 29 08:13:42 CST 2013
Try to ssh between the nodes and see how long it takes. It might give some hint on what’s going on.
— Pavan
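
A rough sketch of how the ssh check above could be scripted. It assumes passwordless ssh, GNU date (for %N), and a hostfile with one host per line, optionally followed by ":<count>". The function names list_hosts, time_cmd, and time_hosts are made up for this illustration.

```shell
#!/bin/sh
# Sketch: time a trivial remote command on every host in a hostfile.
# Assumes GNU date (for %N) and passwordless ssh; names are hypothetical.

# Print the hostnames from a hostfile, dropping comments, blank lines,
# and any ":<count>" suffix after the hostname.
list_hosts() {
    awk '{ sub(/#.*/, ""); if (NF) { split($1, a, ":"); print a[1] } }' "$1"
}

# Run a command (output discarded) and print its wall-clock time in seconds.
time_cmd() (
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    LC_ALL=C awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
)

# Print one "host<TAB>seconds" line per host in the hostfile.
time_hosts() (
    for host in $(list_hosts "$1"); do
        t=$(time_cmd ssh -o BatchMode=yes "$host" /bin/true)
        printf '%s\t%s s\n' "$host" "$t"
    done
)

# Example (hostfile name taken from the thread):
#   time_hosts fila5
```

A login that takes several seconds on an otherwise idle LAN often points at name resolution rather than at MPI, though that is only one possibility.
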
On Nov 29, 2013, at 5:21 AM, Bixente BODO GOMEZ <bixente.bodo at ehu.es> wrote:
> Hi.
>
> I've attached the tests I ran. Yesterday I had to wait 20 minutes; now it is only 4 or 6.
> I will ask about the network.
>
>
> Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>> It is possible there’s something really slow on your network. Just to eliminate MPI_INIT as a possible cause, can you try a non-MPI program: maybe /bin/true or /bin/hostname?
>>
>> % mpiexec -f fila5 -np 8 /bin/true
>>
>> — Pavan
>>
>> On Nov 28, 2013, at 8:10 AM, Bixente BODO GOMEZ <bixente.bodo at ehu.es> wrote:
>>
>>> Hello.
>>>
>>> I'm testing an MPICH (3.0.4) cluster of 7 quad-core nodes running Ubuntu 12.04. The master holds the home directory
>>> and the nodes mount it over NFS. I have changed RPCNFSDCOUNT from 8 to 64.
>>>
>>> The programs run fine on the master plus ONE other node, but when I start them on more nodes,
>>> MPI_Init (I think) takes a long time (~20 minutes). During that time there is a lot of network traffic
>>> (reads and writes) on all nodes and heavy hard disk activity on the master. For example:
>>>
>>> mpiu at u105251:~$ date; mpirun -f fila5 -np 8 test/hello; date
>>> mié nov 27 15:17:29 CET 2013
>>> Hola desde el procesador u105251. 0 de 8
>>> Hola desde el procesador u105251. 1 de 8
>>> Hola desde el procesador u105251. 2 de 8
>>> Hola desde el procesador u105251. 3 de 8
>>> Hola desde el procesador u103972. 4 de 8
>>> Hola desde el procesador u103972. 5 de 8
>>> Hola desde el procesador u103972. 6 de 8
>>> Hola desde el procesador u103972. 7 de 8
>>> mié nov 27 15:17:30 CET 2013
>>> mpiu at u105251:~$ date; mpirun -f fila5 -np 16 test/hello; date
>>> mié nov 27 15:17:39 CET 2013
>>> Hola desde el procesador u105251. 2 de 16
>>> Hola desde el procesador u105251. 0 de 16
>>> Hola desde el procesador u105251. 1 de 16
>>> Hola desde el procesador u105251. 3 de 16
>>> Hola desde el procesador u103972. 4 de 16
>>> Hola desde el procesador u103950. 8 de 16
>>> Hola desde el procesador u103976.12 de 16
>>> Hola desde el procesador u103972. 5 de 16
>>> Hola desde el procesador u103950. 9 de 16
>>> Hola desde el procesador u103976.13 de 16
>>> Hola desde el procesador u103972. 7 de 16
>>> Hola desde el procesador u103950.10 de 16
>>> Hola desde el procesador u103976.14 de 16
>>> Hola desde el procesador u103972. 6 de 16
>>> Hola desde el procesador u103950.11 de 16
>>> Hola desde el procesador u103976.15 de 16
>>> mié nov 27 15:36:18 CET 2013
>>> mpiu at u105251:~$
>>>
>>> Once the programs start, i.e. from the first C instruction onward, they run fine. That is why I think the problem
>>> is in MPI_Init.
>>>
>>> Does anybody know why?
>>> Thanks.
>>>
>>> _______________________________________________
>>> discuss mailing list discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>
>
> <test.txt>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji
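
The isolation test suggested earlier in the thread (launching /bin/true instead of the MPI program) can be turned into a side-by-side timing, sketched below. It assumes GNU date and that mpiexec accepts -f and -np as used in the thread. The MPIEXEC override variable and the function names time_cmd and compare_launch are hypothetical, added only for this illustration.

```shell
#!/bin/sh
# Sketch: for each process count, time the launcher alone (/bin/true)
# and then the real program. Assumes GNU date (for %N).
# MPIEXEC is a hypothetical override; it defaults to "mpiexec".

# Run a command (output discarded) and print its wall-clock time in seconds.
time_cmd() (
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    LC_ALL=C awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
)

# Usage: compare_launch HOSTFILE PROGRAM NP [NP ...]
# Prints a tab-separated table: process count, /bin/true time, program time.
compare_launch() (
    hostfile=$1
    prog=$2
    shift 2
    printf 'np\ttrue_s\tprog_s\n'
    for np in "$@"; do
        base=$(time_cmd "${MPIEXEC:-mpiexec}" -f "$hostfile" -np "$np" /bin/true)
        full=$(time_cmd "${MPIEXEC:-mpiexec}" -f "$hostfile" -np "$np" "$prog")
        printf '%s\t%s\t%s\n' "$np" "$base" "$full"
    done
)

# Example (names taken from the thread):
#   compare_launch fila5 test/hello 8 16
```

If the /bin/true column is already slow, the delay is in launching processes on the nodes (ssh, NFS, name lookup) and not in MPI_Init itself.
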